Re: [zfs-discuss] Re: Heavy writes freezing system

2007-01-18 Thread Roch - PAE

Jason J. W. Williams writes:
  Hi Anantha,
  
  I was curious why segregating at the FS level would provide adequate
  I/O isolation? Since all FS are on the same pool, I assumed flogging a
  FS would flog the pool and negatively affect all the other FS on that
  pool?
  
  Best Regards,
  Jason
  

Good point. If the problem is

6413510 zfs: writing to ZFS filesystem slows down fsync() on other files

then segregating into two filesystems on the same pool will
help.

But if the problem is more like

6429205 each zpool needs to monitor its throughput and throttle heavy
writers

then two filesystems won't help; two pools probably would, though.
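As a rough sketch of the difference (pool and device names below are made up,
not the poster's actual LUNs):

   # two filesystems sharing one pool -- only helps if 6413510 is the issue
   zpool create dbpool c1t0d0 c1t1d0
   zfs create dbpool/prod
   zfs create dbpool/educ

   # two separate pools -- also sidesteps the pool-wide throttling of 6429205
   zpool create prodpool c1t0d0
   zpool create educpool c1t1d0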

-r




[zfs-discuss] Re: Heavy writes freezing system

2007-01-18 Thread Rainer Heilke
 Bag-o-tricks-r-us, I suggest the following in such a case:
 
 - Two ZFS pools
   - One for production
 - One for Education

The DBA's are very resistant to splitting up our whole environments. There are 
nine on the test/devl server! So, we're going to put the DB files and redo logs 
on separate (UFS with directio) LUN's. Binaries and backups will go onto two 
separate ZFS LUN's. With production, they can do their cloning at night to 
minimize the impact. Not sure what they'll do on test/devl. The two ZFS file 
systems will probably also be separate zpools (for political reasons, as well as 
for juggling Hitachi disk space).
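
For reference, a minimal sketch of what a UFS-with-directio LUN for the redo
logs might look like (device name and mount point are placeholders, not our
actual layout):

   newfs /dev/rdsk/c1t0d0s0
   mount -F ufs -o forcedirectio /dev/dsk/c1t0d0s0 /oracle/redo
   # or persistently, via a vfstab entry:
   # /dev/dsk/c1t0d0s0  /dev/rdsk/c1t0d0s0  /oracle/redo  ufs  2  yes  forcedirectio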

BTW, it wasn't the storage guys who decided on the "one filesystem to rule them 
all" strategy, but my predecessors. It was part of the move from Clarion arrays 
to Hitachi. The storage folks know about, understand, and agree with us when we 
talk about these kinds of issues (at least, they do now). We've pushed the 
caching and other subsystems often enough to make this painfully clear.

 Another thought is while ZFS works out its kinks why
 not use the BCV or ShadowCopy or whatever IBM calls
 it to create Education instance. This will reduce a
 tremendous amount of I/O.

This means buying more software to alleviate a short-term problem (with RAC, 
the whole design will be different, including moving to ASM). We have RMAN and 
OEM already, so this argument won't fly.

 BTW, I'm curious what application using Oracle is
 creating more than a million files?

Oracle Financials. The application includes everything but the kitchen sink 
(but the bathroom sink is there!).

Thanks for all of your feedback and suggestions. They all sound bang on. If we 
could just get all the pieces in place to move forward now, I think we'll be 
OK. One big issue for us will be finding the Hitachi disk space--we're pretty 
full-up right now. :-(

Rainer
 
 


Re: [zfs-discuss] Re: Heavy writes freezing system

2007-01-17 Thread Robert Milkowski
Hello Anantha,

Wednesday, January 17, 2007, 2:35:01 PM, you wrote:

ANS You're probably hitting the same wall/bug that I came across;
ANS ZFS in all versions up to and including Sol10U3 generates
ANS excessive I/O when it encounters fsync() or if any of the files
ANS were opened with the O_DSYNC option.

ANS I do believe Oracle (or any DB for that matter) opens its files
ANS with the O_DSYNC option. During normal times it does result in
ANS excessive I/O, but it is probably well under your system capacity
ANS (it was in our case.) But when you are doing backups or clones
ANS (Oracle clones by using RMAN or copying of db files?) you are
ANS going to flood the I/O sub-system, and that's when the ZFS
ANS excessive I/O starts to put a hurt on the DB performance.

ANS Here are a few suggestions that can give you interim relief:

ANS - Segregate your I/O at the filesystem level; the bug is at the
ANS filesystem level, not the ZFS pool level. By this I mean ensure the
ANS online redo logs are in a ZFS FS that nobody else uses, and the same
ANS for the control files. As long as the writes to the control files and
ANS online redo logs are met, your system will be happy.
ANS - Ensure that your clone and RMAN (if you're going to disk)
ANS write to a separate ZFS FS that contains no production files.
ANS - If the above two items don't give you relief, then relocate
ANS the online redo logs and control files to a UFS filesystem. No
ANS need to downgrade the entire ZFS setup to something else.
ANS - Consider Oracle ASM (DB version permitting); it works very well. Why
ANS deal with VxFS?

ANS Feel free to drop me a line, I've over 17 years of Oracle DB
ANS experience and love to troubleshoot problems like this. I've
ANS another vested interest; we're considering ZFS for widespread use
ANS in our environment and any experience is good for us.
ANS  

Also, as a workaround, you could disable the ZIL if that's acceptable to you
(in case of a system panic or hard reset you could end up with an
unrecoverable database).
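
For the archives, this is how that tunable was usually set on builds of that
era (a sketch only -- verify against your release, and remember that with the
ZIL off the synchronous write guarantees are gone):

   # /etc/system -- takes effect at the next reboot
   set zfs:zil_disable = 1

   # or on a live kernel; affects filesystems mounted after the change
   echo zil_disable/W0t1 | mdb -kw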


-- 
Best regards,
 Robert    mailto:[EMAIL PROTECTED]
   http://milek.blogspot.com



[zfs-discuss] Re: Heavy writes freezing system

2007-01-17 Thread Rainer Heilke
 What do you mean by UFS wasn't an option due to
 number of files?

Exactly that. UFS has a 1 million file limit under Solaris. Each Oracle 
Financials environment well exceeds this limitation.

 Also do you have any tunables in system?
 Can you send 'zpool status' output? (raidz, mirror,
 ...?)

Our tunables are:

set noexec_user_stack=1
set sd:sd_max_throttle = 32
set sd:sd_io_time = 0x3c

zpool status:

  zpool status
  pool: d
 state: ONLINE
 scrub: none requested
config:

        NAME                             STATE     READ WRITE CKSUM
        d                                ONLINE       0     0     0
          c5t60060E800475AA0075AA100Bd0  ONLINE       0     0     0
          c5t60060E800475AA0075AA100Dd0  ONLINE       0     0     0
          c5t60060E800475AA0075AA100Cd0  ONLINE       0     0     0
          c5t60060E800475AA0075AA100Ed0  ONLINE       0     0     0

errors: No known data errors


 When the DBA's do clones - you mean that by just
 doing 'zfs clone
 ...' you get big performance problem? OR maybe just
 before when you do
 'zfs snapshot' first? How much free space is left in
 a pool?

Nope. The DBA group clones the production instance using OEM in order to build 
copies for Education, development, etc. This is strictly an Oracle function, 
not a file system (ZFS) operation.

 Do you have sar data when problems occured? Any
 paging in a system?

Some. I'll have to have the other analyst try to pull out the times when our 
testing was done, but I've been told nothing stood out. (I love playing 
middle-man. NOT!)

 And one piece of advice - before any more testing I would
 definitely upgrade/reinstall the system to U3 as far as
 ZFS is concerned.

Not an option. This isn't even a faint possibility. We're talking about both our 
test/development servers and our production/education ones. That's six servers to 
upgrade (remember, we have the applications on servers distinct from the 
database servers--the DBA's would never let us diverge the OS releases).

Rainer
 
 


[zfs-discuss] Re: Heavy writes freezing system

2007-01-17 Thread Rainer Heilke
Thanks for the feedback! 

This does sound like what we're hitting. From our testing, you are absolutely 
correct--separating out the parts is a major help. The big problem we still 
see, though, is doing the clones/recoveries. The DBA group clones the 
production environment for Education. Since both of these instances live on the 
same server and zpool/filesystem, this kills the throughput. When we do cloning 
or backups to a different area, whether UFS or ZFS, we don't have these issues.

I'll know for sure later today or tomorrow, but it sounds like they are 
seriously considering the ASM route. Since we will be going to RAC later this 
year, this move makes the most sense. We'll just have to hope that the DBA 
group gets a better understanding of LUN's and our SAN, as they'll be taking 
over part of the disk (LUN) management. :-/ We were hoping we could get some 
interim relief on the ZFS front through tuning or something, but if what 
you're saying is correct (and it sounds like it is), we may be out of luck.

Thanks very much for the feedback.

Rainer
 
 


Re: [zfs-discuss] Re: Heavy writes freezing system

2007-01-17 Thread Richard Elling

Rainer Heilke wrote:
I'll know for sure later today or tomorrow, but it sounds like they are 
seriously considering the ASM route. Since we will be going to RAC later 
this year, this move makes the most sense. We'll just have to hope that 
the DBA group gets a better understanding of LUN's and our SAN, as they'll 
be taking over part of the disk (LUN) management. :-/ We were hoping we 
could get some interim relief on the ZFS front through tuning or something, 
but if what you're saying is correct (and it sounds like it is), we may be 
out of luck.


If you plan on RAC, then ASM makes good sense.  It is unclear (to me anyway)
if ASM over a zvol is better than ASM over a raw LUN.  It would be nice to
have some of the zfs features such as snapshots, without having to go through
extraordinary pain or buy expensive RAID arrays.  If someone has tried ASM
on a zvol, please speak up :-)
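If anyone wants to experiment, a minimal sketch would be something along these
lines (volume name and size are made up):

   # carve a volume out of an existing pool and offer it to ASM as a candidate disk
   zfs create -V 100g tank/asmvol
   # ASM would then be pointed at the raw device node:
   #   /dev/zvol/rdsk/tank/asmvol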
 -- richard


Re: [zfs-discuss] Re: Heavy writes freezing system

2007-01-17 Thread Richard Elling

Rainer Heilke wrote:

What do you mean by UFS wasn't an option due to
number of files?


Exactly that. UFS has a 1 million file limit under Solaris. Each Oracle 
Financials environment well exceeds this limitation.


Really?!?  I thought Oracle would use a database for storage...


Also do you have any tunables in system?
Can you send 'zpool status' output? (raidz, mirror,
...?)


Our tunables are:

set noexec_user_stack=1
set sd:sd_max_throttle = 32
set sd:sd_io_time = 0x3c


EMC?


zpool status:

  zpool status
  pool: d
 state: ONLINE
 scrub: none requested
config:

        NAME                             STATE     READ WRITE CKSUM
        d                                ONLINE       0     0     0
          c5t60060E800475AA0075AA100Bd0  ONLINE       0     0     0
          c5t60060E800475AA0075AA100Dd0  ONLINE       0     0     0
          c5t60060E800475AA0075AA100Cd0  ONLINE       0     0     0
          c5t60060E800475AA0075AA100Ed0  ONLINE       0     0     0

errors: No known data errors



When the DBA's do clones - you mean that by just
doing 'zfs clone
...' you get big performance problem? OR maybe just
before when you do
'zfs snapshot' first? How much free space is left in
a pool?


Nope. The DBA group clones the production instance using OEM in order to build 
copies for Education, development, etc. This is strictly an Oracle function, 
not a file system (ZFS) operation.


Do you have sar data when problems occured? Any
paging in a system?


Some. I'll have to have the other analyst try to pull out the times when our 
testing was done, but I've been told nothing stood out. (I love playing 
middle-man. NOT!)


And one piece of advice - before any more testing I would
definitely upgrade/reinstall the system to U3 as far as
ZFS is concerned.


Not an option. This isn't even a faint possibility. We're talking about both our 
test/development servers and our production/education ones. That's six servers to 
upgrade (remember, we have the applications on servers distinct from the 
database servers--the DBA's would never let us diverge the OS releases).


Yes, this is common, so you should look for the patches that should
fix at least the fsync problem.  Check the archives here for patch
update info from George Wilson.
 -- richard


Re: [zfs-discuss] Re: Heavy writes freezing system

2007-01-17 Thread Dennis Clarke

 What do you mean by UFS wasn't an option due to
 number of files?

 Exactly that. UFS has a 1 million file limit under Solaris. Each Oracle
 Financials environment well exceeds this limitation.


what ?

$ uname -a
SunOS core 5.10 Generic_118833-17 sun4u sparc SUNW,UltraSPARC-IIi-cEngine
$ df -F ufs -t
/  (/dev/md/dsk/d0):  5367776 blocks   616328 files
  total: 13145340 blocks   792064 files
/export/nfs(/dev/md/dsk/d8): 83981368 blocks 96621651 files
  total: 404209452 blocks 100534720 files
/export/home   (/dev/md/dsk/d7):   980894 blocks   260691 files
  total:   986496 blocks   260736 files
$

I think that I am 95,621,651 files over your 1 million limit right there!

Should I place a support call and file a bug report ?

Dennis



Re: [zfs-discuss] Re: Heavy writes freezing system

2007-01-17 Thread Michael Schuster

Dennis Clarke wrote:

What do you mean by UFS wasn't an option due to
number of files?

Exactly that. UFS has a 1 million file limit under Solaris. Each Oracle
Financials environment well exceeds this limitation.



what ?

$ uname -a
SunOS core 5.10 Generic_118833-17 sun4u sparc SUNW,UltraSPARC-IIi-cEngine
$ df -F ufs -t
/  (/dev/md/dsk/d0):  5367776 blocks   616328 files
  total: 13145340 blocks   792064 files
/export/nfs(/dev/md/dsk/d8): 83981368 blocks 96621651 files
  total: 404209452 blocks 100534720 files
/export/home   (/dev/md/dsk/d7):   980894 blocks   260691 files
  total:   986496 blocks   260736 files
$

I think that I am 95,621,651 files over your 1 million limit right there!


is that a multi-terabyte UFS? If no, ignore :-); if yes, the actual limit is 1 million 
inodes PER terabyte.
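
For anyone checking their own headroom, the used/free inode counts are visible
per filesystem, e.g. (the mount point here is just an example):

   df -F ufs -o i /export/nfs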


HTH
--
Michael Schuster
Sun Microsystems, Inc.


Re: [zfs-discuss] Re: Heavy writes freezing system

2007-01-17 Thread Jason J. W. Williams

Hi Anantha,

I was curious why segregating at the FS level would provide adequate
I/O isolation? Since all FS are on the same pool, I assumed flogging a
FS would flog the pool and negatively affect all the other FS on that
pool?

Best Regards,
Jason

On 1/17/07, Anantha N. Srirama [EMAIL PROTECTED] wrote:

You're probably hitting the same wall/bug that I came across; ZFS in all 
versions up to and including Sol10U3 generates excessive I/O when it encounters 
fsync() or if any of the files were opened with the O_DSYNC option.

I do believe Oracle (or any DB for that matter) opens its files with the O_DSYNC 
option. During normal times it does result in excessive I/O, but it is probably 
well under your system capacity (it was in our case.) But when you are doing 
backups or clones (Oracle clones by using RMAN or copying of db files?) you are 
going to flood the I/O sub-system, and that's when the ZFS excessive I/O 
starts to put a hurt on the DB performance.

Here are a few suggestions that can give you interim relief:

- Segregate your I/O at the filesystem level; the bug is at the filesystem level, 
not the ZFS pool level. By this I mean ensure the online redo logs are in a ZFS FS 
that nobody else uses, and the same for the control files. As long as the writes to 
the control files and online redo logs are met, your system will be happy.
- Ensure that your clone and RMAN (if you're going to disk) write to a separate 
ZFS FS that contains no production files.
- If the above two items don't give you relief, then relocate the online redo 
logs and control files to a UFS filesystem. No need to downgrade the entire ZFS 
setup to something else.
- Consider Oracle ASM (DB version permitting); it works very well. Why deal with 
VxFS?

Feel free to drop me a line, I've over 17 years of Oracle DB experience and 
love to troubleshoot problems like this. I've another vested interest; we're 
considering ZFS for widespread use in our environment and any experience is 
good for us.






[zfs-discuss] Re: Heavy writes freezing system

2007-01-17 Thread Anantha N. Srirama
Bag-o-tricks-r-us, I suggest the following in such a case:

- Two ZFS pools
  - One for production
  - One for Education
  - Isolate the LUNs feeding the pools if possible; don't share spindles. 
Remember, on EMC/Hitachi you have logical LUNs created by striping/concat'ng 
carved-up physical disks, so you could have two LUNs that share the same 
spindle. Don't believe one word from your storage admin about "we've got lots of 
cache to abstract the physical structure"; Oracle can push any storage 
sub-system over the edge. Almost all of the storage vendors prevent one LUN 
from flooding the cache with writes; EMC, for example, gives no more than 8x the 
initial allocation of cache (total cache/total disk space), and after that it'll 
stall your writes until destage is complete.

- At least two ZFS filesystems under the Production pool
  - One for online redo logs and control files. If need be you can further 
segregate them onto two separate ZFS filesystems.
  - One for db files. If need be you can isolate further by data, index, temp, 
archived redo, ...
  - Don't host the 'temp' on ZFS; just feed it plain old UFS or raw disk.
  - Match up your ZFS recordsize with your DB blocksize * multi-block read 
count. Don't do this for the index filesystem, just the filesystem hosting data 
(see the sketch below).
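
A rough sketch of that layout (pool names, device names, and the 128K figure
are purely illustrative; derive the recordsize from your own db_block_size *
db_file_multiblock_read_count):

   zpool create prod c1t0d0 c1t1d0   # production LUNs (placeholders)
   zpool create educ c1t2d0 c1t3d0   # education LUNs (placeholders)
   zfs create prod/redo              # online redo logs + control files only
   zfs create prod/data              # datafiles
   zfs create prod/index             # index datafiles, left at the default recordsize
   # e.g. 8K db_block_size * 16 multi-block read count = 128K
   zfs set recordsize=128k prod/data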

Rinse and repeat for your Education ZFS pool. This will give you substantial 
isolation and improvement, sufficient enough to buy you time to plan out a 
better deployment strategy given that you're under the gun now.

Another thought: while ZFS works out its kinks, why not use BCV or 
ShadowCopy (or whatever IBM calls it) to create the Education instance. This will 
reduce a tremendous amount of I/O.

Just this past weekend I re-did our SAS server to relocate *just* the SAS 
work area to good ol' UFS, and the payback is tremendous; not one complaint 
about performance 3 days in a row (we used to hear daily complaints.) By taking 
care of your online redo logs and control files (maybe skipping ZFS for them 
altogether and running them on UFS) you'll breathe easier.

BTW, I'm curious what application using Oracle is creating more than a million 
files?
 
 


Re[2]: [zfs-discuss] Re: Heavy writes freezing system

2007-01-17 Thread Robert Milkowski
Hello Jason,

Wednesday, January 17, 2007, 11:24:50 PM, you wrote:

JJWW Hi Anantha,

JJWW I was curious why segregating at the FS level would provide adequate
JJWW I/O isolation? Since all FS are on the same pool, I assumed flogging a
JJWW FS would flog the pool and negatively affect all the other FS on that
JJWW pool?

because of the bug, which forces all outstanding writes in a file
system to be committed to storage whenever there is a single fsync() on one file.
When you separate the data into different file systems, the bug will
affect only the data in that file system, which can greatly reduce the impact
on performance if it's done right.
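
An easy way to see whether this is what's happening (pool name is whatever
yours is) is to watch pool-wide throughput while the heavy writer runs and see
if it saturates at the same moment the latency-sensitive filesystem stalls:

   # 1-second samples of per-vdev throughput
   zpool iostat -v <poolname> 1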

-- 
Best regards,
 Robert    mailto:[EMAIL PROTECTED]
   http://milek.blogspot.com



Re: Re[2]: [zfs-discuss] Re: Heavy writes freezing system

2007-01-17 Thread Jason J. W. Williams

Hi Robert,

I see. So it really doesn't get around the idea of putting DB files
and logs on separate spindles?

Best Regards,
Jason

On 1/17/07, Robert Milkowski [EMAIL PROTECTED] wrote:

Hello Jason,

Wednesday, January 17, 2007, 11:24:50 PM, you wrote:

JJWW Hi Anantha,

JJWW I was curious why segregating at the FS level would provide adequate
JJWW I/O isolation? Since all FS are on the same pool, I assumed flogging a
JJWW FS would flog the pool and negatively affect all the other FS on that
JJWW pool?

because of the bug, which forces all outstanding writes in a file
system to be committed to storage whenever there is a single fsync() on one file.
When you separate the data into different file systems, the bug will
affect only the data in that file system, which can greatly reduce the impact
on performance if it's done right.

--
Best regards,
 Robert    mailto:[EMAIL PROTECTED]
   http://milek.blogspot.com





[zfs-discuss] Re: Heavy writes freezing system

2007-01-16 Thread Rainer Heilke
 What hardware is used?  Sparc? x86 32-bit? x86
 64-bit?
 How much RAM is installed?
 Which version of the OS? 

Sorry, this is happening on two systems (test and production). They're both 
Solaris 10, Update 2. Test is a V880 with 8 CPU's and 32GB, production is an 
E2900 with 12 dual-core CPU's and 48GB.

 Did you already try to monitor kernel memory usage
 while writing to ZFS?  Maybe the kernel is running out of
 free memory?  (I have bugs like 6483887 in mind:
 without direct management, arc ghost lists can run
 amok)

We haven't seen serious kernel memory usage that I know of (I'll be honest--I 
came into this problem late).

 For a live system:
 
 echo ::kmastat | mdb -k
 echo ::memstat | mdb -k

I can try this if the DBA group is willing to do another test, thanks.

 In case you've got a crash dump for the hung system,
 you
 can try the same ::kmastat and ::memstat commands
 using the 
 kernel crash dumps saved in directory
 /var/crash/`hostname`
 
 # cd /var/crash/`hostname`
 # mdb -k unix.1 vmcore.1
 ::memstat
 ::kmastat

The system doesn't actually crash. It also doesn't freeze _completely_. While I 
call it a freeze (best name for it), it actually just slows down incredibly. 
It's like the whole system bogs down like molasses in January. Things happen, 
but very slowly.
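
If it ever gets bad enough, we could presumably still feed the ::memstat and
::kmastat dcmds above with a live dump rather than waiting for a panic;
assuming dumpadm is already configured, something like:

   # snapshot the running kernel into /var/crash/`hostname`
   savecore -L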

Rainer
 
 