[zfs-discuss] zfs performance issue

2010-05-10 Thread Abhishek Gupta

Hi,

I just installed OpenSolaris on my Dell Optiplex 755 and created a raidz2 
pool from a few slices on a single disk. I was expecting good read/write 
performance, but I'm only getting 12-15 MB/s.

How can I enhance the read/write performance of my raid?
Thanks,
Abhi.
___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


Re: [zfs-discuss] zfs performance issue

2010-05-10 Thread Erik Trimble

Abhishek Gupta wrote:

Hi,

I just installed OpenSolaris on my Dell Optiplex 755 and created a 
raidz2 pool from a few slices on a single disk. I was expecting good 
read/write performance, but I'm only getting 12-15 MB/s.

How can I enhance the read/write performance of my raid?
Thanks,
Abhi.


You absolutely DON'T want to do what you've done.  Creating a ZFS pool 
(or, for that matter, any RAID device, whether hardware or software) out 
of slices/partitions of a single disk is a recipe for horrible performance.


In essence, you reduce your performance to 1/N (or worse) of the whole 
disk, where N is the number of slices you created.


So, create your zpool using disks or partitions from different disks.  
It's OK to have more than one partition on a disk - just use them in 
different pools for reasonable performance.
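
For example (just a sketch; 'mypool', c0t1d0 and c0t2d0 are placeholders for 
whatever pool name and spare whole disks you actually have), a simple mirror 
across two physical disks will behave far better than a raidz2 built from 
slices of one disk.  Note that destroying the old pool erases its data:

# zpool destroy mypool
# zpool create mypool mirror c0t1d0 c0t2d0
# zpool status mypool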


--
Erik Trimble
Java System Support
Mailstop:  usca22-123
Phone:  x17195
Santa Clara, CA
Timezone: US/Pacific (GMT-0800)

___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


Re: [zfs-discuss] ZFS Performance Issue

2008-02-13 Thread William Fretts-Saxton
After working with Sanjeev, and putting a bunch of timing statements 
throughout the code, it turns out that file writes ARE NOT the bottleneck, as 
one would assume.

It is actually reading the file into a byte buffer that is the culprit.  
Specifically, this java command:

byteBuffer = file.getChannel().map(mapMode, 0, length);

I'm going to try to apply some of the same things I tried while 
troubleshooting the writes to the reads now.  If anyone has any different 
advice, please let me know.

Thanks for all the help so far.
 
 
This message posted from opensolaris.org
___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


Re: [zfs-discuss] ZFS Performance Issue

2008-02-11 Thread William Fretts-Saxton
It does.  The file size is limited to the original creation size, which is 65k 
for files with 1 data sample.

Unfortunately, I have zero experience with dtrace and only a little with truss. 
 I'm relying on the dtrace scripts from people on this thread to get by for now!
 
 
This message posted from opensolaris.org
___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


Re: [zfs-discuss] ZFS Performance Issue

2008-02-11 Thread johansen
 Is deleting the old files/directories in the ZFS file system
 sufficient or do I need to destroy/recreate the pool and/or file
 system itself?  I've been doing the former.

The former should be sufficient; it's not necessary to destroy the pool.

-j

___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


Re: [zfs-discuss] ZFS Performance Issue

2008-02-11 Thread William Fretts-Saxton
I ran this dtrace script and got no output.  Any ideas?
 
 
This message posted from opensolaris.org
___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


Re: [zfs-discuss] ZFS Performance Issue

2008-02-10 Thread Robert Milkowski
Hello William,

Thursday, February 7, 2008, 7:46:51 PM, you wrote:

WFS -Setting zfs_nocacheflush, though got me drastically increased
WFS throughput--client requests took, on average, less than 2 seconds each!

That's interesting - a bug in the SCSI driver for the v40z?


-- 
Best regards,
 Robert                          mailto:[EMAIL PROTECTED]
   http://milek.blogspot.com

___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


Re: [zfs-discuss] ZFS Performance Issue

2008-02-10 Thread Johan Hartzenberg
On Feb 5, 2008 9:52 PM, William Fretts-Saxton [EMAIL PROTECTED]
wrote:

 This may not be a ZFS issue, so please bear with me!

 I have 4 internal drives that I have striped/mirrored with ZFS and have an
 application server which is reading/writing to hundreds of thousands of
 files on it, thousands of files @ a time.

 If 1 client uses the app server, the transaction (reading/writing to ~80
 files) takes about 200 ms.  If I have about 80 clients attempting it @ once,
 it can sometimes take a minute or more.  I'm pretty sure it's a file I/O
 bottleneck so I want to make sure ZFS is tuned properly for this kind of
 usage.

 The only thing I could think of, so far, is to turn off ZFS compression.
  Is there anything else I can do?  Here is my zpool iostat output:


Hi William

To improve performance, consider turning off atime, assuming you don't need
it...

# zfs set atime=off POOL/filesystem
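
You can check the current value and verify the change afterwards 
(POOL/filesystem being the same placeholder as above):

# zfs get atime POOL/filesystem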

  _J
___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


Re: [zfs-discuss] ZFS Performance Issue

2008-02-09 Thread Henk Langeveld
William Fretts-Saxton wrote:
 Unfortunately, I don't know the record size of the writes.  Is it as
 simple as looking @ the size of a file, before and after a client
 request, and noting the difference in size?

and

 The I/O is actually done by RRD4J, [...] a Java version of 'rrdtool'

If it behaves like rrdtool, it will limit the size of the file, by
consolidating older data.  After every n samples, older data will be
replaced by an aggregate, freeing space for new samples.  To me that
implies random I/O.  You really need a tool like dtrace (or old
fashioned truss) to see the sample rate and size.

Cheers,
Henk
___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


Re: [zfs-discuss] ZFS Performance Issue

2008-02-08 Thread William Fretts-Saxton
We are going to get a 6120 for this temporarily.  If all goes well, we are 
going to move to a 6140 SAN solution.
 
 
This message posted from opensolaris.org
___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


Re: [zfs-discuss] ZFS Performance Issue

2008-02-08 Thread William Fretts-Saxton
Hi Daniel.  I take it you are an RRD4J user?

I didn't see anything in the performance issues area that would help.  Please 
let me know if I'm missing something:

- The default of RRD4J is to use NIO backend, so that is already in place.

- Pooling won't help because there is almost never a time when an RRD file will 
be accessed simultaneously.

- I'm using trial and error when it comes to the recsize right now, so I'll 
post back with my results.  Right now, it looks like a higher recsize gives 
better performance (16k better than 8k, etc.), which is strange, but I'm not done yet.
 
 
This message posted from opensolaris.org
___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


Re: [zfs-discuss] ZFS Performance Issue

2008-02-08 Thread William Fretts-Saxton
 The other thing to keep in mind is that the tunables
 like compression
 and recsize only affect newly written blocks.  If you
 have a bunch of
 data that was already laid down on disk and then you
 change the tunable,
 this will only cause new blocks to have the new size.
  If you experiment
 with this, make sure all of your data has the same
 blocksize by copying
 it over to the new pool once you've changed the
 properties.

Is deleting the old files/directories in the ZFS file system sufficient or do I 
need to destroy/recreate the pool and/or file system itself?  I've been doing 
the former.

I will use your dtrace script today and get back to you.  Thanks for that.
 
 
This message posted from opensolaris.org
___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


Re: [zfs-discuss] ZFS Performance Issue

2008-02-07 Thread William Fretts-Saxton
I just installed nv82 so we'll see how that goes.  I'm going to try the 
recordsize idea above as well.

A note about UFS:  I was told by our local admin guru that ZFS turns on 
write caching for its disks, which is something a UFS file system should not 
have turned on.  So if I convert the ZFS file system to a UFS one, I could be 
giving UFS an unrealistic performance boost because the write cache would 
still be on.
 
 
This message posted from opensolaris.org
___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


Re: [zfs-discuss] ZFS Performance Issue

2008-02-07 Thread William Fretts-Saxton
Unfortunately, I don't know the record size of the writes.  Is it as simple as 
looking @ the size of a file, before and after a client request, and noting the 
difference in size?  This is binary data, so I don't know if that makes a 
difference, but the average write size is a lot smaller than the file size.  

Should the recordsize be in place BEFORE data is written to the file system, or 
can it be changed after the fact?  I might try a bunch of different settings 
for trial and error.

The I/O is actually done by RRD4J, which is a round-robin database library.  It 
is a Java version of 'rrdtool' which saves data into a binary format, but also 
cleans up the data according to its age, saving less of the older data as 
time goes on.
 
 
This message posted from opensolaris.org
___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


Re: [zfs-discuss] ZFS Performance Issue

2008-02-07 Thread William Fretts-Saxton
To avoid making multiple posts, I'll just write everything here:

-Moving to nv_82 did not seem to do anything, so it doesn't look like fsync was 
the issue.
-Disabling the ZIL didn't do anything either.
-Still playing with 'recsize' values, but it doesn't seem to be doing much...I 
don't think I have a good understanding of what exactly is being written...I think 
the whole file might be overwritten each time because it's in binary format.
-Setting zfs_nocacheflush, though, got me drastically increased 
throughput--client requests took, on average, less than 2 seconds each!

So, in order to use this, I should have a storage array, w/battery backup, 
instead of using the internal drives, correct?  I have the option of using a 
6120 or 6140 array on this system so I might just try that out.
 
 
This message posted from opensolaris.org
___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


Re: [zfs-discuss] ZFS Performance Issue

2008-02-07 Thread William Fretts-Saxton
Slight correction.  'recsize' must be a power of 2 so it would be 8192.
 
 
This message posted from opensolaris.org
___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


Re: [zfs-discuss] ZFS Performance Issue

2008-02-07 Thread William Fretts-Saxton
One thing I just observed is that the initial file size is 65796 bytes.  When 
it gets an update, the file size remains @ 65796.

Is there a minimum file size?
 
 
This message posted from opensolaris.org
___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


Re: [zfs-discuss] ZFS Performance Issue

2008-02-07 Thread William Fretts-Saxton
RRD4J isn't a DB, per se, so it doesn't really have a record size.  In fact, 
I don't even know whether the data written to the binary file is contiguous or 
not, so the amount written may not directly correlate to a proper record size.

I did run your command and found the size patterns you were talking about:

     462  java        409
    3320  java        409
    6819  java        409
       5  java       1227
       1  java       1692
      16  java       3243

409 is the number of clients I tested, so I assume it means the largest write 
it makes is 6819.  Is that bits or bytes?

Does that mean I should try setting my recordsize equal to the lowest multiple 
of 512 GREATER than 6819? (14 x 512 = 7168)
 
 
This message posted from opensolaris.org
___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


Re: [zfs-discuss] ZFS Performance Issue

2008-02-07 Thread Sanjeev Bagewadi
William,

It should be fairly easy to find the record size using DTrace.  Take an 
aggregation of the writes happening (aggregate on size for all the write(2) 
system calls).

This would give a fair idea of the I/O size pattern.

Does RRD4J have a record size option?  Usually, if it is a 
database application, there is a record-size option when the DB is created 
(based on my limited knowledge of DBs).

Thanks and regards,
Sanjeev.

PS : Here is a simple script which just aggregates on the write size and 
executable name :
-- snip --
#!/usr/sbin/dtrace -s


syscall::write:entry
{
        /* clause-local variable, so concurrent probes don't clobber it */
        this->wsize = (size_t) arg2;
        @write[this->wsize, execname] = count();
}
-- snip --

William Fretts-Saxton wrote:
 Unfortunately, I don't know the record size of the writes.  Is it as simple 
 as looking @ the size of a file, before and after a client request, and 
 noting the difference in size?  This is binary data, so I don't know if that 
 makes a difference, but the average write size is a lot smaller than the file 
 size.  

 Should the recordsize be in place BEFORE data is written to the file system, 
 or can it be changed after the fact?  I might try a bunch of different 
 settings for trial and error.

 The I/O is actually done by RRD4J, which is a round-robin database library.  
 It is a Java version of 'rrdtool' which saves data into a binary format, but 
 also cleans up the data according to its age, saving less of the older data 
 as time goes on.
  
  
 This message posted from opensolaris.org
 ___
 zfs-discuss mailing list
 zfs-discuss@opensolaris.org
 http://mail.opensolaris.org/mailman/listinfo/zfs-discuss
   


-- 
Solaris Revenue Products Engineering,
India Engineering Center,
Sun Microsystems India Pvt Ltd.
Tel:x27521 +91 80 669 27521 

___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


Re: [zfs-discuss] ZFS Performance Issue

2008-02-07 Thread johansen
 -Still playing with 'recsize' values but it doesn't seem to be doing
 much...I don't think I have a good understand of what exactly is being
 written...I think the whole file might be overwritten each time
 because it's in binary format.

The other thing to keep in mind is that the tunables like compression
and recsize only affect newly written blocks.  If you have a bunch of
data that was already laid down on disk and then you change the tunable,
this will only cause new blocks to have the new size.  If you experiment
with this, make sure all of your data has the same blocksize by copying
it over to the new pool once you've changed the properties.
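
For example (a sketch only; pool1/data, pool1/data2 and the 8K value are 
placeholders -- pick whatever size your write-size measurements suggest), you 
could create a fresh dataset with the new recordsize and copy the files over 
so every block gets rewritten at that size:

# zfs create pool1/data2
# zfs set recordsize=8K pool1/data2
# cp -rp /pool1/data/. /pool1/data2/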

 -Setting zfs_nocacheflush, though got me drastically increased
 throughput--client requests took, on average, less than 2 seconds
 each!
 
 So, in order to use this, I should have a storage array, w/battery
 backup, instead of using the internal drives, correct?

zfs_nocacheflush should only be used on arrays with a battery backed
cache.  If you use this option on a disk, and you lose power, there's no
guarantee that your write successfully made it out of the cache.

A performance problem when flushing the cache of an individual disk
implies that there's something wrong with the disk or its firmware.  You
can disable the write cache of an individual disk using format(1M).  When you
do this, ZFS won't lose any data, whereas enabling zfs_nocacheflush can
lead to problems.
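
(For reference, and only on a pool whose devices all sit behind battery-backed 
cache: as far as I know the usual way to set this tunable is an /etc/system 
entry followed by a reboot:

set zfs:zfs_nocacheflush = 1

Remove the entry and reboot again to return to the safe default.)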

I'm attaching a DTrace script that will show the cache-flush times
per-vdev.  Remove the zfs_nocacheflush tuneable and re-run your test
while using this DTrace script.  If one particular disk takes longer
than the rest to flush, this should show us.  In that case, we can
disable the write cache on that particular disk.  Otherwise, we'll need
to disable the write cache on all of the disks.

The script is attached as zfs_flushtime.d

Use format(1M) with the -e option to adjust the write_cache settings for
SCSI disks.
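
From memory, the interactive sequence looks roughly like this (treat it as a 
sketch and double-check the menus on your system):

# format -e
  (select the affected disk)
  format> cache
  cache> write_cache
  write_cache> display
  write_cache> disable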

-j
#!/usr/sbin/dtrace -Cs
/*
 * CDDL HEADER START
 *
 * The contents of this file are subject to the terms of the
 * Common Development and Distribution License (the "License").
 * You may not use this file except in compliance with the License.
 *
 * You can obtain a copy of the license at usr/src/OPENSOLARIS.LICENSE
 * or http://www.opensolaris.org/os/licensing.
 * See the License for the specific language governing permissions
 * and limitations under the License.
 *
 * When distributing Covered Code, include this CDDL HEADER in each
 * file and include the License file at usr/src/OPENSOLARIS.LICENSE.
 * If applicable, add the following below this CDDL HEADER, with the
 * fields enclosed by brackets [] replaced with your own identifying
 * information: Portions Copyright [] [name of copyright owner]
 *
 * CDDL HEADER END
 */

/*
 * Copyright 2008 Sun Microsystems, Inc.  All rights reserved.
 * Use is subject to license terms.
 */

#define DKIOC                   (0x04 << 8)
#define DKIOCFLUSHWRITECACHE    (DKIOC|34)

fbt:zfs:vdev_disk_io_start:entry
/(args[0]->io_cmd == DKIOCFLUSHWRITECACHE) && (self->traced == 0)/
{
        self->traced = args[0];
        self->start = timestamp;
}

fbt:zfs:vdev_disk_ioctl_done:entry
/args[0] == self->traced/
{
        @a[stringof(self->traced->io_vd->vdev_path)] =
            quantize(timestamp - self->start);
        self->start = 0;
        self->traced = 0;
}

___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


Re: [zfs-discuss] ZFS Performance Issue

2008-02-07 Thread Vincent Fox
 -Setting zfs_nocacheflush, though got me drastically
 increased throughput--client requests took, on
 average, less than 2 seconds each!
 
 So, in order to use this, I should have a storage
 array, w/battery backup, instead of using the
 internal drives, correct?  I have the option of using
 a 6120 or 6140 array on this system so I might just
 try that out.

We use 3510 and 2540 arrays for Cyrus mail-stores which hold about 10K accounts 
each.  Recommend going with dual controllers, though, for safety.  Our setups are 
really simple:  put 2 array units on the SAN, make a pair of RAID-5 LUNs, then 
RAID-10 these LUNs together in ZFS.
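
In ZFS terms that layout is just a mirror of the two hardware LUNs (or a 
stripe of such mirrors with more LUNs), roughly like the sketch below, where 
mailpool and the cXtYd0 names are placeholders for your own pool and whatever 
device names the array LUNs show up as:

# zpool create mailpool mirror c4t0d0 c5t0d0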
 
 
This message posted from opensolaris.org
___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


Re: [zfs-discuss] ZFS Performance Issue

2008-02-07 Thread Daniel Cheng
William Fretts-Saxton wrote:
 Unfortunately, I don't know the record size of the writes.  Is it as simple 
 as looking @ the size of a file, before and after a client request, and 
 noting the difference in size?  This is binary data, so I don't know if that 
 makes a difference, but the average write size is a lot smaller than the file 
 size.  
 
 Should the recordsize be in place BEFORE data is written to the file system, 
 or can it be changed after the fact?  I might try a bunch of different 
 settings for trial and error.
 
 The I/O is actually done by RRD4J, which is a round-robin database library.  
 It is a Java version of 'rrdtool' which saves data into a binary format, but 
 also cleans up the data according to its age, saving less of the older data 
 as time goes on.
  

You should tune that at the application level; see
https://rrd4j.dev.java.net/ down in the performance issues section.

Try the NIO backend and use a smaller (2048?) record size...

-- 
This space was intended to be left blank.

___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


Re: [zfs-discuss] ZFS Performance Issue

2008-02-06 Thread William Fretts-Saxton
I disabled file prefetch and there was no effect.

Here are some performance numbers.  Note that, when the application server used 
a ZFS file system to save its data, the transaction took TWICE as long.  For 
some reason, though, iostat is showing 5x as much disk writing (to the physical 
disks) on the ZFS partition.  Can anyone see a problem here?

-
Average application server client response time (1st run/2nd run):

SVM - 12/18 seconds
ZFS - 35/38 seconds

SVM Performance
---
# iostat -xnz 5
extended device statistics
    r/s    w/s   kr/s   kw/s wait actv wsvc_t asvc_t  %w  %b device
  195.1  414.3 1465.9 1657.3  0.0  1.7    0.0    2.7   0  98 md/d100
   97.5  414.3  730.2 1657.3  0.0  1.0    0.0    1.9   0  74 md/d101
   97.7  414.1  735.8 1656.5  0.0  0.8    0.0    1.5   0  59 md/d102
   54.4  203.6  370.7  814.2  0.0  0.5    0.0    2.1   0  42 c0t2d0
   52.8  210.6  359.5  842.2  0.0  0.5    0.0    1.9   0  40 c0t3d0
   54.0  203.6  374.7  814.2  0.0  0.3    0.0    1.2   0  26 c0t4d0
   52.2  210.6  361.1  842.2  0.0  0.5    0.0    1.8   0  38 c0t5d0

ZFS Performance
---
# iostat -xnz 5
extended device statistics
    r/s    w/s   kr/s   kw/s wait actv wsvc_t asvc_t  %w  %b device
   23.2  148.8 1496.7 3806.8  0.0  2.5    0.0   14.7   0  21 c0t2d0
   22.8  148.8 1470.9 3806.8  0.0  2.4    0.0   13.9   0  22 c0t3d0
   24.2  149.0 1561.1 3805.0  0.0  1.5    0.0    8.6   0  18 c0t4d0
   23.4  149.4 1509.6 3805.0  0.0  2.5    0.0   14.7   0  25 c0t5d0

# zpool iostat 5
               capacity     operations    bandwidth
pool         used  avail   read  write   read  write
----------  -----  -----  -----  -----  -----  -----
pool1       5.69G   266G     12    243   775K  7.20M
pool1       5.69G   266G     88    232  5.53M  7.12M
pool1       5.69G   266G     78    216  4.87M  6.81M
 
 
This message posted from opensolaris.org
___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


Re: [zfs-discuss] ZFS Performance Issue

2008-02-06 Thread Will Murnane
On Feb 6, 2008 6:36 PM, William Fretts-Saxton
[EMAIL PROTECTED] wrote:
 Here are some performance numbers.  Note that, when the
 application server used a ZFS file system to save its data, the
 transaction took TWICE as long.  For some reason, though, iostat is
 showing 5x as much disk writing (to the physical disks) on the ZFS
 partition.  Can anyone see a problem here?
What is the disk layout of the zpool in question?  Striped?  Mirrored?
 Raidz?  I would suggest either a simple stripe or striping+mirroring
as the best-performing layout.
___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


Re: [zfs-discuss] ZFS Performance Issue

2008-02-06 Thread William Fretts-Saxton
It is a striped/mirror:

 # zpool status
        NAME        STATE     READ WRITE CKSUM
        pool1       ONLINE       0     0     0
          mirror    ONLINE       0     0     0
            c0t2d0  ONLINE       0     0     0
            c0t3d0  ONLINE       0     0     0
          mirror    ONLINE       0     0     0
            c0t4d0  ONLINE       0     0     0
            c0t5d0  ONLINE       0     0     0
 
 
This message posted from opensolaris.org
___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


Re: [zfs-discuss] ZFS Performance Issue

2008-02-06 Thread Vincent Fox
Solaris 10u4 eh?

Sounds a lot like fsync issues we ran into, trying to run Cyrus mail-server 
spools in ZFS.

This was highlighted for us by the filebench software varmail test.

OpenSolaris nv78 however worked very well.
 
 
This message posted from opensolaris.org
___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


Re: [zfs-discuss] ZFS Performance Issue

2008-02-06 Thread Marc Bevand
William Fretts-Saxton william.fretts.saxton at sun.com writes:
 
 I disabled file prefetch and there was no effect.
 
 Here are some performance numbers.  Note that, when the application server
 used a ZFS file system to save its data, the transaction took TWICE as long.
 For some reason, though, iostat is showing 5x as much disk
 writing (to the physical disks) on the ZFS partition.  Can anyone see a
 problem here?

Possible explanation: the Glassfish applications are using synchronous
writes, causing the ZIL (ZFS Intent Log) to be intensively used, which
leads to a lot of extra I/O. Try to disable it:

http://www.solarisinternals.com/wiki/index.php/ZFS_Evil_Tuning_Guide#Disabling_the_ZIL_.28Don.27t.29

Since disabling it is not recommended, if you find out it is the cause of your
perf problems, you should instead try to use a SLOG (separate intent log, see
above link).  Unfortunately your OS version (Solaris 10 8/07) doesn't support
SLOGs; they were only added in OpenSolaris build snv_68:

http://blogs.sun.com/perrin/entry/slog_blog_or_blogging_on
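
For completeness, the two knobs look roughly like this (a sketch only; pool1 
and c0t6d0 are placeholders, and as noted above, disabling the ZIL is not 
recommended).  On the older releases the ZIL was disabled with an /etc/system 
entry plus a reboot:

set zfs:zil_disable = 1

On builds that support separate intent logs you would instead add a dedicated 
(ideally NVRAM or battery-backed) log device to the pool:

# zpool add pool1 log c0t6d0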

-marc

___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


Re: [zfs-discuss] ZFS Performance Issue

2008-02-06 Thread Neil Perrin
Marc Bevand wrote:
 William Fretts-Saxton william.fretts.saxton at sun.com writes:
   
 I disabled file prefetch and there was no effect.

 Here are some performance numbers.  Note that, when the application server
 used a ZFS file system to save its data, the transaction took TWICE as long.
 For some reason, though, iostat is showing 5x as much disk
 writing (to the physical disks) on the ZFS partition.  Can anyone see a
 problem here?
 

 Possible explanation: the Glassfish applications are using synchronous
 writes, causing the ZIL (ZFS Intent Log) to be intensively used, which
 leads to a lot of extra I/O.

The ZIL doesn't do a lot of extra IO.  It usually just does one write per 
synchronous request and will batch up multiple writes into the same log block 
if possible.  However, it does need to wait for the writes to be on stable 
storage before returning to the application, which is what the application 
has requested.  It does this by waiting for the write to complete and then 
flushing the disk write cache.  If the write cache is battery backed for all 
zpool devices, then the global zfs_nocacheflush can be set to give 
dramatically better performance.
  Try to disable it:

 http://www.solarisinternals.com/wiki/index.php/ZFS_Evil_Tuning_Guide#Disabling_the_ZIL_.28Don.27t.29

 Since disabling it is not recommended, if you find out it is the cause of your
 perf problems, you should instead try to use a SLOG (separate intent log, see
 above link). Unfortunately your OS version (Solaris 10 8/07) doesn't support
 SLOGs, they have only been added to OpenSolaris build snv_68:

 http://blogs.sun.com/perrin/entry/slog_blog_or_blogging_on

 -marc

 ___
 zfs-discuss mailing list
 zfs-discuss@opensolaris.org
 http://mail.opensolaris.org/mailman/listinfo/zfs-discuss
   

___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


Re: [zfs-discuss] ZFS Performance Issue

2008-02-06 Thread Marion Hakanson
[EMAIL PROTECTED] said:
 Here are some performance numbers.  Note that, when the application server
 used a ZFS file system to save its data, the transaction took TWICE as long.
 For some reason, though, iostat is showing 5x as much disk writing (to the
 physical disks) on the ZFS partition.  Can anyone see a problem here? 

I'm not familiar with the application in use here, but your iostat numbers
remind me of something I saw during small overwrite tests on ZFS.  Even
though the test was doing only writing, because it was writing over only a
small part of existing blocks, ZFS had to read (the unchanged part of) each
old block in before writing out the changed block to a new location (COW).

This is a case where you want to set the ZFS recordsize to match your
application's typical write size, in order to avoid the read overhead
inherent in partial-block updates.  UFS by default has a smaller max
blocksize than ZFS' default 128k, so in addition to the ZIL/fsync issue
UFS will also suffer less overhead from such partial-block updates.

Again, this may not be what's going on, but it's worth checking if you
haven't already done so.

Regards,

Marion


___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


Re: [zfs-discuss] ZFS Performance Issue

2008-02-06 Thread Marc Bevand
Neil Perrin Neil.Perrin at Sun.COM writes:
 
 The ZIL doesn't do a lot of extra IO. It usually just does one write per 
 synchronous request and will batch up multiple writes into the same log
 block if possible.

Ok, I was wrong then.  Well, William, I think Marion Hakanson has the
most plausible explanation.  As he suggests, experiment with zfs set
recordsize=XXX to force the filesystem to use small records.  See
the zfs(1M) manpage.

-marc

___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


[zfs-discuss] ZFS Performance Issue

2008-02-05 Thread William Fretts-Saxton
This may not be a ZFS issue, so please bear with me!

I have 4 internal drives that I have striped/mirrored with ZFS and have an 
application server which is reading/writing to hundreds of thousands of files 
on it, thousands of files @ a time.

If 1 client uses the app server, the transaction (reading/writing to ~80 files) 
takes about 200 ms.  If I have about 80 clients attempting it @ once, it can 
sometimes take a minute or more.  I'm pretty sure it's a file I/O bottleneck, so 
I want to make sure ZFS is tuned properly for this kind of usage.

The only thing I could think of, so far, is to turn off ZFS compression.  Is 
there anything else I can do?  Here is my zpool iostat output:

# zpool iostat 5
               capacity     operations    bandwidth
pool         used  avail   read  write   read  write
----------  -----  -----  -----  -----  -----  -----
pool1       5.69G   266G     23     76  1.44M  2.24M
pool1       5.69G   266G     96    259  5.70M  7.25M
pool1       5.69G   266G     98    267  5.73M  7.32M
pool1       5.69G   266G     92    253  5.76M  7.31M
pool1       5.69G   266G     90    254  5.67M  7.43M

and here is regular iostat:

# iostat -xnz 5
 extended device statistics
    r/s    w/s   kr/s   kw/s wait actv wsvc_t asvc_t  %w  %b device
    0.0    0.2    0.0    0.1  0.0  0.0    0.0    0.3   0   0 c0t0d0
    0.0    0.2    0.0    0.1  0.0  0.0    0.0    0.3   0   0 c0t1d0
   20.4  145.0 1315.8 3714.5  0.0  2.8    0.0   16.8   0  21 c0t2d0
   21.4  143.2 1380.2 3711.3  0.0  4.1    0.0   25.1   0  27 c0t3d0
   23.4  138.4 1509.3 3693.0  0.0  1.6    0.0    9.8   0  17 c0t4d0
   20.8  137.8 1341.6 3693.0  0.0  2.3    0.0   14.7   0  21 c0t5d0
 
 
This message posted from opensolaris.org
___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


Re: [zfs-discuss] ZFS Performance Issue

2008-02-05 Thread William Fretts-Saxton
Some more information about the system.  NOTE: CPU utilization never goes above 
10%.

Sun Fire v40z
4 x 2.4 GHz proc
8 GB memory
3 x 146 GB Seagate Drives (10k RPM)
1 x 146 GB Fujitsu Drive (10k RPM)
 
 
This message posted from opensolaris.org
___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


Re: [zfs-discuss] ZFS Performance Issue

2008-02-05 Thread Marc Bevand
William Fretts-Saxton william.fretts.saxton at sun.com writes:
 
 Some more information about the system.  NOTE: Cpu utilization never
 goes above 10%.
 
 Sun Fire v40z
 4 x 2.4 GHz proc
 8 GB memory
 3 x 146 GB Seagate Drives (10k RPM)
 1 x 146 GB Fujitsu Drive (10k RPM)

And what version of Solaris or what build of OpenSolaris are you using?
Do you know if your application uses synchronous I/O transactions?
Have you tried disabling ZFS file-level prefetching (just as an
experiment)?  See:

http://www.solarisinternals.com/wiki/index.php/ZFS_Evil_Tuning_Guide#File-Level_Prefetching

-marc

___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss