Re: [zfs-discuss] slog tests on read throughput exhaustion (NFS)

2007-11-19 Thread Roch - PAE
Neil Perrin writes:
  
  
  Joe Little wrote:
   On Nov 16, 2007 9:13 PM, Neil Perrin [EMAIL PROTECTED] wrote:
   Joe,
  
   I don't think adding a slog helped in this case. In fact I
   believe it made performance worse. Previously the ZIL would be
   spread out over all devices but now all synchronous traffic
   is directed at one device (and everything is synchronous in NFS).
   Mind you, 15MB/s seems a bit on the slow side - especially if
   cache flushing is disabled.
  
   It would be interesting to see what all the threads are waiting
   on. I think the problem may be that everything is backed
   up waiting to start a transaction because the txg train is
   slow due to NFS requiring the ZIL to push everything synchronously.
  
   
   I agree completely. The log (even though slow) was an attempt to
   isolate writes away from the pool. I guess the question is how to
   provide for async access for NFS. We may have 16, 32 or whatever
   threads, but if a single writer keeps the ZIL pegged, prohibiting
   reads, it's all for nought. Is there any way to tune/configure the
   ZFS/NFS combination to balance reads/writes so one doesn't starve the
   other? It's either feast or famine, or so tests have shown.
  
  No, there's no way currently to give reads preference over writes.
  All transactions get equal priority to enter a transaction group.
  Three txgs can be outstanding as we use a 3 phase commit model:
  open; quiescing; and syncing.

That makes me wonder if this isn't just the lack-of-write-throttling
issue. If one txg is syncing and the other is quiesced out, I think
it means we have let in too many writes. We do need a better balance.

Neil, is it correct that reads never hit txg_wait_open(), but
just need an I/O scheduler slot?

If so, it seems to me just a matter of

6429205 each zpool needs to monitor its throughput and throttle heavy
writers

However, if this is it, disabling the ZIL would not solve the
issue (it might even make it worse). So I am lost as to
what could be blocking the reads other than a lack of I/O
slots. As another way to improve the I/O scheduler we have:


6471212 need reserved I/O scheduler slots to improve I/O latency of 
critical ops
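
In the meantime, one way to check whether reads are reaching the
devices at all (or are being crowded out of the scheduler) is a quick
DTrace one-liner on the io provider -- a rough sketch, not a
diagnostic tool:

  # Count issued I/Os per device and direction; reads absent from the
  # output while sd5 streams writes would support the "no I/O slots"
  # theory.
  dtrace -n 'io:::start
  {
      @[args[1]->dev_statname,
        args[0]->b_flags & B_READ ? "read" : "write"] = count();
  }'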



-r

  
  Neil.
  



Re: [zfs-discuss] slog tests on read throughput exhaustion (NFS)

2007-11-19 Thread Neil Perrin


Roch - PAE wrote:
 Neil Perrin writes:
   
   
   Joe Little wrote:
On Nov 16, 2007 9:13 PM, Neil Perrin [EMAIL PROTECTED] wrote:
Joe,
   
I don't think adding a slog helped in this case. In fact I
believe it made performance worse. Previously the ZIL would be
spread out over all devices but now all synchronous traffic
is directed at one device (and everything is synchronous in NFS).
Mind you, 15MB/s seems a bit on the slow side - especially if
cache flushing is disabled.
   
It would be interesting to see what all the threads are waiting
on. I think the problem may be that everything is backed
up waiting to start a transaction because the txg train is
slow due to NFS requiring the ZIL to push everything synchronously.
   

I agree completely. The log (even though slow) was an attempt to
isolate writes away from the pool. I guess the question is how to
provide for async access for NFS. We may have 16, 32 or whatever
threads, but if a single writer keeps the ZIL pegged, prohibiting
reads, it's all for nought. Is there any way to tune/configure the
ZFS/NFS combination to balance reads/writes so one doesn't starve the
other? It's either feast or famine, or so tests have shown.
   
   No, there's no way currently to give reads preference over writes.
   All transactions get equal priority to enter a transaction group.
   Three txgs can be outstanding as we use a 3 phase commit model:
   open; quiescing; and syncing.
 
 That makes me wonder if this isn't just the lack-of-write-throttling
 issue. If one txg is syncing and the other is quiesced out, I think
 it means we have let in too many writes. We do need a better balance.
 
 Neil, is it correct that reads never hit txg_wait_open(), but
 just need an I/O scheduler slot?

Yes: reads don't modify any metadata (except access time, which is
handled separately). I'm less clear about what happens further
down in the DMU and SPA.
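
One could check directly on a live system whether anything is actually
parked there -- a minimal sketch with mdb, assuming recent Nevada bits
that have the ::stacks dcmd:

  # Print kernel thread stacks that include txg_wait_open();
  # NFS reads showing up here would contradict the theory above.
  echo "::stacks -c txg_wait_open" | mdb -k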




Re: [zfs-discuss] slog tests on read throughput exhaustion (NFS)

2007-11-18 Thread Richard Elling
one more thing...

Joe Little wrote:
 I have historically noticed that in ZFS, whenever there is a heavy
 writer to a pool via NFS, the reads can be held back (basically paused).
 An example is a RAID10 pool of 6 disks, whereby a directory of files
 including some large 100+MB in size being written can cause other
 clients over NFS to pause for seconds (5-30 or so). This is on B70 bits.
 I've gotten used to this behavior over NFS, but didn't see it perform
 as such when on the server itself doing similar actions.
 
 To improve upon the situation, I thought perhaps I could dedicate a
 log device outside the pool, in the hopes that while heavy writes went
 to the log device, reads would merrily be allowed to coexist from the
 pool itself. My test case isn't ideal per se, but I added a local 9GB
 SCSI (80) drive for a log, and added two LUNs for the pool itself.
 You'll see from the below that while the log device is pegged at
 15MB/sec (sd5), my directory list request on devices sd15 and sd16
 is never answered. I tried this with no-cache-flush both enabled and
 off, with negligible difference. Is there any way to force a better
 balance of reads/writes during heavy writes?
 
                  extended device statistics
 device     r/s    w/s   kr/s     kw/s  wait  actv  svc_t  %w  %b
 fd0        0.0    0.0    0.0      0.0   0.0   0.0    0.0   0   0
 sd0        0.0    0.0    0.0      0.0   0.0   0.0    0.0   0   0
 sd1        0.0    0.0    0.0      0.0   0.0   0.0    0.0   0   0
 sd2        0.0    0.0    0.0      0.0   0.0   0.0    0.0   0   0
 sd3        0.0    0.0    0.0      0.0   0.0   0.0    0.0   0   0
 sd4        0.0    0.0    0.0      0.0   0.0   0.0    0.0   0   0
 sd5        0.0  118.0    0.0  15099.9   0.0  35.0  296.7   0 100

When you see actv = 35 and svc_t > ~20, then it is possible that
you can improve performance by reducing the zfs_vdev_max_pending
queue depth.  See
http://www.solarisinternals.com/wiki/index.php/ZFS_Evil_Tuning_Guide#Device_I.2FO_Queue_Size_.28I.2FO_Concurrency.29

This will be particularly true for JBODs.

Doing a little math, there is ~ 4.5 MBytes queued in the drive
waiting to be written.  4.5 MBytes isn't much for a typical RAID
array, but for a disk, it is often a sizeable chunk of its
available cache.  A 9 GByte disk, being rather old, has a pretty
wimpy microprocessor, so you are basically beating the poor thing
senseless.  Reducing the queue depth will allow the disk to perform
more efficiently.
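
For example (values are illustrative; verify on your own system):

  # Persistently, in /etc/system (takes effect after reboot):
  set zfs:zfs_vdev_max_pending = 10

  # Or on a live system with mdb (0t10 is decimal 10):
  echo zfs_vdev_max_pending/W0t10 | mdb -kw

  # Verify the current value:
  echo zfs_vdev_max_pending/D | mdb -k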
  -- richard


Re: [zfs-discuss] slog tests on read throughput exhaustion (NFS)

2007-11-18 Thread Joe Little
On Nov 18, 2007 1:44 PM, Richard Elling [EMAIL PROTECTED] wrote:
 one more thing...


 Joe Little wrote:
  I have historically noticed that in ZFS, whenever there is a heavy
  writer to a pool via NFS, the reads can be held back (basically paused).
  An example is a RAID10 pool of 6 disks, whereby a directory of files
  including some large 100+MB in size being written can cause other
  clients over NFS to pause for seconds (5-30 or so). This is on B70 bits.
  I've gotten used to this behavior over NFS, but didn't see it perform
  as such when on the server itself doing similar actions.
 
  To improve upon the situation, I thought perhaps I could dedicate a
  log device outside the pool, in the hopes that while heavy writes went
  to the log device, reads would merrily be allowed to coexist from the
  pool itself. My test case isn't ideal per se, but I added a local 9GB
  SCSI (80) drive for a log, and added two LUNs for the pool itself.
  You'll see from the below that while the log device is pegged at
  15MB/sec (sd5), my directory list request on devices sd15 and sd16
  is never answered. I tried this with no-cache-flush both enabled and
  off, with negligible difference. Is there any way to force a better
  balance of reads/writes during heavy writes?
 
                   extended device statistics
  device     r/s    w/s   kr/s     kw/s  wait  actv  svc_t  %w  %b
  fd0        0.0    0.0    0.0      0.0   0.0   0.0    0.0   0   0
  sd0        0.0    0.0    0.0      0.0   0.0   0.0    0.0   0   0
  sd1        0.0    0.0    0.0      0.0   0.0   0.0    0.0   0   0
  sd2        0.0    0.0    0.0      0.0   0.0   0.0    0.0   0   0
  sd3        0.0    0.0    0.0      0.0   0.0   0.0    0.0   0   0
  sd4        0.0    0.0    0.0      0.0   0.0   0.0    0.0   0   0
  sd5        0.0  118.0    0.0  15099.9   0.0  35.0  296.7   0 100

 When you see actv = 35 and svc_t > ~20, then it is possible that
 you can improve performance by reducing the zfs_vdev_max_pending
 queue depth.  See
 http://www.solarisinternals.com/wiki/index.php/ZFS_Evil_Tuning_Guide#Device_I.2FO_Queue_Size_.28I.2FO_Concurrency.29

 This will be particularly true for JBODs.

 Doing a little math, there is ~ 4.5 MBytes queued in the drive
 waiting to be written.  4.5 MBytes isn't much for a typical RAID
 array, but for a disk, it is often a sizeable chunk of its
 available cache.  A 9 GByte disk, being rather old, has a pretty
 wimpy microprocessor, so you are basically beating the poor thing
 senseless.  Reducing the queue depth will allow the disk to perform
 more efficiently.

I'll be trying an 18G 10K drive tomorrow. Again, the test was simply to
see if by having a slog, I'd enable NFS to allow for concurrent reads
and writes. Especially in the iSCSI case, but even in JBOD, I find
_any_ heavy writing completely postpones reads to NFS clients. This
makes ZFS and NFS impractical under I/O duress. My goal was just to
see how things work. It appears from Neil's comments that it won't, and
the per-filesystem synchronicity RFE is what is needed, or at least
zil_disable, for NFS to be practically usable currently.

As for max_pending, I did try lowering it (to values of 10 and 20)
in a JBOD, without any success.


   -- richard



Re: [zfs-discuss] slog tests on read throughput exhaustion (NFS)

2007-11-17 Thread Joe Little
On Nov 16, 2007 10:41 PM, Neil Perrin [EMAIL PROTECTED] wrote:


 Joe Little wrote:
  On Nov 16, 2007 9:13 PM, Neil Perrin [EMAIL PROTECTED] wrote:
  Joe,
 
  I don't think adding a slog helped in this case. In fact I
  believe it made performance worse. Previously the ZIL would be
  spread out over all devices but now all synchronous traffic
  is directed at one device (and everything is synchronous in NFS).
  Mind you, 15MB/s seems a bit on the slow side - especially if
  cache flushing is disabled.
 
  It would be interesting to see what all the threads are waiting
  on. I think the problem may be that everything is backed
  up waiting to start a transaction because the txg train is
  slow due to NFS requiring the ZIL to push everything synchronously.
 
 
  I agree completely. The log (even though slow) was an attempt to
  isolate writes away from the pool. I guess the question is how to
  provide for async access for NFS. We may have 16, 32 or whatever
  threads, but if a single writer keeps the ZIL pegged, prohibiting
  reads, it's all for nought. Is there any way to tune/configure the
  ZFS/NFS combination to balance reads/writes so one doesn't starve the
  other? It's either feast or famine, or so tests have shown.

 No, there's no way currently to give reads preference over writes.
 All transactions get equal priority to enter a transaction group.
 Three txgs can be outstanding as we use a 3 phase commit model:
 open; quiescing; and syncing.


Any way to improve the balance? It would appear that zil_disable is
still a requirement to get NFS to behave in a practical, real-world
way with ZFS. Even with zil_disable, we end up with periods of
pausing on the heaviest of writes, and then I think it's mostly just
ZFS having too much outstanding I/O to commit.

If zil_disable is enabled, is the slog disk ignored?
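
(For reference, here is what I'm toggling -- a sketch of the usual
method; whether the slog still sees traffic afterwards is exactly
what I'm asking:)

  # /etc/system -- persistent across reboots:
  set zfs:zil_disable = 1

  # Or live via mdb; as I understand it the tunable is read at mount
  # time, so the filesystem must be remounted for it to take effect:
  echo zil_disable/W1 | mdb -kw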

 Neil.




[zfs-discuss] slog tests on read throughput exhaustion (NFS)

2007-11-16 Thread Joe Little
I have historically noticed that in ZFS, whenever there is a heavy
writer to a pool via NFS, the reads can be held back (basically paused).
An example is a RAID10 pool of 6 disks, whereby a directory of files
including some large 100+MB in size being written can cause other
clients over NFS to pause for seconds (5-30 or so). This is on B70 bits.
I've gotten used to this behavior over NFS, but didn't see it perform
as such when on the server itself doing similar actions.

To improve upon the situation, I thought perhaps I could dedicate a
log device outside the pool, in the hopes that while heavy writes went
to the log device, reads would merrily be allowed to coexist from the
pool itself. My test case isn't ideal per se, but I added a local 9GB
SCSI (80) drive for a log, and added two LUNs for the pool itself.
You'll see from the below that while the log device is pegged at
15MB/sec (sd5), my directory list request on devices sd15 and sd16
is never answered. I tried this with no-cache-flush both enabled and
off, with negligible difference. Is there any way to force a better
balance of reads/writes during heavy writes?
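
(For reference, the stats below were captured with something along
the lines of:

  # Extended per-device statistics, one report per interval:
  iostat -x 5

sd5 is the dedicated log device; sd15/sd16 are the pool LUNs.)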

                 extended device statistics
device     r/s    w/s   kr/s     kw/s  wait  actv  svc_t  %w  %b
fd0        0.0    0.0    0.0      0.0   0.0   0.0    0.0   0   0
sd0        0.0    0.0    0.0      0.0   0.0   0.0    0.0   0   0
sd1        0.0    0.0    0.0      0.0   0.0   0.0    0.0   0   0
sd2        0.0    0.0    0.0      0.0   0.0   0.0    0.0   0   0
sd3        0.0    0.0    0.0      0.0   0.0   0.0    0.0   0   0
sd4        0.0    0.0    0.0      0.0   0.0   0.0    0.0   0   0
sd5        0.0  118.0    0.0  15099.9   0.0  35.0  296.7   0 100
sd6        0.0    0.0    0.0      0.0   0.0   0.0    0.0   0   0
sd7        0.0    0.0    0.0      0.0   0.0   0.0    0.0   0   0
sd8        0.0    0.0    0.0      0.0   0.0   0.0    0.0   0   0
sd9        0.0    0.0    0.0      0.0   0.0   0.0    0.0   0   0
sd10       0.0    0.0    0.0      0.0   0.0   0.0    0.0   0   0
sd11       0.0    0.0    0.0      0.0   0.0   0.0    0.0   0   0
sd12       0.0    0.0    0.0      0.0   0.0   0.0    0.0   0   0
sd13       0.0    0.0    0.0      0.0   0.0   0.0    0.0   0   0
sd14       0.0    0.0    0.0      0.0   0.0   0.0    0.0   0   0
sd15       0.0    0.0    0.0      0.0   0.0   0.0    0.0   0   0
sd16       0.0    0.0    0.0      0.0   0.0   0.0    0.0   0   0
                 extended device statistics
device     r/s    w/s   kr/s     kw/s  wait  actv  svc_t  %w  %b
fd0        0.0    0.0    0.0      0.0   0.0   0.0    0.0   0   0
sd0        0.0    0.0    0.0      0.0   0.0   0.0    0.0   0   0
sd1        0.0    0.0    0.0      0.0   0.0   0.0    0.0   0   0
sd2        0.0    0.0    0.0      0.0   0.0   0.0    0.0   0   0
sd3        0.0    0.0    0.0      0.0   0.0   0.0    0.0   0   0
sd4        0.0    0.0    0.0      0.0   0.0   0.0    0.0   0   0
sd5        0.0  117.0    0.0  14970.1   0.0  35.0  299.2   0 100
sd6        0.0    0.0    0.0      0.0   0.0   0.0    0.0   0   0
sd7        0.0    0.0    0.0      0.0   0.0   0.0    0.0   0   0
sd8        0.0    0.0    0.0      0.0   0.0   0.0    0.0   0   0
sd9        0.0    0.0    0.0      0.0   0.0   0.0    0.0   0   0
sd10       0.0    0.0    0.0      0.0   0.0   0.0    0.0   0   0
sd11       0.0    0.0    0.0      0.0   0.0   0.0    0.0   0   0
sd12       0.0    0.0    0.0      0.0   0.0   0.0    0.0   0   0
sd13       0.0    0.0    0.0      0.0   0.0   0.0    0.0   0   0
sd14       0.0    0.0    0.0      0.0   0.0   0.0    0.0   0   0
sd15       0.0    0.0    0.0      0.0   0.0   0.0    0.0   0   0
sd16       0.0    0.0    0.0      0.0   0.0   0.0    0.0   0   0
                 extended device statistics
device     r/s    w/s   kr/s     kw/s  wait  actv  svc_t  %w  %b
fd0        0.0    0.0    0.0      0.0   0.0   0.0    0.0   0   0
sd0        0.0    0.0    0.0      0.0   0.0   0.0    0.0   0   0
sd1        0.0    0.0    0.0      0.0   0.0   0.0    0.0   0   0
sd2        0.0    0.0    0.0      0.0   0.0   0.0    0.0   0   0
sd3        0.0    0.0    0.0      0.0   0.0   0.0    0.0   0   0
sd4        0.0    0.0    0.0      0.0   0.0   0.0    0.0   0   0
sd5        0.0  118.1    0.0  15111.9   0.0  35.0  296.4   0 100
sd6        0.0    0.0    0.0      0.0   0.0   0.0    0.0   0   0
sd7        0.0    0.0    0.0      0.0   0.0   0.0    0.0   0   0
sd8        0.0    0.0    0.0      0.0   0.0   0.0    0.0   0   0
sd9        0.0    0.0    0.0      0.0   0.0   0.0    0.0   0   0
sd10       0.0    0.0    0.0      0.0   0.0   0.0    0.0   0   0
sd11       0.0    0.0    0.0      0.0   0.0   0.0    0.0   0   0
sd12       0.0    0.0    0.0      0.0   0.0   0.0    0.0   0   0
sd13       0.0    0.0    0.0      0.0   0.0   0.0    0.0   0   0
sd14       0.0    0.0    0.0      0.0   0.0   0.0    0.0   0   0
sd15       0.0    0.0    0.0      0.0   0.0   0.0    0.0   0   0
sd16       0.0    0.0    0.0      0.0   0.0   0.0    0.0   0   0
                 extended device statistics
device     r/s    w/s   kr/s     kw/s  wait  actv  svc_t  %w  %b
fd0        0.0    0.0    0.0      0.0   0.0   0.0    0.0   0   0
sd0        0.0    0.0    0.0      0.0   0.0   0.0    0.0   0   0
sd1        0.0    0.0    0.0      0.0   0.0   0.0    0.0   0   0
sd2        0.0    0.0    0.0

Re: [zfs-discuss] slog tests on read throughput exhaustion (NFS)

2007-11-16 Thread Neil Perrin
Joe,

I don't think adding a slog helped in this case. In fact I
believe it made performance worse. Previously the ZIL would be 
spread out over all devices but now all synchronous traffic
is directed at one device (and everything is synchronous in NFS).
Mind you, 15MB/s seems a bit on the slow side - especially if
cache flushing is disabled.

It would be interesting to see what all the threads are waiting
on. I think the problem may be that everything is backed
up waiting to start a transaction because the txg train is
slow due to NFS requiring the ZIL to push everything synchronously.
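
A quick way to get that picture on a live system (a sketch; adjust
to taste):

  # Dump all kernel thread stacks, grouped by similarity:
  echo "::stacks" | mdb -k

  # Or, with DTrace, count where the NFS server threads go off-CPU
  # (assuming the server threads run in the nfsd process):
  dtrace -n 'sched:::off-cpu /execname == "nfsd"/
  {
      @[stack()] = count();
  }'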

Neil.

Joe Little wrote:
 I have historically noticed that in ZFS, whenever there is a heavy
 writer to a pool via NFS, the reads can be held back (basically paused).
 An example is a RAID10 pool of 6 disks, whereby a directory of files
 including some large 100+MB in size being written can cause other
 clients over NFS to pause for seconds (5-30 or so). This is on B70 bits.
 I've gotten used to this behavior over NFS, but didn't see it perform
 as such when on the server itself doing similar actions.
 
 To improve upon the situation, I thought perhaps I could dedicate a
 log device outside the pool, in the hopes that while heavy writes went
 to the log device, reads would merrily be allowed to coexist from the
 pool itself. My test case isn't ideal per se, but I added a local 9GB
 SCSI (80) drive for a log, and added two LUNs for the pool itself.
 You'll see from the below that while the log device is pegged at
 15MB/sec (sd5), my directory list request on devices sd15 and sd16
 is never answered. I tried this with no-cache-flush both enabled and
 off, with negligible difference. Is there any way to force a better
 balance of reads/writes during heavy writes?
 
                  extended device statistics
 device     r/s    w/s   kr/s     kw/s  wait  actv  svc_t  %w  %b
 fd0        0.0    0.0    0.0      0.0   0.0   0.0    0.0   0   0
 sd0        0.0    0.0    0.0      0.0   0.0   0.0    0.0   0   0
 sd1        0.0    0.0    0.0      0.0   0.0   0.0    0.0   0   0
 sd2        0.0    0.0    0.0      0.0   0.0   0.0    0.0   0   0
 sd3        0.0    0.0    0.0      0.0   0.0   0.0    0.0   0   0
 sd4        0.0    0.0    0.0      0.0   0.0   0.0    0.0   0   0
 sd5        0.0  118.0    0.0  15099.9   0.0  35.0  296.7   0 100
 sd6        0.0    0.0    0.0      0.0   0.0   0.0    0.0   0   0
 sd7        0.0    0.0    0.0      0.0   0.0   0.0    0.0   0   0
 sd8        0.0    0.0    0.0      0.0   0.0   0.0    0.0   0   0
 sd9        0.0    0.0    0.0      0.0   0.0   0.0    0.0   0   0
 sd10       0.0    0.0    0.0      0.0   0.0   0.0    0.0   0   0
 sd11       0.0    0.0    0.0      0.0   0.0   0.0    0.0   0   0
 sd12       0.0    0.0    0.0      0.0   0.0   0.0    0.0   0   0
 sd13       0.0    0.0    0.0      0.0   0.0   0.0    0.0   0   0
 sd14       0.0    0.0    0.0      0.0   0.0   0.0    0.0   0   0
 sd15       0.0    0.0    0.0      0.0   0.0   0.0    0.0   0   0
 sd16       0.0    0.0    0.0      0.0   0.0   0.0    0.0   0   0
...


Re: [zfs-discuss] slog tests on read throughput exhaustion (NFS)

2007-11-16 Thread Joe Little
On Nov 16, 2007 9:13 PM, Neil Perrin [EMAIL PROTECTED] wrote:
 Joe,

 I don't think adding a slog helped in this case. In fact I
 believe it made performance worse. Previously the ZIL would be
 spread out over all devices but now all synchronous traffic
 is directed at one device (and everything is synchronous in NFS).
 Mind you, 15MB/s seems a bit on the slow side - especially if
 cache flushing is disabled.

 It would be interesting to see what all the threads are waiting
 on. I think the problem may be that everything is backed
 up waiting to start a transaction because the txg train is
 slow due to NFS requiring the ZIL to push everything synchronously.


I agree completely. The log (even though slow) was an attempt to
isolate writes away from the pool. I guess the question is how to
provide for async access for NFS. We may have 16, 32 or whatever
threads, but if a single writer keeps the ZIL pegged, prohibiting
reads, it's all for nought. Is there any way to tune/configure the
ZFS/NFS combination to balance reads/writes so one doesn't starve the
other? It's either feast or famine, or so tests have shown.

 Neil.


 Joe Little wrote:
  I have historically noticed that in ZFS, whenever there is a heavy
  writer to a pool via NFS, the reads can be held back (basically paused).
  An example is a RAID10 pool of 6 disks, whereby a directory of files
  including some large 100+MB in size being written can cause other
  clients over NFS to pause for seconds (5-30 or so). This is on B70 bits.
  I've gotten used to this behavior over NFS, but didn't see it perform
  as such when on the server itself doing similar actions.
 
  To improve upon the situation, I thought perhaps I could dedicate a
  log device outside the pool, in the hopes that while heavy writes went
  to the log device, reads would merrily be allowed to coexist from the
  pool itself. My test case isn't ideal per se, but I added a local 9GB
  SCSI (80) drive for a log, and added two LUNs for the pool itself.
  You'll see from the below that while the log device is pegged at
  15MB/sec (sd5), my directory list request on devices sd15 and sd16
  is never answered. I tried this with no-cache-flush both enabled and
  off, with negligible difference. Is there any way to force a better
  balance of reads/writes during heavy writes?
 
                   extended device statistics
  device     r/s    w/s   kr/s     kw/s  wait  actv  svc_t  %w  %b
  fd0        0.0    0.0    0.0      0.0   0.0   0.0    0.0   0   0
  sd0        0.0    0.0    0.0      0.0   0.0   0.0    0.0   0   0
  sd1        0.0    0.0    0.0      0.0   0.0   0.0    0.0   0   0
  sd2        0.0    0.0    0.0      0.0   0.0   0.0    0.0   0   0
  sd3        0.0    0.0    0.0      0.0   0.0   0.0    0.0   0   0
  sd4        0.0    0.0    0.0      0.0   0.0   0.0    0.0   0   0
  sd5        0.0  118.0    0.0  15099.9   0.0  35.0  296.7   0 100
  sd6        0.0    0.0    0.0      0.0   0.0   0.0    0.0   0   0
  sd7        0.0    0.0    0.0      0.0   0.0   0.0    0.0   0   0
  sd8        0.0    0.0    0.0      0.0   0.0   0.0    0.0   0   0
  sd9        0.0    0.0    0.0      0.0   0.0   0.0    0.0   0   0
  sd10       0.0    0.0    0.0      0.0   0.0   0.0    0.0   0   0
  sd11       0.0    0.0    0.0      0.0   0.0   0.0    0.0   0   0
  sd12       0.0    0.0    0.0      0.0   0.0   0.0    0.0   0   0
  sd13       0.0    0.0    0.0      0.0   0.0   0.0    0.0   0   0
  sd14       0.0    0.0    0.0      0.0   0.0   0.0    0.0   0   0
  sd15       0.0    0.0    0.0      0.0   0.0   0.0    0.0   0   0
  sd16       0.0    0.0    0.0      0.0   0.0   0.0    0.0   0   0
 ...



Re: [zfs-discuss] slog tests on read throughput exhaustion (NFS)

2007-11-16 Thread Joe Little
On Nov 16, 2007 9:17 PM, Joe Little [EMAIL PROTECTED] wrote:
 On Nov 16, 2007 9:13 PM, Neil Perrin [EMAIL PROTECTED] wrote:
  Joe,
 
  I don't think adding a slog helped in this case. In fact I
  believe it made performance worse. Previously the ZIL would be
  spread out over all devices but now all synchronous traffic
  is directed at one device (and everything is synchronous in NFS).
  Mind you, 15MB/s seems a bit on the slow side - especially if
  cache flushing is disabled.
 
  It would be interesting to see what all the threads are waiting
  on. I think the problem may be that everything is backed
  up waiting to start a transaction because the txg train is
  slow due to NFS requiring the ZIL to push everything synchronously.
 

Roch wrote this before (thus my interest in the log or NVRAM-like solution):


There are 2 independent things at play here.

a) NFS sync semantics conspire against single-thread performance with
any backend filesystem.
 However, NVRAM normally offers some relief of the issue.

b) ZFS sync semantics, along with the storage software + imprecise
protocol in between, conspire against ZFS performance
of some workloads on NVRAM-backed storage. NFS being one of the
affected workloads.

The conjunction of the 2 causes worse than expected NFS performance
over a ZFS backend running __on NVRAM-backed storage__.
If you are not considering NVRAM storage, then I know of no ZFS/NFS-
specific problems.

Issue b) is being dealt with by both Solaris and storage vendors (we
need a refined protocol);

Issue a) is not related to ZFS but is rather a fundamental NFS issue.
Maybe a future NFS protocol will help.


Net net: if one finds a way to 'disable cache flushing' on the
storage side, then one reaches the state we'll be in, out of the box,
once b) is implemented by Solaris _and_ the storage vendor. At that
point, ZFS becomes a fine NFS server not only on JBOD, as it is
today, but also on NVRAM-backed storage.

It's complex enough, I thought it was worth repeating.
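
(On the ZFS side, the flush can be suppressed for testing with the
zfs_nocacheflush tunable on recent Nevada bits -- older bits used
zil_noflush -- which is what my no-cache-flush runs toggled; the
storage-array-side setting is vendor-specific:)

  # /etc/system -- stop ZFS issuing SYNCHRONIZE CACHE to the devices.
  # Only safe when every device has a non-volatile write cache:
  set zfs:zfs_nocacheflush = 1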




 I agree completely. The log (even though slow) was an attempt to
 isolate writes away from the pool. I guess the question is how to
 provide for async access for NFS. We may have 16, 32 or whatever
 threads, but if a single writer keeps the ZIL pegged, prohibiting
 reads, it's all for nought. Is there any way to tune/configure the
 ZFS/NFS combination to balance reads/writes so one doesn't starve the
 other? It's either feast or famine, or so tests have shown.


  Neil.
 
 
  Joe Little wrote:
   I have historically noticed that in ZFS, whenever there is a heavy
   writer to a pool via NFS, the reads can be held back (basically paused).
   An example is a RAID10 pool of 6 disks, whereby a directory of files
   including some large 100+MB in size being written can cause other
   clients over NFS to pause for seconds (5-30 or so). This is on B70 bits.
   I've gotten used to this behavior over NFS, but didn't see it perform
   as such when on the server itself doing similar actions.
  
   To improve upon the situation, I thought perhaps I could dedicate a
   log device outside the pool, in the hopes that while heavy writes went
   to the log device, reads would merrily be allowed to coexist from the
   pool itself. My test case isn't ideal per se, but I added a local 9GB
   SCSI (80) drive for a log, and added two LUNs for the pool itself.
   You'll see from the below that while the log device is pegged at
   15MB/sec (sd5), my directory list request on devices sd15 and sd16
   is never answered. I tried this with no-cache-flush both enabled and
   off, with negligible difference. Is there any way to force a better
   balance of reads/writes during heavy writes?
  
                    extended device statistics
   device     r/s    w/s   kr/s     kw/s  wait  actv  svc_t  %w  %b
   fd0        0.0    0.0    0.0      0.0   0.0   0.0    0.0   0   0
   sd0        0.0    0.0    0.0      0.0   0.0   0.0    0.0   0   0
   sd1        0.0    0.0    0.0      0.0   0.0   0.0    0.0   0   0
   sd2        0.0    0.0    0.0      0.0   0.0   0.0    0.0   0   0
   sd3        0.0    0.0    0.0      0.0   0.0   0.0    0.0   0   0
   sd4        0.0    0.0    0.0      0.0   0.0   0.0    0.0   0   0
   sd5        0.0  118.0    0.0  15099.9   0.0  35.0  296.7   0 100
   sd6        0.0    0.0    0.0      0.0   0.0   0.0    0.0   0   0
   sd7        0.0    0.0    0.0      0.0   0.0   0.0    0.0   0   0
   sd8        0.0    0.0    0.0      0.0   0.0   0.0    0.0   0   0
   sd9        0.0    0.0    0.0      0.0   0.0   0.0    0.0   0   0
   sd10       0.0    0.0    0.0      0.0   0.0   0.0    0.0   0   0
   sd11       0.0    0.0    0.0      0.0   0.0   0.0    0.0   0   0
   sd12       0.0    0.0    0.0      0.0   0.0   0.0    0.0   0   0
   sd13       0.0    0.0    0.0      0.0   0.0   0.0    0.0   0   0
   sd14       0.0    0.0    0.0      0.0   0.0   0.0    0.0   0   0
   sd15       0.0    0.0    0.0      0.0   0.0   0.0    0.0   0   0
   sd16       0.0    0.0    0.0      0.0   0.0   0.0    0.0   0   0
  ...
 
