comment below...

On Sep 23, 2009, at 10:00 PM, James Lever wrote:

On 08/09/2009, at 2:01 AM, Ross Walker wrote:
On Sep 7, 2009, at 1:32 AM, James Lever <j...@jamver.id.au> wrote:

Well, an MD1000 holds 15 drives, so a good compromise might be two 7-drive RAIDZ2s with a hot spare... That should provide 320 IOPS instead of 160, a big difference.

The issue is interactive responsiveness, and whether there is a way to tune the system to provide that while still having good performance for builds when they are run.

Look at the write IOPS of the pool with 'zpool iostat -v' and see how many are happening on the RAIDZ2 vdev.
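For example, something along these lines, run while the stalls are occurring (the pool name 'tank' is only a placeholder):

  # zpool iostat -v tank 5

That prints per-vdev read/write operations and bandwidth every 5 seconds, so you can compare how many write IOPS land on the RAIDZ2 vdev versus the log device.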

I was suggesting that slog writes were possibly starving reads from the L2ARC, as they were on the same device. This appears not to have been the issue, as the problem has persisted even with the L2ARC devices removed from the pool.

The SSD will handle a lot more IOPS than the pool, and the L2ARC is a lazy reader; it mostly just holds on to read cache data.

It may just be that the pool configuration can't handle the write IOPS needed and reads are starving.

Possible, but hard to tell. Have a look at the iostat results I’ve posted.

The busy times of the disks while the issue is occurring should let you know.

So it turns out that the problem is that all writes coming via NFS are going through the slog. When that happens, the transfer speed to the device drops to ~70MB/s (the write speed of this SLC SSD) and, until the load drops, all new write requests are blocked, causing a noticeable delay (observed to be up to 20s, but generally only 2-4s).

Thank you sir, can I have another?
If you add (not attach) more slogs, the workload will be spread across them. But...
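Roughly like this (device names are placeholders for whatever the extra SSDs show up as):

  # zpool add tank log c2t1d0
  # zpool add tank log c2t2d0

Each 'zpool add ... log' creates another independent slog and the ZIL traffic is spread across them; 'zpool attach' would instead mirror the existing slog, which adds redundancy but no write bandwidth.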


I can reproduce this behaviour by copying a large file (hundreds of MB in size) using 'cp src dst' on an NFS (still currently v3) client and observing that all data is pushed through the slog device (a 10GB partition of a Samsung 50GB SSD behind a PERC 6/i w/256MB battery-backed cache) rather than going directly to the primary storage disks.

On a related note, I had two of these devices (both using just 10GB partitions) connected as log devices (so the pool had two separate log devices), and the second one was consistently running significantly slower than the first. Removing the second device improved performance, but did not eliminate the occasional observed pauses.

...this is not surprising when you add a slow slog device. This is the weakest-link rule.

I was of the (mis)understanding that only metadata and writes smaller than 64k went via the slog device in the event of an O_SYNC write request?

The threshold is 32 kBytes, which is unfortunately the same as the default
NFS write size. See CR6686887
http://bugs.opensolaris.org/bugdatabase/view_bug.do?bug_id=6686887

If you have a slog and logbias=latency (default) then the writes go to the slog. So there is some interaction here that can affect NFS workloads in particular.
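You can check what a given filesystem is using with something like this (filesystem name is only an example):

  # zfs get logbias tank/export

which will report 'latency' with a source of 'default' unless it has been changed.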


The clients are (mostly) RHEL5.

Is there a way to tune this on the NFS server or clients such that when I perform a large synchronous write, the data does not go via the slog device?

You can change the IOP size on the client.
 -- richard
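On a RHEL5 client that would be roughly something like the following, with the sizes only illustrative and subject to what the server will negotiate:

  # mount -t nfs -o vers=3,rsize=65536,wsize=65536 server:/export /mnt

or the equivalent rsize/wsize options in /etc/fstab, the idea being to make individual NFS writes larger than the 32 kByte threshold mentioned above.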


I have investigated using the logbias setting, but that would also kill small-file performance on any filesystem using it and defeat the purpose of having a slog device at all.
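For reference, the change in question is along the lines of (filesystem name only illustrative):

  # zfs set logbias=throughput tank/builds

which stops synchronous writes on that filesystem from using the separate log device and sends them to the main pool disks instead; it can be set per filesystem, but any filesystem set that way also loses the slog benefit for its small synchronous writes.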

cheers,
James

_______________________________________________
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss
