On Sep 25, 2009, at 9:14 AM, Ross Walker wrote:

On Fri, Sep 25, 2009 at 11:34 AM, Bob Friesenhahn
<bfrie...@simple.dallas.tx.us> wrote:
On Fri, 25 Sep 2009, Ross Walker wrote:

As a side note, a slog device will not be very beneficial for large
sequential writes, because they are throughput bound, not latency
bound. Slog devices really help when you have lots of small sync
writes. A RAIDZ2 with the ZIL spread across it will provide much
better throughput for those large sequential writes.

Surely this depends on the origin of the large sequential writes. If the
origin is NFS and the SSD has considerably more sustained write bandwidth
than the ethernet transfer bandwidth, then using the SSD is a win. If the
SSD accepts data slower than the ethernet can deliver it (which seems to
be the case here), then the SSD is not helping.

If the ethernet can pass 100 MB/second, then the sustained write
specification for the SSD needs to be at least 100 MB/second. Since data
is buffered in the Ethernet/TCP/IP/NFS stack prior to sending it to ZFS,
the SSD should support write bursts of at least double that, or else it
will not be helping bulk-write performance.
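
In other words, as a minimal sketch of that check (the 100 MB/s link
rate, the 70 MB/s SSD figure from this thread, and the 2x burst factor
are illustrative assumptions, not measurements):

    # Rough check: can a candidate slog SSD keep up with NFS sync writes
    # arriving over the network? All numbers are illustrative assumptions.
    NET_MB_S = 100.0            # ~1 GbE of NFS traffic
    SSD_SUSTAINED_MB_S = 70.0   # sustained write spec of the SSD
    SSD_BURST_MB_S = 140.0      # short-burst write rate of the SSD

    def slog_helps_bulk_writes(net, sustained, burst):
        # Sustained rate must cover the link, and burst rate should cover
        # roughly 2x the link to drain data buffered in the network stack.
        return sustained >= net and burst >= 2 * net

    # False for the 70 MB/s device discussed here
    print(slog_helps_bulk_writes(NET_MB_S, SSD_SUSTAINED_MB_S, SSD_BURST_MB_S))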

Specifically, I was talking about NFS, since that is what the OP was
asking about, but yes, it does depend on the origin. You also assume
that the NFS I/O goes over only a single 1 GbE interface, when it could
go over multiple 1 GbE interfaces, a 10 GbE interface, or even multiple
10 GbE interfaces. You also assume that the I/O recorded in the ZIL is
just the raw I/O, when there is also metadata and possibly multiple
transaction copies as well.

Personally, I still prefer to spread the ZIL across the pool and use a
large NVRAM-backed HBA, as opposed to a slog, which really puts all my
I/O in one basket. If I had a pure NVRAM device I might consider using
it as a slog device, but SSDs are too variable for my taste.

Back-of-the-envelope math says:
        10 GbE = ~1 GByte/sec of I/O capacity

If the SSD can only sink 70 MByte/s, then you will need:
        int(1000/70) + 1 = 15 SSDs for the slog

For capacity, you need:
        1 GByte/sec * 30 sec = 30 GBytes

Ross's idea has merit if the size of the NVRAM in the array is
30 GBytes or so.
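
The same arithmetic as a small sketch, so the assumptions (link rate,
per-SSD sink rate, txg interval) are easy to vary; the constants below
are the ones used above, not measured values:

    import math

    LINK_GB_S = 1.0        # ~10 GbE of ingest, GBytes/sec
    SSD_SINK_MB_S = 70.0   # what one slog SSD can sink, MBytes/sec
    TXG_INTERVAL_S = 30    # assumed txg commit interval, seconds

    # SSDs needed to absorb the full link rate
    ssds_needed = math.ceil(LINK_GB_S * 1000 / SSD_SINK_MB_S)   # -> 15

    # slog/NVRAM capacity to hold one txg interval's worth of writes
    capacity_gbytes = LINK_GB_S * TXG_INTERVAL_S                # -> 30.0

    print(ssds_needed, "SSDs,", capacity_gbytes, "GBytes")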

Both of the above assume there is lots of memory in the server.
This is becoming increasingly easy to do as memory costs come
down and you can physically fit 512 GBytes in a 4U server.
By default, the txg commit will occur when 1/8 of memory is used
for writes. For 30 GBytes, that would mean a main memory of only
240 GBytes... feasible for modern servers.
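
Spelled out with that rule of thumb (the 1/8-of-memory dirty-data
trigger is the assumed default quoted above, not something measured
here):

    DIRTY_FRACTION = 1.0 / 8      # assumed: txg commits at ~1/8 of RAM dirty
    buffered_writes_gbytes = 30.0 # the 30 GBytes of buffered writes from above

    ram_needed_gbytes = buffered_writes_gbytes / DIRTY_FRACTION  # -> 240.0
    print(ram_needed_gbytes, "GBytes of RAM")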

However, most folks won't stomach 15 SSDs for slog or 30 GBytes of
NVRAM in their arrays. So Bob's recommendation of reducing the
txg commit interval below 30 seconds also has merit.  Or, to put it
another way, the dynamic sizing of the txg commit interval isn't
quite perfect yet. [Cue for Neil to chime in... :-)]
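
To see why a shorter txg commit interval helps, the capacity side of
the earlier sketch scales directly with the interval (numbers are
purely illustrative; the SSD count needed for bandwidth does not
change):

    LINK_GB_S = 1.0   # ~10 GbE of ingest, GBytes/sec

    for interval_s in (30, 10, 5, 1):
        # NVRAM/slog capacity needed to buffer one txg interval of writes
        print(interval_s, "sec ->", LINK_GB_S * interval_s, "GBytes")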
 -- richard

