On Fri, Sep 25, 2009 at 1:39 PM, Richard Elling
<richard.ell...@gmail.com> wrote:
> On Sep 25, 2009, at 9:14 AM, Ross Walker wrote:
>
>> On Fri, Sep 25, 2009 at 11:34 AM, Bob Friesenhahn
>> <bfrie...@simple.dallas.tx.us> wrote:
>>>
>>> On Fri, 25 Sep 2009, Ross Walker wrote:
>>>>
>>>> As a side note, a slog device will not be too beneficial for large
>>>> sequential writes, because they will be throughput bound, not latency
>>>> bound. Slog devices really help when you have lots of small sync
>>>> writes. A RAIDZ2 with the ZIL spread across it will provide much
>>>
>>> Surely this depends on the origin of the large sequential writes.  If the
>>> origin is NFS and the SSD has considerably more sustained write bandwidth
>>> than the ethernet transfer bandwidth, then using the SSD is a win.  If
>>> the SSD accepts data slower than the ethernet can deliver it (which seems to
>>> be this particular case) then the SSD is not helping.
>>>
>>> If the ethernet can pass 100MB/second, then the sustained write
>>> specification for the SSD needs to be at least 100MB/second.  Since data
>>> is buffered in the Ethernet,TCP/IP,NFS stack prior to sending it to ZFS, the
>>> SSD should support write bursts of at least double that or else it will
>>> not be helping bulk-write performance.
>>
>> Specifically, I was talking about NFS since that is what the OP was
>> asking about, but yes, it does depend on the origin. You also assume
>> that the NFS IO goes over only a single 1Gbe interface, when it could
>> go over multiple 1Gbe interfaces, a 10Gbe interface, or even multiple
>> 10Gbe interfaces. You also assume the IO recorded in the ZIL is just
>> the raw IO, when there is also meta-data or multiple transaction
>> copies as well.
>>
>> Personally, I still prefer to spread the ZIL across the pool and have
>> a large NVRAM-backed HBA, as opposed to a slog, which really puts all
>> my IO in one basket. If I had a pure NVRAM device I might consider
>> using it as a slog device, but SSDs are too variable for my taste.
>
> Back of the envelope math says:
>        10 Gbe = ~1 GByte/sec of I/O capacity
>
> If the SSD can only sink 70 MByte/s, then you will need:
>        int(1000/70) + 1 = 15 SSDs for the slog
>
> For capacity, you need:
>        1 GByte/sec * 30 sec = 30 GBytes

Where did the 30 seconds come in here?

The amount of time to hold cache depends on how fast you can fill it.

> Ross' idea has merit, if the size of the NVRAM in the array is 30 GBytes
> or so.

I'm thinking you can do less if you don't need to hold it for 30 seconds.
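To put rough numbers on that (a quick sketch in Python, just plugging in
Bob's 100 MByte/s 1Gbe figure and Richard's 10Gbe / 70 MByte/s figures;
the 5-second case is a what-if of mine, not a measurement):

    import math

    ssd_sink_mb = 70                  # sustained write rate of one slog SSD (MByte/s)

    for line_rate_mb in (100, 1000):  # ~1Gbe and ~10Gbe of inbound sync writes
        # Devices needed just to keep up with the wire.
        ssds = math.ceil(line_rate_mb / ssd_sink_mb)
        # Capacity needed is fill rate times how long the data is held.
        for hold_s in (30, 5):        # 30 s per Richard, 5 s as a what-if
            gb = line_rate_mb * hold_s / 1000.0
            print(line_rate_mb, "MByte/s:", ssds, "SSDs,",
                  gb, "GBytes held over", hold_s, "s")

That reproduces the 15 SSDs and 30 GBytes above for the 10Gbe case, and
shows how quickly the capacity requirement drops with a shorter hold time.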

> Both of the above assume there is lots of memory in the server.
> This is increasingly becoming easier to do as the memory costs
> come down and you can physically fit 512 GBytes in a 4u server.
> By default, the txg commit will occur when 1/8 of memory is used
> for writes. For 30 GBytes, that would mean a main memory of only
> 240 Gbytes... feasible for modern servers.
>
> However, most folks won't stomach 15 SSDs for slog or 30 GBytes of
> NVRAM in their arrays. So Bob's recommendation of reducing the
> txg commit interval below 30 seconds also has merit.  Or, to put it
> another way, the dynamic sizing of the txg commit interval isn't
> quite perfect yet. [Cue for Neil to chime in... :-)]

I'm sorry, did I miss something Bob said about the txg commit interval?

I looked back and didn't see it; maybe it was off-list?
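
In any case, the 1/8-of-memory arithmetic Richard quotes pencils out the
same way (another rough Python sketch using his figures; the 1/8
dirty-data fraction and the 30-second commit interval are the defaults he
cites, and the function names are just for illustration):

    def ram_needed_gb(dirty_gb, fraction=1.0 / 8):
        # Main memory needed so dirty_gb of writes fits under the 1/8 default.
        return dirty_gb / fraction

    def dirty_per_txg_gb(line_rate_mb, txg_interval_s):
        # Data accumulated per txg at a given inbound write rate.
        return line_rate_mb * txg_interval_s / 1000.0

    print(ram_needed_gb(30))            # 240 GBytes of RAM for a 30 GByte txg
    print(dirty_per_txg_gb(1000, 30))   # ~30 GBytes at 10Gbe over 30 seconds
    print(dirty_per_txg_gb(1000, 5))    # ~5 GBytes if the txg commits every 5 s

So shortening the commit interval shrinks the slog/NVRAM you need to
provision roughly linearly, which is why the dynamic txg sizing matters.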

-Ross