On 12/9/13, 6:54 PM, Matthew Ahrens wrote:
> 
> 
> 
> On Mon, Dec 9, 2013 at 7:25 AM, Saso Kiselkov
> <[email protected]> wrote:
> 
>     On 12/5/13, 10:24 PM, Matthew Ahrens wrote:
>     >
>     >
>     >
>     > On Wed, Dec 4, 2013 at 6:58 PM, Saso Kiselkov
>     > <[email protected]> wrote:
>     >
>     >     On 12/4/13, 11:25 PM, Matthew Ahrens wrote:
>     >     > Saso, thanks for your continued work to clean this up.
>     >     > Here are some more comments:
>     >
>     >     Hi Matt,
>     >
>     >     Thanks for the review, responses below:
>     >
>     >     >  - terminology: let's use similar terms as with the intent
>     >     > log: there are not several logs on each l2arc device.  There
>     >     > is one log composed of a linked list of log blocks.  (So,
>     >     > s/log/log block/ in the code/comments)
>     >
>     >     Ok, will build this in.
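>     >
>     >     Just so we're talking about the same shape of thing, here's
>     >     roughly how I picture the on-device layout (field names and
>     >     sizes illustrative, not necessarily what's in the webrev):
>     >
>     >         /* locator for one log block on the cache device */
>     >         typedef struct l2arc_log_blk_ptr {
>     >                 uint64_t        lbp_daddr;      /* device address */
>     >                 uint64_t        lbp_size;       /* physical size */
>     >                 uint64_t        lbp_cksum[4];   /* to validate it */
>     >         } l2arc_log_blk_ptr_t;
>     >
>     >         /* one entry, describing a single cached buffer */
>     >         typedef struct l2arc_log_ent {
>     >                 dva_t           le_dva;         /* buffer's DVA */
>     >                 uint64_t        le_birth;       /* its birth txg */
>     >                 uint64_t        le_daddr;       /* offset on device */
>     >                 uint64_t        le_size;        /* and its size */
>     >         } l2arc_log_ent_t;
>     >
>     >         /* one log block; the whole log is a chain of these */
>     >         typedef struct l2arc_log_blk {
>     >                 l2arc_log_blk_ptr_t     lb_next_lbp;    /* next in chain */
>     >                 l2arc_log_ent_t         lb_entries[];
>     >         } l2arc_log_blk_t;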
>     >
>     >     >  - why do we need to persist the L2ARC_PERSIST_FLAGS?
>     >     > (e.g. L2CACHE - aren't they all in l2arc)
>     >
>     >     I'm not sure I understand. Do you mean that we should *not*
>     >     rebuild the buffer flags when rebuilding buffers? While it is
>     >     possible that we can infer two of the flag values
>     >     (ARC_IN_HASH_TABLE and ARC_L2CACHE), we can't say with
>     >     certainty what the right values for ARC_L2COMPRESS and
>     >     ARC_PREFETCH should be (as these relate to the state the pool
>     >     was in when the buffer got into ARC in the first place).
>     >
>     >
>     > I'm suggesting that we rebuild the flags, but I'm not sure of the
>     > benefit of storing them on disk.  It seems like ARC_IN_HASH_TABLE and
>     > ARC_L2CACHE should always be set, so don't need to be persisted.  Why
>     > should ARC_PREFETCH ever be set when we reconstruct?  Can't we set
>     > ARC_L2COMPRESS based on LE_GET_COMPRESS()?
> 
>     Reworked; I've now gotten rid of this.
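>     The rebuild path now just sets the flags from what it already knows
>     about the entry, roughly like this (illustrative, not the exact
>     diff; "hdr" is the reconstructed arc_buf_hdr_t and "le" the log
>     entry it was rebuilt from):
> 
>         /* nothing is read back from the device for this */
>         hdr->b_flags |= ARC_IN_HASH_TABLE;      /* we just hashed it in */
>         hdr->b_flags |= ARC_L2CACHE;            /* it came from l2arc */
>         if (LE_GET_COMPRESS(le) != ZIO_COMPRESS_OFF)
>                 hdr->b_flags |= ARC_L2COMPRESS;
>         /* ARC_PREFETCH is simply left clear on reconstruction */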
> 
>     > Thankfully this is not realtime code :-)
> 
>     I'd contend that all code is real-time code, but that the importance
>     of its real-time qualities depends on what it's being used for. One
>     of the main reasons I got into ZFS development was to try to
>     understand how to improve ZFS for video streaming (which is
>     inherently real-time).
> 
>     However, I don't feel particularly strongly about this functionality.
>     If you think we can safely lose it, I'll defer to your judgment.
> 
>     >     Unfortunately, I did experiments with that and rebuild does
>     >     suffer quite a bit (by about an order of magnitude) unless you
>     >     get the prefetching *just* right (i.e. keep the device busy all
>     >     the time). Reconstructing one log block's buffers in memory
>     >     takes a minuscule amount of time compared to access latencies
>     >     even on SSDs, so without prefetching, the rebuild will be
>     >     mostly read-latency limited. My tests seemed to indicate that
>     >     an average-size L2ARC rebuild (with the prefetching that's in
>     >     the webrev) should take on the order of 10-20 seconds. Without
>     >     that, you're looking at probably several minutes' worth of
>     >     churn on the L2ARC device.
>     >
>     > I must not be understanding the code then (which is why I would prefer
>     > it to be simpler).  It looks like there are at most 2 concurrent i/os
>     > going on, one for each of the 2 linked lists.  So how can it go more
>     > than 2x the performance of a single linked list with 1 concurrent i/o?
> 
>     The point is to minimize the gap that opens up when one read
>     finishes and only then do we issue the next one, at which point we
>     have to incur another latency hit of passing through the transports
>     to the L2ARC device. With this method we always keep 1-2 IOs queued
>     on the device, instead of 0-1.
> 
>     A picture is worth a thousand words:
> 
>          +---> (t)
>     CPU  |           /run\                  /run\                  /run\
>     xfer |     /xfer/     \xfer\      /xfer/     \xfer\      /xfer/     \
>     SSD  |read/                 \read/                 \read/
> 
> 
>          +---> (t)
>     CPU  |           /run\/run\             /run\/run\             /run\/
>     xfer |     /xfer/xfer/\xfer\xfer\ /xfer/xfer/\xfer\xfer\ /xfer/xfer/\
>     SSD  |read/read/            \read/read/            \read/read/
> 
>     Now the exact proportions here will dictate how well we are going to
>     saturate the device, and SSDs typically prefer queues deeper than 1, but
>     it's better than a straight start-stop walk of the device (which is
>     going to be much slower than the device's maximum transfer rate).
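> 
>     In pseudo-C, the rebuild loop is shaped roughly like this (heavily
>     simplified; the names are illustrative rather than the actual webrev
>     code, and error handling and end-of-chain cleanup are omitted):
> 
>         l2arc_log_blk_ptr_t     lbp[2];         /* one per chain */
>         l2arc_log_blk_t         lb[2];
>         zio_t                   *io[2];
>         int                     i;
> 
>         /* kick off a read on the head block of each chain */
>         lbp[0] = dev_hdr->dh_start_lbps[0];
>         lbp[1] = dev_hdr->dh_start_lbps[1];
>         io[0] = l2arc_log_blk_fetch(dev, &lbp[0], &lb[0]);
>         io[1] = l2arc_log_blk_fetch(dev, &lbp[1], &lb[1]);
> 
>         for (i = 0; ; i = !i) {
>                 /* wait for this chain's outstanding read */
>                 if (io[i] == NULL || zio_wait(io[i]) != 0)
>                         break;
> 
>                 /* re-create the ARC headers for this block's entries */
>                 l2arc_log_blk_restore(dev, &lb[i]);
> 
>                 /* queue the next read on this chain right away ... */
>                 lbp[i] = lb[i].lb_next_lbp;
>                 io[i] = l2arc_log_blk_ptr_valid(dev, &lbp[i]) ?
>                     l2arc_log_blk_fetch(dev, &lbp[i], &lb[i]) : NULL;
> 
>                 /*
>                  * ... then flip to the other chain, whose block has
>                  * been reading in the background while we were busy.
>                  */
>         }
> 
>     So while the CPU restores one block, there are one or two reads in
>     flight on the device, which is where the 1-2 queued IOs above come
>     from.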
> 
> 
> Right, but how could this result in >2x the performance?  As indicated
> by your diagram, you are doing at most 2 reads at once (or, you are
> getting at most 1 read "for free" while the CPU is busy processing the
> last block).  You claimed a 10-20x speedup (I am assuming that "several"
> means 3).

As I said, I'm going to have to recheck; it's possible I'm remembering
this incorrectly. However, my options for performance testing are
currently somewhat limited. I'll get back to you as soon as I have more
info.

Cheers,
-- 
Saso