On 12/5/13, 10:24 PM, Matthew Ahrens wrote:
> 
> 
> 
> On Wed, Dec 4, 2013 at 6:58 PM, Saso Kiselkov <[email protected]
> <mailto:[email protected]>> wrote:
> 
>     On 12/4/13, 11:25 PM, Matthew Ahrens wrote:
>     > Saso, thanks for your continued work to clean this up.  Here are some
>     > more comments:
> 
>     Hi Matt,
> 
>     Thanks for the review, responses below:
> 
>     >  - terminology: let's use similar terms as with the intent log: there
>     > are not several logs on each l2arc device.  There is one log
>     composed of
>     > a linked list of log blocks.  (So, s/log/log block/ in the
>     code/comments)
> 
>     Ok, will build this in.
> 
>     >  - why do we need to persist the L2ARC_PERSIST_FLAGS? (e.g. L2CACHE -
>     > aren't they all in l2arc)
> 
>     I'm not sure I understand. Do you mean that we should *not* rebuild the
>     buffer flags when rebuilding buffers? While it is possible that we can
>     infer two of the flag values (ARC_IN_HASH_TABLE and ARC_L2CACHE), we
>     can't say with certainty what the right values for ARC_L2COMPRESS and
>     ARC_PREFETCH should be (as these relate to the state the pool was in
>     when the buffer got into ARC in the first place).
> 
> 
> I'm suggesting that we rebuild the flags, but I'm not sure of the benefit
> of storing these flags on disk.  It seems like ARC_IN_HASH_TABLE and
> ARC_L2CACHE should always be set, so they don't need to be persisted.  Why
> should ARC_PREFETCH ever be set when we reconstruct?  Can't we set
> ARC_L2COMPRESS based on LE_GET_COMPRESS()?

Reworked; I've now gotten rid of this.
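For completeness, the flag handling on rebuild now amounts to roughly
this (a sketch only - the log entry type and the exact LE_GET_COMPRESS()
usage here are placeholders based on this thread, not necessarily what
ends up in the webrev):

static void
l2arc_hdr_restore_flags(arc_buf_hdr_t *hdr, const l2arc_log_ent_t *le)
{
	/* Every rebuilt buffer is hashed in and, by definition, in L2ARC. */
	hdr->b_flags |= ARC_IN_HASH_TABLE | ARC_L2CACHE;

	/* The compression state is implied by the log entry itself. */
	if (LE_GET_COMPRESS(le) != ZIO_COMPRESS_OFF)
		hdr->b_flags |= ARC_L2COMPRESS;

	/* ARC_PREFETCH is deliberately left unset on reconstruction. */
}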

> Thankfully this is not realtime code :-)

I'd contend that all code is real-time code; it's just that the
importance of its real-time qualities depends on what it's being used
for. One of the main reasons I got into ZFS development was to try to
understand how to improve ZFS for video streaming uses (which are
inherently real-time).

However, I don't feel particularly strongly about this functionality. If
you think we can safely lose it, I'll defer to your judgment.

>     Unfortunately, I did experiments with that and rebuild does suffer quite
>     a bit (roughly an order of magnitude slower) unless you get the
>     prefetching *just* right (i.e. keep the device busy all the time).
>     Reconstructing one log block's buffers in memory takes a minuscule
>     amount of time when compared to access latencies even on SSDs, so
>     without prefetching, the rebuild will be mostly read latency limited. My
>     tests seemed to indicate that an average-size L2ARC rebuild (with the
>     prefetching that's in the webrev) should take on the order of 10-20
>     seconds. Without that, you're looking at probably several minutes' worth
>     of churn on the L2ARC device.
> 
> I must not be understanding the code then (which is why I would prefer
> it to be simpler).  It looks like there are at most 2 concurrent i/os
> going on, one for each of the 2 linked lists.  So how can it go more
> than 2x the performance of a single linked list with 1 concurrent i/o?

The point is to minimize the idle gap that opens up when one read
finishes and only then do we issue the next one - at which point we
incur another latency hit traversing the transport down to the L2ARC
device. With this method we always keep at least 1-2 I/Os queued on the
device, instead of 0-1.

A picture is worth a thousand words:

     +---> (t)
CPU  |           /run\                  /run\                  /run\
xfer |     /xfer/     \xfer\      /xfer/     \xfer\      /xfer/     \
SSD  |read/                 \read/                 \read/


     +---> (t)
CPU  |           /run\/run\             /run\/run\             /run\/
xfer |     /xfer/xfer/\xfer\xfer\ /xfer/xfer/\xfer\xfer\ /xfer/xfer/\
SSD  |read/read/            \read/read/            \read/read/

The exact proportions here will dictate how well we saturate the device,
and SSDs typically prefer queue depths greater than 1, but it's still
better than a straight start-stop walk of the device (which will run
well below the device's maximum transfer rate).

I'll try to conduct quantitative tests to make sure I'm not talking out
of my ass here.
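In the meantime, to make the intent concrete, here's a rough sketch of
the pipelining (purely illustrative - l2arc_log_blk_fetch(),
l2arc_log_blk_restore() and lb_back_ptr are placeholder names standing
in for the routines in the webrev, not its actual interfaces): issue the
read for log block N+1 before doing the CPU work of rebuilding headers
from log block N.

/*
 * Sketch of a two-deep read pipeline for log block rebuild. The fetch
 * and restore routines are placeholders; the point is only that the
 * next read is in flight while we chew on the current block.
 */
static void
l2arc_rebuild_sketch(l2arc_dev_t *dev, l2arc_log_blk_ptr_t ptr)
{
	l2arc_log_blk_t *this_lb = NULL, *next_lb = NULL;
	zio_t *this_io = NULL, *next_io = NULL;

	/* Prime the pipeline with the first read. */
	this_io = l2arc_log_blk_fetch(dev, &ptr, &this_lb);

	while (l2arc_log_blk_ptr_valid(dev, &ptr)) {
		/* Wait only for the block we are about to process. */
		if (zio_wait(this_io) != 0)
			break;

		/*
		 * Issue the read for the *next* block before doing any
		 * CPU work on this one, so the device never goes idle.
		 */
		ptr = this_lb->lb_back_ptr;	/* hypothetical back pointer */
		if (l2arc_log_blk_ptr_valid(dev, &ptr))
			next_io = l2arc_log_blk_fetch(dev, &ptr, &next_lb);

		/* Rebuild the ARC headers from this block (CPU-bound). */
		l2arc_log_blk_restore(dev, this_lb);

		this_lb = next_lb;
		this_io = next_io;
		next_io = NULL;
	}
}

The two linked lists simply give us two such pipelines running side by
side.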

>     >  - how can there be infinite loops of logs?  validity should be
>     > determined by not straddling invalid range (between end and begin)
> 
>     Aside from obvious manually fabricated infinite log block loops meant
>     to effect a hang or DoS, log blocks can loop by design. The main
>     reason is that log blocks point backwards, while our writing goes
>     forwards. We thus have no way to go back and amend "outdated" log
>     blocks. So if we get seriously unlucky, we could potentially write a
>     log block at exactly the sector offset pointed to by an ancient,
>     previously written log block, which would then create the loop. It's
>     extremely unlikely, but it's a possibility.
> 
> 
> But can't you catch that by validating that the pointer does not cross
> the invalid range of the device (l2u_start_lps[0].l2lp_daddr and
> l2u_evict_tail)?

Point taken. I've removed this from l2arc_log_blk_read() and instead
strengthened l2arc_log_blk_ptr_valid() to reject log block pointers that
fall into the evicted region (between l2ad_hand and l2ad_evict).
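For illustration, the strengthened check boils down to something like
the following (a sketch under my assumptions - l2arc_log_blk_ptr_t and
l2lp_daddr are placeholder names following the fields quoted above, and
wrap-around of the write hand is ignored for brevity):

static boolean_t
l2arc_log_blk_ptr_valid(l2arc_dev_t *dev, const l2arc_log_blk_ptr_t *lp)
{
	uint64_t daddr = lp->l2lp_daddr;

	/* Must point inside the device's usable L2ARC address range. */
	if (daddr < dev->l2ad_start || daddr >= dev->l2ad_end)
		return (B_FALSE);

	/*
	 * Must not point into the evicted region between the write hand
	 * and the eviction point: anything there has been (or is about
	 * to be) overwritten, so following such a pointer can only yield
	 * garbage or loop us back onto ourselves.
	 */
	if (daddr >= dev->l2ad_hand && daddr < dev->l2ad_evict)
		return (B_FALSE);

	return (B_TRUE);
}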

Updated webrev with all of the changes:
http://cr.illumos.org/~webrev/skiselkov/3525_simplified/

Cheers,
-- 
Saso