On Mon, Dec 9, 2013 at 7:25 AM, Saso Kiselkov <[email protected]> wrote:

> On 12/5/13, 10:24 PM, Matthew Ahrens wrote:
> >
> >
> >
> > On Wed, Dec 4, 2013 at 6:58 PM, Saso Kiselkov <[email protected]> wrote:
> >
> >     On 12/4/13, 11:25 PM, Matthew Ahrens wrote:
> >     > Saso, thanks for your continued work to clean this up.  Here are
> >     > some more comments:
> >
> >     Hi Matt,
> >
> >     Thanks for the review, responses below:
> >
> >     >  - terminology: let's use similar terms as with the intent log:
> >     > there are not several logs on each l2arc device.  There is one log
> >     > composed of a linked list of log blocks.  (So, s/log/log block/ in
> >     > the code/comments)
> >
> >     Ok, will build this in.
> >
> >     >  - why do we need to persist the L2ARC_PERSIST_FLAGS? (e.g.
> >     > L2CACHE - aren't they all in l2arc?)
> >
> >     I'm not sure I understand. Do you mean that we should *not* rebuild
> >     the buffer flags when rebuilding buffers? While it is possible that
> >     we can
> >     infer two of the flag values (ARC_IN_HASH_TABLE and ARC_L2CACHE), we
> >     can't say for certain what the right value for ARC_L2COMPRESS and
> >     ARC_PREFETCH should be (as these relate to the state the pool was in
> >     when the buffer got into ARC in the first place).
> >
> >
> > I'm suggesting that we rebuild the flags, but I'm not sure what the benefit
> > of storing these flags on disk is.  It seems like ARC_IN_HASH_TABLE and
> > ARC_L2CACHE should always be set, so they don't need to be persisted.  Why
> > should ARC_PREFETCH ever be set when we reconstruct?  Can't we set
> > ARC_L2COMPRESS based on LE_GET_COMPRESS()?
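> >
> > Roughly what I have in mind, as an untested sketch (LE_GET_COMPRESS() and
> > the flag names are real; the function name and the log-entry type are just
> > placeholders for whatever the webrev calls them):
> >
> >     /*
> >      * Derive the buffer-header flags at rebuild time instead of
> >      * persisting them in each log entry.
> >      */
> >     static uint32_t
> >     l2arc_rebuild_hdr_flags(const l2arc_log_ent_t *le)
> >     {
> >         /* Every rebuilt header is hashed and L2ARC-eligible. */
> >         uint32_t flags = ARC_IN_HASH_TABLE | ARC_L2CACHE;
> >
> >         /* Compression state is already encoded in the log entry. */
> >         if (LE_GET_COMPRESS(le) != ZIO_COMPRESS_OFF)
> >             flags |= ARC_L2COMPRESS;
> >
> >         /* ARC_PREFETCH is deliberately never set on reconstruction. */
> >         return (flags);
> >     }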
>
> Reworked; I've gotten rid of this now.
>
> > Thankfully this is not realtime code :-)
>
> I'd contend that all code is real-time code, but that the importance of
> its real-time qualities depends on what it's being used for. One of
> the main reasons I got into ZFS development was to try and understand
> how to improve ZFS for video streaming uses (which is inherently
> real-time).
>
> However, I don't feel particularly strongly about this functionality. If
> you think we can safely lose it, I'll defer to your judgment.
>
> >     Unfortunately, I did experiments with that, and the rebuild does suffer
> >     quite a bit (it's about an order of magnitude slower) unless you get the
> >     prefetching *just* right (i.e. keep the device busy all the time).
> >     Reconstructing one log block's buffers in memory takes a minuscule
> >     amount of time when compared to access latencies even on SSDs, so
> >     without prefetching, the rebuild will be mostly read latency limited. My
> >     tests seemed to indicate that an average-size L2ARC rebuild (with the
> >     prefetching that's in the webrev) should take on the order of 10-20
> >     seconds. Without that, you're looking at probably several minutes'
> >     worth of churn on the L2ARC device.
> >
> > I must not be understanding the code then (which is why I would prefer
> > it to be simpler).  It looks like there are at most 2 concurrent i/os
> > going on, one for each of the 2 linked lists.  So how can it get more
> > than 2x the performance of a single linked list with 1 concurrent i/o?
>
> The point is to minimize the gap that opens up when we only start issuing
> the next read after the previous one has finished - at which point we
> have to incur another latency hit of passing through the transports to
> the L2ARC device. Using this method we always keep at least 1-2 IOs
> queued for the device, instead of 0-1 IOs.
>
> A picture is worth a thousand words:
>
>      +---> (t)
> CPU  |           /run\                  /run\                  /run\
> xfer |     /xfer/     \xfer\      /xfer/     \xfer\      /xfer/     \
> SSD  |read/                 \read/                 \read/
>
>
>      +---> (t)
> CPU  |           /run\/run\             /run\/run\             /run\/
> xfer |     /xfer/xfer/\xfer\xfer\ /xfer/xfer/\xfer\xfer\ /xfer/xfer/\
> SSD  |read/read/            \read/read/            \read/read/
>
> Now the exact proportions here will dictate how well we are going to
> saturate the device, and SSDs typically prefer queues deeper than 1, but
> it's better than a straight start-stop walk of the device (which is
> going to be much slower than the device's maximum transfer rate).
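>
> In rough pseudo-C, the walk of a single list looks something like this
> (the function and field names are simplified stand-ins rather than the
> exact ones in the webrev; it's only meant to illustrate the overlap):
>
>     zio_t *next_io;
>     l2arc_log_blk_t *tmp;
>
>     for (;;) {
>         /* Kick off the read of log block N+1 (the previous block on disk). */
>         next_io = l2arc_log_blk_read_start(dev,
>             &this_lb->lb_prev_ptr, next_lb);
>
>         /* Rebuild block N's buffers while that read is in flight. */
>         l2arc_log_blk_restore(dev, this_lb);
>
>         if (next_io == NULL || zio_wait(next_io) != 0)
>             break;      /* end of chain, or read error */
>
>         /* Block N+1 is now in memory; make it the current block. */
>         tmp = this_lb;
>         this_lb = next_lb;
>         next_lb = tmp;
>     }
>
> With the two log block lists walked like this in parallel, there are
> always one or two reads outstanding, which is where the 1-2 queued IOs
> above come from.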
>

Right, but how could this result in >2x the performance?  As indicated by
your diagram, you are doing at most 2 reads at once (or, you are getting at
most 1 read "for free" while the CPU is busy processing the last block).
You claimed a 10-20x speedup (I am assuming that "several minutes" means
about 3 minutes, i.e. ~180 seconds vs. 10-20 seconds).
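
To put rough numbers on it (R and C below are just stand-ins for the
per-block read latency and CPU rebuild time, not measured values): a
strictly serial walk of N log blocks costs about N*(R+C), while overlapping
one read with the CPU work costs about N*max(R,C), so the best case for
this scheme is (R+C)/max(R,C), which is at most 2x.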

--matt


>
> I'll try to conduct quantitative tests to make sure I'm not talking out
> of my ass here.


> >     >  - how can there be infinite loops of logs?  validity should be
> >     > determined by not straddling invalid range (between end and begin)
> >
> >     Aside from obvious manually fabricated infinite log block loops to
> >     effect a hang or DoS, buffers can loop due to the design. The main
> >     reason is that log blocks point backwards, while our writing goes
> >     forwards. We thus have no way to go back and amend "outdated" buffers.
> >     So if we get seriously unlucky, we could, potentially, write a log
> >     buffer at exactly the sector offset pointed to by an ancient previously
> >     written log block, which would then create the loop. It's extremely
> >     unlikely, but it's a possibility.
> >
> >
> > But can't you catch that by validating that the pointer does not cross
> > the invalid range of the device (l2u_start_lps[0].l2lp_daddr and
> > l2u_evict_tail)?
>
> Point taken, I've removed this from l2arc_log_blk_read() and instead
> strengthened l2arc_log_blk_ptr_valid() to check for log block pointers
> into the evicted region (between l2ad_hand and l2ad_evict).
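>
> For reference, the strengthened check now looks roughly like this (the
> field and macro names are only approximate, and wrap-around of the write
> hand is ignored for brevity):
>
>     static boolean_t
>     l2arc_log_blk_ptr_valid(l2arc_dev_t *dev, const l2arc_log_blk_ptr_t *lbp)
>     {
>         uint64_t start = lbp->l2lp_daddr;
>         uint64_t end = start + LBP_GET_PSIZE(lbp);
>
>         /* The block must lie entirely within the device's usable range. */
>         if (start < dev->l2ad_start || end > dev->l2ad_end)
>             return (B_FALSE);
>
>         /*
>          * Reject pointers into the evicted (and possibly overwritten)
>          * region between l2ad_hand and l2ad_evict; this is also what
>          * stops a stale pointer from sending the rebuild around in a loop.
>          */
>         if (start >= dev->l2ad_hand && start < dev->l2ad_evict)
>             return (B_FALSE);
>
>         return (B_TRUE);
>     }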
>
> Updated webrev with all of the changes:
> http://cr.illumos.org/~webrev/skiselkov/3525_simplified/
>
> Cheers,
> --
> Saso
>
>