On 12/9/13, 6:54 PM, Matthew Ahrens wrote:
> On Mon, Dec 9, 2013 at 7:25 AM, Saso Kiselkov <[email protected]> wrote:
> > On 12/5/13, 10:24 PM, Matthew Ahrens wrote:
> > > On Wed, Dec 4, 2013 at 6:58 PM, Saso Kiselkov <[email protected]> wrote:
> > > > On 12/4/13, 11:25 PM, Matthew Ahrens wrote:
> > > > > Saso, thanks for your continued work to clean this up. Here are some more comments:
> > > >
> > > > Hi Matt,
> > > >
> > > > Thanks for the review, responses below:
> > > >
> > > > > - terminology: let's use similar terms as with the intent log: there are not several logs on each l2arc device. There is one log composed of a linked list of log blocks. (So, s/log/log block/ in the code/comments)
> > > >
> > > > Ok, will build this in.
> > > >
> > > > > - why do we need to persist the L2ARC_PERSIST_FLAGS? (e.g. L2CACHE - aren't they all in l2arc)
> > > >
> > > > I'm not sure I understand. Do you mean that we should *not* rebuild the buffer flags when rebuilding buffers? While it is possible that we can infer two of the flag values (ARC_IN_HASH_TABLE and ARC_L2CACHE), we can't say for certain what the right values for ARC_L2COMPRESS and ARC_PREFETCH should be (as these relate to the state the pool was in when the buffer got into the ARC in the first place).
> > >
> > > I'm suggesting that we rebuild the flags, but I'm not sure of the benefit of storing these flags on disk. It seems like ARC_IN_HASH_TABLE and ARC_L2CACHE should always be set, so they don't need to be persisted. Why should ARC_PREFETCH ever be set when we reconstruct? Can't we set ARC_L2COMPRESS based on LE_GET_COMPRESS()?
> >
> > Reworked, now gotten rid of this.
> >
> > > Thankfully this is not realtime code :-)
> >
> > I'd contend that all code is real-time code, but that the importance of its real-time qualities depends on what it's being used for. One of the main reasons I got into ZFS development was to try to understand how to improve ZFS for video streaming uses (which is inherently real-time).
> >
> > However, I don't feel particularly strongly about this functionality. If you think we can safely lose it, I'll defer to your judgment.
> >
> > > > Unfortunately, I did experiments with that, and rebuild does suffer quite a bit (it becomes about an order of magnitude slower) unless you get the prefetching *just* right (i.e. keep the device busy all the time). Reconstructing one log block's buffers in memory takes a minuscule amount of time compared to access latencies even on SSDs, so without prefetching the rebuild will be mostly read-latency limited. My tests seemed to indicate that an average-size L2ARC rebuild (with the prefetching that's in the webrev) should take on the order of 10-20 seconds. Without that, you're looking at probably several minutes' worth of churn on the L2ARC device.
> > >
> > > I must not be understanding the code then (which is why I would prefer it to be simpler). It looks like there are at most 2 concurrent i/os going on, one for each of the 2 linked lists. So how can it go at more than 2x the performance of a single linked list with 1 concurrent i/o?
> >
> > The point is to minimize the hole that's created when one read finishes and only then do we start issuing the next one - at which point we have to incur another latency hit of passing through the transports to the L2ARC device.
> > Using this method we always keep at least 1-2 IOs queued for the device, instead of 0-1 IOs.
> >
> > A picture is worth a thousand words:
> >
> >      +---> (t)
> > CPU  |            /run\                  /run\                  /run\
> > xfer |      /xfer/     \xfer\      /xfer/     \xfer\      /xfer/     \
> > SSD  | read/                 \read/                 \read/
> >
> >      +---> (t)
> > CPU  |                      /run\/run\                                 /run\/run\                                 /run\/
> > xfer |           /xfer/xfer/          \xfer\xfer\           /xfer/xfer/          \xfer\xfer\           /xfer/xfer/
> > SSD  | read/read/                                \read/read/                                \read/read/
> >
> > Now the exact proportions here will dictate how well we are going to saturate the device, and SSDs typically prefer queues deeper than 1, but it's better than a straight start-stop walk of the device (which is going to be much slower than the device's maximum transfer rate).
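For reference, here is a minimal, self-contained sketch of the pipelined walk described above. Device latency and CPU work are simulated with usleep(), and all of the names (read_io_t, issue_read, wait_read, process_log_block) are hypothetical illustrations rather than code from the webrev; per the discussion, the webrev keeps up to two reads in flight via its two linked lists, while this sketch only keeps one read ahead. The point it demonstrates is that the read for log block N+1 is issued before block N is processed, so the device always has a request queued while the CPU rebuilds headers.

/*
 * Minimal simulation of a pipelined log-block walk: the read for
 * block N+1 is issued before block N is processed, so device latency
 * overlaps with CPU work.  Reads and processing are faked with
 * usleep(); none of these names come from the actual webrev.
 */
#include <pthread.h>
#include <stdio.h>
#include <stdlib.h>
#include <unistd.h>

#define	NBLOCKS		8	/* length of the simulated log block chain */
#define	READ_LAT_US	2000	/* simulated device read latency */
#define	CPU_US		500	/* simulated per-block rebuild work */

typedef struct read_io {
	pthread_t	rio_thread;
	int		rio_blkno;
} read_io_t;

static void *
dev_read(void *arg)
{
	(void) arg;
	usleep(READ_LAT_US);		/* pretend we're waiting on the SSD */
	return (NULL);
}

/* Start an asynchronous "read" of log block blkno. */
static read_io_t *
issue_read(int blkno)
{
	read_io_t *rio = malloc(sizeof (*rio));
	rio->rio_blkno = blkno;
	(void) pthread_create(&rio->rio_thread, NULL, dev_read, NULL);
	return (rio);
}

/* Wait for an in-flight read and return the block it fetched. */
static int
wait_read(read_io_t *rio)
{
	int blkno = rio->rio_blkno;
	(void) pthread_join(rio->rio_thread, NULL);
	free(rio);
	return (blkno);
}

/* Stand-in for rebuilding ARC headers from one log block. */
static void
process_log_block(int blkno)
{
	usleep(CPU_US);
	printf("rebuilt buffers from log block %d\n", blkno);
}

int
main(void)
{
	read_io_t *inflight = issue_read(0);

	while (inflight != NULL) {
		int blkno = wait_read(inflight);

		/* Issue the next read before doing any CPU work. */
		inflight = (blkno + 1 < NBLOCKS) ?
		    issue_read(blkno + 1) : NULL;

		process_log_block(blkno);	/* overlaps the read above */
	}
	return (0);
}

Build with e.g. cc -o rebuild_sim rebuild_sim.c -lpthread.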
> Right, but how could this result in >2x the performance? As indicated by your diagram, you are doing at most 2 reads at once (or, you are getting at most 1 read "for free" while the CPU is busy processing the last block). You claimed a 10-20x speedup (I am assuming that "several" means 3).

As I said, I'm going to have to recheck - it's possible I'm remembering things incorrectly. However, my options for performance testing are currently somewhat limited. I'll get back to you as soon as I have more info.

Cheers,
--
Saso
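For reference, a short sketch of the flag reconstruction Matt suggests earlier in the thread (rebuilding the flags at import time instead of persisting them in each log entry). The flag names echo the discussion, but their values, the log_ent_t layout, and rebuild_flags() are simplified stand-ins rather than the actual arc.c code.

/*
 * Simplified stand-in for rebuilding buffer flags from an L2ARC log
 * entry instead of persisting them.  Flag names echo the thread; the
 * values and the log_ent_t layout are illustrative only.
 */
#include <stdint.h>
#include <stdio.h>

#define	ARC_IN_HASH_TABLE	(1U << 0)	/* illustrative value */
#define	ARC_L2CACHE		(1U << 1)	/* illustrative value */
#define	ARC_L2COMPRESS		(1U << 2)	/* illustrative value */

#define	COMPRESS_OFF		0		/* stand-in for "not compressed" */

/* Simplified stand-in for one on-device log entry. */
typedef struct log_ent {
	uint64_t	le_dva[2];	/* on-pool address of the buffer */
	uint64_t	le_birth;	/* birth txg */
	uint8_t		le_compress;	/* what LE_GET_COMPRESS() would yield */
} log_ent_t;

/*
 * Every header rebuilt from the log is, by construction, inserted into
 * the hash table and eligible for L2ARC, so those two flags are implied
 * rather than stored.  Compression state comes from the entry itself,
 * and ARC_PREFETCH is never set on a reconstructed header.
 */
static uint32_t
rebuild_flags(const log_ent_t *le)
{
	uint32_t flags = ARC_IN_HASH_TABLE | ARC_L2CACHE;

	if (le->le_compress != COMPRESS_OFF)
		flags |= ARC_L2COMPRESS;
	return (flags);
}

int
main(void)
{
	log_ent_t le = { .le_dva = { 0, 0 }, .le_birth = 1234,
	    .le_compress = 1 /* any value other than COMPRESS_OFF */ };

	printf("rebuilt flags: 0x%x\n", (unsigned int)rebuild_flags(&le));
	return (0);
}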
