On Mon, Dec 9, 2013 at 7:25 AM, Saso Kiselkov <[email protected]> wrote:
> On 12/5/13, 10:24 PM, Matthew Ahrens wrote:
> > On Wed, Dec 4, 2013 at 6:58 PM, Saso Kiselkov <[email protected]> wrote:
> > > On 12/4/13, 11:25 PM, Matthew Ahrens wrote:
> > > > Saso, thanks for your continued work to clean this up. Here are some more comments:
> > >
> > > Hi Matt,
> > >
> > > Thanks for the review, responses below:
> > >
> > > > - terminology: let's use similar terms as with the intent log: there are not several logs on each l2arc device. There is one log composed of a linked list of log blocks. (So, s/log/log block/ in the code/comments)
> > >
> > > Ok, will build this in.
> > >
> > > > - why do we need to persist the L2ARC_PERSIST_FLAGS? (e.g. L2CACHE - aren't they all in l2arc)
> > >
> > > I'm not sure I understand. Do you mean that we should *not* rebuild the buffer flags when rebuilding buffers? While it is possible that we can infer two of the flag values (ARC_IN_HASH_TABLE and ARC_L2CACHE), we can't say for certain what the right values for ARC_L2COMPRESS and ARC_PREFETCH should be (as these relate to the state the pool was in when the buffer got into the ARC in the first place).
> >
> > I'm suggesting that we rebuild the flags, but I'm not sure of the benefit of storing these flags on disk. It seems like ARC_IN_HASH_TABLE and ARC_L2CACHE should always be set, so they don't need to be persisted. Why should ARC_PREFETCH ever be set when we reconstruct? Can't we set ARC_L2COMPRESS based on LE_GET_COMPRESS()?
>
> Reworked, now gotten rid of this.
>
> > Thankfully this is not realtime code :-)
>
> I'd contend that all code is real-time code, but that the importance of its real-time qualities depends on what it's being used for. One of the main reasons I got into ZFS development was to try to understand how to improve ZFS for video streaming uses (which are inherently real-time).
>
> However, I don't feel particularly strongly about this functionality. If you think we can safely lose it, I'll defer to your judgment.
>
> > > Unfortunately, I did experiments with that and rebuild does suffer quite a bit (by about an order of magnitude slower) unless you get the prefetching *just* right (i.e. keep the device busy all the time). Reconstructing one log block's buffers in memory takes a minuscule amount of time when compared to access latencies even on SSDs, so without prefetching, the rebuild will be mostly read-latency limited. My tests seemed to indicate that an average-size L2ARC rebuild (with the prefetching that's in the webrev) should take on the order of 10-20 seconds. Without that, you're looking at probably several minutes' worth of churn on the L2ARC device.
> >
> > I must not be understanding the code then (which is why I would prefer it to be simpler). It looks like there are at most 2 concurrent i/os going on, one for each of the 2 linked lists. So how can it go more than 2x the performance of a single linked list with 1 concurrent i/o?
>
> The point is to minimize the hole that's created when one read finishes and only then we start issuing the next one - at which point we have to incur another latency hit of passing through the transports to the L2ARC device. Using this method we always keep at least 1-2 IOs queued for the device, instead of 0-1 IOs.
>
> A picture is worth a thousand words:
>
>      +---> (t)
> CPU  |         /run\              /run\              /run\
> xfer |    /xfer/   \xfer\    /xfer/   \xfer\    /xfer/   \
> SSD  |read/             \read/             \read/
>
>      +---> (t)
> CPU  |         /run\/run\                /run\/run\                /run\/
> xfer |    /xfer/xfer/\xfer\xfer\    /xfer/xfer/\xfer\xfer\    /xfer/xfer/\
> SSD  |read/read/               \read/read/               \read/read/
>
> Now the exact proportions here will dictate how well we are going to saturate the device, and SSDs typically prefer queues deeper than 1, but it's better than a straight start-stop walk of the device (which is going to be much slower than the device's maximum transfer rate).
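(For reference, the scheduling pattern described above amounts to roughly the loop below. This is a sketch only; the types and helper names are placeholders, not the actual illumos l2arc rebuild code.)

/*
 * Sketch of the scheduling pattern only.  The idea: issue the read for
 * the next log block before processing the current one, so the device
 * always has an I/O in flight while the CPU is reconstructing headers.
 */
#include <stdint.h>
#include <stdbool.h>

typedef struct log_blk_ptr {
	uint64_t	lbp_daddr;	/* device address of the log block */
	uint64_t	lbp_size;
} log_blk_ptr_t;

typedef struct log_blk {
	log_blk_ptr_t	lb_prev_ptr;	/* log blocks point backwards */
	/* ... buffer entries ... */
} log_blk_t;

/* Placeholder async-I/O and rebuild helpers. */
extern bool ptr_valid(const log_blk_ptr_t *ptr);
extern void read_start(const log_blk_ptr_t *ptr, log_blk_t *buf);
extern void read_wait(log_blk_t *buf);
extern void rebuild_bufs(const log_blk_t *buf);	/* CPU-only work */

void
rebuild_pipelined(log_blk_ptr_t lbp)
{
	log_blk_t bufs[2];
	int cur = 0;

	if (!ptr_valid(&lbp))
		return;
	read_start(&lbp, &bufs[cur]);		/* prime the pipeline */

	while (ptr_valid(&lbp)) {
		read_wait(&bufs[cur]);		/* current block now in memory */
		lbp = bufs[cur].lb_prev_ptr;
		if (ptr_valid(&lbp))		/* keep the device busy ... */
			read_start(&lbp, &bufs[cur ^ 1]);
		rebuild_bufs(&bufs[cur]);	/* ... while the CPU runs */
		cur ^= 1;
	}
}

With the read for the next block issued before rebuild_bufs() runs, the device sees a queue depth of 1-2 instead of 0-1, which is the gap the second timeline above is meant to close.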
Right, but how could this result in >2x the performance? As indicated by your diagram, you are doing at most 2 reads at once (or, you are getting at most 1 read "for free" while the CPU is busy processing the last block). You claimed a 10-20x speedup (I am assuming that "several" means 3).

--matt

> I'll try to conduct quantitative tests to make sure I'm not talking out of my ass here.
>
> > > > - how can there be infinite loops of logs? validity should be determined by not straddling invalid range (between end and begin)
> > >
> > > Aside from obvious manually fabricated infinite log block loops to effect a hang or DoS, buffers can loop due to the design. The main reason is that log blocks point backwards, while our writing goes forwards. We thus have no way to go back and amend "outdated" buffers. So if we get seriously unlucky, we could, potentially, write a log buffer at exactly the sector offset pointed to by an ancient previously written log block, which would then create the loop. It's extremely unlikely, but it's a possibility.
> >
> > But can't you catch that by validating that the pointer does not cross the invalid range of the device (l2u_start_lps[0].l2lp_daddr and l2u_evict_tail)?
>
> Point taken, I've removed this from l2arc_log_blk_read() and instead strengthened l2arc_log_blk_ptr_valid() to check for log block pointers into the evicted region (between l2ad_hand and l2ad_evict).
>
> Updated webrev with all of the changes:
> http://cr.illumos.org/~webrev/skiselkov/3525_simplified/
>
> Cheers,
> --
> Saso
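(For reference, the strengthened pointer check described above amounts to a range test along these lines. This is a sketch only; the struct and field names are placeholders rather than the actual illumos definitions, and it assumes the evicted region lies between the write hand and the eviction frontier as stated in the discussion.)

/*
 * Sketch: a log block pointer is rejected if it lies outside the
 * device's usable range or overlaps the evicted region between the
 * write hand and the eviction frontier (anything in there has been,
 * or is about to be, overwritten, so following it could read garbage
 * or loop back onto newer log blocks).
 */
#include <stdint.h>
#include <stdbool.h>

typedef struct l2dev {
	uint64_t	dev_start;	/* first usable byte offset */
	uint64_t	dev_end;	/* last usable byte offset + 1 */
	uint64_t	dev_hand;	/* next write position */
	uint64_t	dev_evict;	/* eviction frontier */
} l2dev_t;

typedef struct log_blk_ptr {
	uint64_t	lbp_daddr;
	uint64_t	lbp_size;
} log_blk_ptr_t;

static bool
log_blk_ptr_valid(const l2dev_t *dev, const log_blk_ptr_t *lbp)
{
	uint64_t blk_end = lbp->lbp_daddr + lbp->lbp_size;

	/* Must lie entirely within the device's usable range. */
	if (lbp->lbp_daddr < dev->dev_start || blk_end > dev->dev_end)
		return (false);

	/* Must not overlap [hand, evict), handling wraparound. */
	if (dev->dev_hand <= dev->dev_evict) {
		if (lbp->lbp_daddr < dev->dev_evict &&
		    blk_end > dev->dev_hand)
			return (false);
	} else {
		if (lbp->lbp_daddr < dev->dev_evict ||
		    blk_end > dev->dev_hand)
			return (false);
	}
	return (true);
}

A pointer landing in that window is simply treated as the end of the log, which also covers the accidental-loop case discussed earlier without a separate loop-detection pass.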
_______________________________________________
developer mailing list
[email protected]
http://lists.open-zfs.org/mailman/listinfo/developer
