On Wed, Dec 4, 2013 at 6:58 PM, Saso Kiselkov <[email protected]> wrote:
> On 12/4/13, 11:25 PM, Matthew Ahrens wrote:
> > Saso, thanks for your continued work to clean this up. Here are some
> > more comments:
>
> Hi Matt,
>
> Thanks for the review, responses below:
>
> > - terminology: let's use similar terms as with the intent log: there
> > are not several logs on each l2arc device. There is one log composed of
> > a linked list of log blocks. (So, s/log/log block/ in the code/comments)
>
> Ok, will build this in.
>
> > - why do we need to persist the L2ARC_PERSIST_FLAGS? (e.g. L2CACHE -
> > aren't they all in l2arc)
>
> I'm not sure I understand. Do you mean that we should *not* rebuild the
> buffer flags when rebuilding buffers? While it is possible that we can
> infer two of the flag values (ARC_IN_HASH_TABLE and ARC_L2CACHE), we
> can't say for certain what the right values for ARC_L2COMPRESS and
> ARC_PREFETCH should be (as these relate to the state the pool was in
> when the buffer got into ARC in the first place).

I'm suggesting that we rebuild the flags, but I'm not sure of the benefit
of storing these flags on disk. It seems like ARC_IN_HASH_TABLE and
ARC_L2CACHE should always be set, so they don't need to be persisted. Why
should ARC_PREFETCH ever be set when we reconstruct? Can't we set
ARC_L2COMPRESS based on LE_GET_COMPRESS()?

> > - L2ARC_CHECK_REBUILD_TIMEOUT - unused macro
>
> Oops, a bit of obsolete code that I forgot to cut out after the first
> round of refactoring.
>
> > - l2arc_rebuild_timeout: comment implies that rebuild happens before
> > import completes; I thought it rebuilt in the background?
>
> While rebuild does happen in the background, it is quite a heavy
> operation that can make the system rather busy, so I wanted to make sure
> it is bounded. I've coded a non-trivial amount of real-time code in the
> past and it's just become kind of a reflex for me.

Thankfully this is not realtime code :-)

> > - calling it an uberblock is confusing (e.g. line 4415); maybe
> > l2arc_dev_hdr_t?
>
> Ok.
>
> > - reconstruct time is not that critical. We could simplify the code and
> > on-disk format, and make it easier to convince ourselves that the code
> > is correct, if we remove the prefetch & double pointer stuff from the
> > reconstruct code. It should be sufficient to issue one i/o at a time
> > (rather than 2), perhaps issuing the i/o for the next zio while creating
> > l2arc_hdr_t's for this block's entries.
>
> Unfortunately, I did experiments with that and rebuild does suffer quite
> a bit (by about an order of magnitude slower) unless you get the
> prefetching *just* right (i.e. keep the device busy all the time).
> Reconstructing one log block's buffers in memory takes a minuscule
> amount of time when compared to access latencies even on SSDs, so
> without prefetching the rebuild will be mostly read-latency limited. My
> tests seemed to indicate that an average-size L2ARC rebuild (with the
> prefetching that's in the webrev) should take on the order of 10-20
> seconds. Without that, you're looking at probably several minutes' worth
> of churn on the L2ARC device.

I must not be understanding the code then (which is why I would prefer it
to be simpler). It looks like there are at most 2 concurrent i/os going
on, one for each of the 2 linked lists. So how can it get more than 2x
the performance of a single linked list with 1 concurrent i/o?
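To make that concrete, here's a rough sketch of the single-chain walk I'm
describing, with one read kept in flight while the current block's
entries are rebuilt in memory. The l2rebuild_* types and helpers are
made-up placeholders for illustration, not code from the webrev:

/*
 * Illustrative sketch only: walk the single log block chain, keeping
 * one read outstanding while the current block's entries are turned
 * into ARC headers.  The l2rebuild_* names are hypothetical.
 */
typedef struct l2rebuild_log_blk {
        uint64_t        l2rlb_prev_daddr;  /* back-ptr to older log block */
        uint64_t        l2rlb_nentries;    /* number of buffer entries */
        /* per-buffer entries would follow */
} l2rebuild_log_blk_t;

static int
l2arc_rebuild_chain(l2arc_dev_t *dev, uint64_t head_daddr)
{
        l2rebuild_log_blk_t *cur, *next = NULL;
        zio_t *next_io = NULL;
        int err;

        /* Read the newest log block synchronously to prime the loop. */
        if ((err = l2rebuild_read_log_blk(dev, head_daddr, &cur)) != 0)
                return (err);

        while (cur != NULL) {
                /*
                 * Issue the read for the next (older) log block before
                 * processing this one, so the device stays busy while we
                 * do the in-memory header reconstruction.
                 */
                if (cur->l2rlb_prev_daddr != 0) {
                        next_io = l2rebuild_read_log_blk_async(dev,
                            cur->l2rlb_prev_daddr, &next);
                }

                /* CPU work overlaps with the single outstanding read. */
                l2rebuild_restore_hdrs(dev, cur);
                l2rebuild_free_log_blk(cur);
                cur = NULL;

                if (next_io != NULL) {
                        if ((err = zio_wait(next_io)) != 0)
                                return (err);
                        next_io = NULL;
                        cur = next;
                        next = NULL;
                }
        }
        return (0);
}

The point is just that the device stays busy during the in-memory
reconstruction without needing the second linked list or the double
pointer scheme.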
> > - how can there be infinite loops of logs? validity should be
> > determined by not straddling invalid range (between end and begin)
>
> Aside from obvious manually fabricated infinite log block loops to
> effect a hang or DoS, buffers can loop due to the design. The main
> reason is that log blocks point backwards, while our writing goes
> forwards. We thus have no way to go back and amend "outdated" buffers.
> So if we get seriously unlucky, we could, potentially, write a log
> buffer at exactly the sector offset pointed to by an ancient previously
> written log block, which would then create the loop. It's extremely
> unlikely, but it's a possibility.

But can't you catch that by validating that the pointer does not cross
the invalid range of the device (l2u_start_lps[0].l2lp_daddr and
l2u_evict_tail)?
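Roughly the kind of validation I mean; this is a simplified sketch that
ignores the reserved header area at the start of the device, and the head
and tail values would come from the fields named above:

/*
 * Illustrative sketch only: a log block back-pointer is acceptable iff
 * it lands in the region of the device that still holds valid data,
 * i.e. between the evict tail and the most recently written log block.
 * "head" would come from l2u_start_lps[0].l2lp_daddr and "tail" from
 * l2u_evict_tail.
 */
static boolean_t
l2arc_log_blk_ptr_valid(uint64_t head, uint64_t tail, uint64_t daddr)
{
        if (tail <= head) {
                /* Valid data occupies [tail, head]; no wraparound. */
                return (daddr >= tail && daddr <= head ? B_TRUE : B_FALSE);
        } else {
                /* Valid data wraps past the end of the device. */
                return (daddr >= tail || daddr <= head ? B_TRUE : B_FALSE);
        }
}

A chain could only loop by first passing through a block in the invalid
region, so the walk would stop there before it can cycle - at least,
that's the intent.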
> > - Consider storing the “uberblock” in the MOS - then we don't have to
> > worry about atomic overwrite, self-checksum, magic number. Update the
> > in-memory version as we do now, and just write it into the MOS every
> > txg (if it's dirty).
>
> While it would remove the aspects you mention, I suspect it wouldn't
> really simplify the code. First, we'd need to dump a bunch of callbacks
> in txg commit places to make sure we check for changes. Next, we'd need
> to construct an even more complicated data structure in the MOS, simply
> by virtue of possibly having multiple L2ARC devices. Finally, I'd have
> to hook into the places where "zpool add" and "zpool remove" add and
> remove L2ARC vdevs. While none of these are particularly hard, I'm not
> convinced it'd be worthwhile simply to replace two 50-odd-line
> functions. We already store a bit of metadata in the vdev config tree
> that tells us whether we're supposed to be looking for an uberblock or
> not.
>
> Anyway, gonna give it some thought and try to estimate the complexity -
> I might be completely off here.

You might be right. Also, it may be that keeping it in the MOS would
increase complexity, because then the l2uberblock (aka l2arc_dev_hdr_t)
would not reflect new writes that had happened, which could potentially
overwrite data that the MOS's uberblock thinks is valid.

> Appreciate you taking the time to review and looking forward to your
> responses.
>
> Cheers,
> --
> Saso
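One more note on the MOS question: for comparison, what the on-device
approach has to provide is a self-identifying, self-checksummed header
that can be rewritten atomically. The layout and names below are only an
illustrative sketch, not the webrev's actual format:

/*
 * Illustrative sketch only: a self-describing device header, so that a
 * torn or stale write is detected at import time.  Layout, names and
 * the magic value are hypothetical.
 */
#define L2DH_MAGIC      0x4c3241524321ULL      /* placeholder magic */

typedef struct l2arc_log_blk_ptr {
        uint64_t        l2lp_daddr;     /* device address of a log block */
        uint64_t        l2lp_size;      /* its on-disk size */
} l2arc_log_blk_ptr_t;

typedef struct l2arc_dev_hdr_phys {
        uint64_t                l2dh_magic;
        uint64_t                l2dh_version;
        l2arc_log_blk_ptr_t     l2dh_start_lps[2]; /* newest log blocks */
        uint64_t                l2dh_evict_tail;   /* oldest valid offset */
        zio_cksum_t             l2dh_self_cksum;   /* fletcher4 of header */
} l2arc_dev_hdr_phys_t;

static boolean_t
l2arc_dev_hdr_verify(const l2arc_dev_hdr_phys_t *hdr)
{
        l2arc_dev_hdr_phys_t tmp = *hdr;
        zio_cksum_t expect = hdr->l2dh_self_cksum;
        zio_cksum_t actual;

        if (hdr->l2dh_magic != L2DH_MAGIC)
                return (B_FALSE);

        /* The checksum is computed with the checksum field zeroed out. */
        bzero(&tmp.l2dh_self_cksum, sizeof (tmp.l2dh_self_cksum));
        fletcher_4_native(&tmp, sizeof (tmp), &actual);

        return (ZIO_CHECKSUM_EQUAL(expect, actual) ? B_TRUE : B_FALSE);
}

Keeping the header in the MOS would make the magic, self-checksum and
atomic-overwrite concerns go away, at the cost of the staleness issue
described above.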
