On 12/4/13, 11:25 PM, Matthew Ahrens wrote:
> Saso, thanks for your continued work to clean this up.  Here are some
> more comments:

Hi Matt,

Thanks for the review, responses below:

>  - terminology: let's use similar terms as with the intent log: there
> are not several logs on each l2arc device.  There is one log composed of
> a linked list of log blocks.  (So, s/log/log block/ in the code/comments)

Ok, will build this in.

>  - why do we need to persist the L2ARC_PERSIST_FLAGS? (e.g. L2CACHE -
> aren't they all in l2arc)

I'm not sure I understand. Do you mean that we should *not* rebuild the
buffer flags when rebuilding buffers? While it is possible that we can
infer two of the flag values (ARC_IN_HASH_TABLE and ARC_L2CACHE), we
can't certainly say what the right value for ARC_L2COMPRESS and
ARC_PREFETCH should be (as these relate to the state the pool was in
when the buffer got into ARC in the first place).

>  - L2ARC_CHECK_REBUILD_TIMEOUT - unused macro

Oops, a bit of obsolete code that I forgot to cut out after the first
round of refactoring.

>  - l2arc_rebuild_timeout: comment implies that rebuild happens before
> import completes; i thought it rebuilt in background?

While rebuild does happen in the background, it is quite a heavy
operation that can make the system rather busy, so I wanted to make sure
it is bounded. I've coded a non-trivial amount of real-time code in the
past and it's just become kind of a reflex for me.

>  - calling it an uberblock is confusing (e.g. line 4415); maybe
> l2arc_dev_hdr_t?

Ok.

>  - reconstruct time is not that critical. We could simplify the code and
> on-disk format, and make it easier to convince ourselves that the code
> is correct, if we remove the prefetch & double pointer stuff from
> reconstruct code.  It should be sufficient to issue one i/o at a time
> (rather than 2), perhaps issuing the i/o for the next zio while creating
> l2arc_hdr_t’s for this blocks' entries.

Unfortunately, I did experiments with that and rebuild does suffer quite
a bit (by about an order of magnitude slower) unless you get the
prefetching *just* right (i.e. keep the device busy all the time).
Reconstructing one log block's buffers in memory takes a miniscule
amount of time when compared to access latencies even on SSDs, so
without prefetching, the rebuild will be mostly read latency limited. My
tests seemed to indicate that an average-size L2ARC rebuild (with the
prefetching that's in the webrev) should take on the order of 10-20
seconds. Without that, you're looking at probably several minute's worth
of churn on the L2ARC device.

>  - how can there be infinite loops of logs?  validity should be
> determined by not straddling invalid range (between end and begin)

Aside from obvious manually fabricated infinite log block loops to
affect a hang or DoS, buffers can loop due to design. The main reason is
that log blocks point backwards, while our writing goes forwards. We
have thus no way to back and amend "outdated" buffers. So if we get
seriously unlucky, we could, potentially, write a log buffer in exactly
the sector offset pointed to by an ancient previously written log block,
which would then create the loop. It's extremely unlikely, but it's a
possibility.

>  - Consider storing the  “uberblock” in the MOS - then don’t have to
> worry about atomic overwrite, self-checksum, magic number.  Update the
> in-memory version as we do now, and just write it into the MOS every txg
> (if it's dirty).

While it would remove the aspects you mention, I suspect it wouldn't
really simplify the code. First, we'd need to dump a bunch of callbacks
in txg commit places to make sure we check for changes. Next, we'd need
to construct an even more complicated data structure in the MOS, simply
by virtue of having the possibility of having multiple L2ARC devices.
Finally, I'd have to hook into places where "zpool add" and "zpool
remove" add L2ARC vdevs. While none of these are particularly hard, I'm
not convinced it'd be worthwhile simply to replace two 50-odd lines long
functions. We already store a bit of metadata in the vdev config tree
that tells us whether we're supposed to be looking for an uberblock or not.

Anyway, gonna give it some thought and try to estimate the complexity -
I might be completely off here.

Appreciate you taking the time to review and looking forward to your
responses.

Cheers,
-- 
Saso
_______________________________________________
developer mailing list
[email protected]
http://lists.open-zfs.org/mailman/listinfo/developer

Reply via email to