We had an intern work on "sorted scrub" last year.  Essentially the idea
was to read the metadata to gather into memory all the BP's that need to be
scrubbed, sort them by DVA (i.e. offset on disk) and then issue the scrub
i/os in that sorted order.  However, memory can't hold all of the BP's, so
we do multiple passes over the metadata, each pass gathering the next chunk
of BP's.  This code is implemented and seems to work but probably needs
some more testing and code cleanup.
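
To make the multi-pass idea concrete, here is a rough sketch (not the intern's actual code). It assumes each BP is reduced to a single integer offset, that offsets are unique, and that `traverse` is a hypothetical callable that re-walks the metadata each time it is called. `memory_limit` is likewise a made-up parameter for how many BPs fit in memory.

```python
import heapq


def multipass_sorted_scrub(traverse, memory_limit):
    """Yield offsets in ascending order, holding at most memory_limit at once.

    traverse() is a hypothetical stand-in for one full walk of the metadata;
    it returns an iterator over BP offsets in discovery (unsorted) order.
    """
    last = -1  # highest offset already issued; assumes offsets are >= 0
    while True:
        # One pass over the metadata: keep only the next memory_limit
        # smallest offsets that have not been issued yet.
        batch = heapq.nsmallest(
            memory_limit, (off for off in traverse() if off > last))
        yield from batch  # "issue the scrub i/os" in sorted order
        if len(batch) < memory_limit:
            return  # nothing left beyond this batch
        last = batch[-1]


if __name__ == "__main__":
    offsets = [70, 10, 50, 30, 90, 20]
    print(list(multipass_sorted_scrub(lambda: iter(offsets), 4)))
```

Note that every batch costs a full call to `traverse()`, which is the repeated metadata pass discussed below.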

One of the downsides of that approach is having to do multiple passes over
the metadata if it doesn't all fit in memory (which it typically does
not).  In some circumstances, this is worth it, but in others not so much.
To improve on that, we would like to do just one pass over the metadata to
find all the block pointers.  Rather than storing the BP's sorted in
memory, we would store them on disk, but only roughly sorted.  There are
several ways we could do the sorting, which is one of the issues that makes
this problem interesting.

We could divide each top-level vdev into chunks (like metaslabs, but
probably a different number of them) and for each chunk have an on-disk
list of BP's in that chunk that need to be scrubbed/resilvered.  When we
find a BP, we would append it to the appropriate list.  Once we have
traversed all the metadata to find all the BP's, we would load one chunk's
list of BP's into memory, sort it, and then issue the resilver i/os in
sorted order.
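
A minimal sketch of the per-chunk lists, under some loud assumptions: BPs are modeled as `(vdev, offset)` tuples, the "on-disk" lists are just Python lists in a dict, and `chunk_size` and both function names are invented for illustration.

```python
from collections import defaultdict


def bucket_by_chunk(bps, chunk_size):
    """Single pass over the metadata: append each BP to its chunk's list.

    bps yields (vdev, offset) tuples in discovery order.  The returned dict
    stands in for the on-disk lists, keyed by (vdev, chunk index).
    """
    lists = defaultdict(list)
    for vdev, offset in bps:
        lists[(vdev, offset // chunk_size)].append((vdev, offset))
    return lists


def scrub_order(lists):
    """Load one chunk's list at a time, sort it, and issue in that order."""
    for key in sorted(lists):          # walk chunks in on-disk order
        yield from sorted(lists[key])  # only this chunk is sorted in memory


if __name__ == "__main__":
    found = [(0, 900), (0, 100), (1, 50), (0, 150), (0, 800)]
    print(list(scrub_order(bucket_by_chunk(found, chunk_size=500))))
```

Each list is only roughly sorted on disk (by chunk); the exact ordering happens when a single chunk is loaded, so memory use is bounded by the largest chunk's list.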

As an alternative, it might be better to accumulate as many BP's as fit in
memory, sort them, and then write that sorted list to disk.  Then remove
those BP's from memory and start filling memory again, write that list,
etc.  Then read all the sorted lists in parallel to do a merge sort.  This
has the advantage that we do not need to append to lots of lists as we are
traversing the metadata. Instead we have to read from lots of lists as we
do the scrubs, but this should be more efficient.  We also don't have to
determine beforehand how many chunks to divide each vdev into.
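
Roughly, the sorted-runs alternative could look like the sketch below. The assumptions: BPs are plain integer offsets, the sorted lists written "to disk" are in-memory Python lists, and `memory_limit` and the function names are hypothetical. The merge uses the standard library's `heapq.merge`, which reads from all runs lazily.

```python
import heapq


def make_sorted_runs(bps, memory_limit):
    """Single pass over the metadata: fill memory, sort, write out, repeat."""
    runs, buf = [], []
    for bp in bps:
        buf.append(bp)
        if len(buf) == memory_limit:
            runs.append(sorted(buf))  # stand-in for writing a sorted list
            buf = []                  # free memory and start filling again
    if buf:
        runs.append(sorted(buf))      # final, possibly shorter, run
    return runs


def merged_scrub_order(runs):
    """Read all sorted runs in parallel and yield one globally sorted stream."""
    return heapq.merge(*runs)


if __name__ == "__main__":
    offsets = [70, 10, 50, 30, 90, 20]
    runs = make_sorted_runs(offsets, memory_limit=4)
    print(runs)
    print(list(merged_scrub_order(runs)))
```

The number of runs falls out of how much memory is available, so nothing about the vdev layout has to be chosen up front.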

If you'd like to continue working on sorted scrub along these lines, let me
know.

--matt


On Sat, Jul 9, 2016 at 7:10 AM, Gvozden Neskovic <nesko...@gmail.com> wrote:

> Dear OpenZFS developers,
>
> Since SIMD RAID-Z code was merged to ZoL [1], I started to look into the
> rest of the scrub/resilvering code path.
> I've found some existing specs and ideas about how to make the process
> more rotational drive friendly [2][3][4][5].
> What I've gathered from these is that scrub should be split into metadata
> and data traversal phases. As I'm new to ZFS,
> I've made a quick prototype simulating a large elevator, using an AVL tree
> to sort blocks by DVA offset [6]. It's probably
> broken in more than a few ways, but this is just a quick hack to get a grasp
> of the code. The solution turned out similar to the
> 'ASYNC_DESTROY' feature, so I'm wondering if this might be a direction to
> take?
>
> At this stage, I would appreciate any input on how to proceed with this
> project. If you're a core dev and would like
> to provide any kind of mentorship or are willing to answer some questions from
> time to time, please let me know.
> Or, if there's a perfect solution for this just waiting to be implemented,
> even better.
> For starters, pointers like: read this article, make sure you understand
> this piece of code, etc., would also be very helpful.
>
> Regards,
>
> [1]
> https://github.com/zfsonlinux/zfs/commit/ab9f4b0b824ab4cc64a4fa382c037f4154de12d6
> [2] https://blogs.oracle.com/roch/entry/sequential_resilvering
> [3]
> http://wiki.old.lustre.org/images/f/ff/Rebuild_performance-2009-06-15.pdf
> [4] https://blogs.oracle.com/ahrens/entry/new_scrub_code
> [5] http://open-zfs.org/wiki/Projects#Periodic_Data_Validation
> [6]
> https://github.com/ironMann/zfs/commit/9a2ec765d2afc38ec76393dd694216fae0221443


