We had an intern work on "sorted scrub" last year. Essentially, the idea was to read the metadata to gather into memory all the BPs (block pointers) that need to be scrubbed, sort them by DVA (i.e., by offset on disk), and then issue the scrub I/Os in that sorted order. However, memory can't hold all of the BPs, so we do multiple passes over the metadata, each pass gathering the next chunk of BPs. This code is implemented and seems to work, but it probably needs some more testing and code cleanup.
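A minimal Python sketch of the multi-pass idea, under stated assumptions. The hypothetical `traverse` callback stands in for one full walk of the pool's metadata, a DVA is modeled as a hypothetical `(vdev_id, offset)` tuple, and `capacity` is how many BPs fit in memory. How the intern's code actually chooses "the next chunk" isn't described here. This sketch assumes each pass keeps the `capacity` lowest DVAs above the last one already issued, which makes the concatenated batches globally sorted.

```python
import heapq
from typing import Callable, Iterable, Iterator, List, Optional, Tuple

# Hypothetical model of a DVA as (vdev_id, offset). Tuples compare
# lexicographically, so sorting groups BPs by top-level vdev, then offset.
Dva = Tuple[int, int]


def multipass_sorted_scrub(
    traverse: Callable[[], Iterable[Dva]], capacity: int
) -> Iterator[List[Dva]]:
    """Yield one DVA-sorted batch per full pass over the metadata.

    Each pass keeps only the `capacity` smallest DVAs strictly greater than
    the last DVA issued by the previous pass, so the batches concatenate
    into one globally sorted order. Assumes each block's DVA is reported
    exactly once per traversal (no duplicate keys).
    """
    if capacity <= 0:
        raise ValueError("capacity must be positive")
    last: Optional[Dva] = None
    while True:
        # Max-heap (via negated keys) of the smallest candidates seen this pass.
        heap: List[Tuple[int, int]] = []
        for vdev, offset in traverse():
            if last is not None and (vdev, offset) <= last:
                continue  # already issued during an earlier pass
            neg = (-vdev, -offset)
            if len(heap) < capacity:
                heapq.heappush(heap, neg)
            elif neg > heap[0]:
                # Smaller than the largest DVA kept so far: evict that one.
                heapq.heapreplace(heap, neg)
        if not heap:
            return
        batch = sorted((-nv, -no) for nv, no in heap)
        last = batch[-1]
        yield batch  # caller issues the scrub I/Os for this batch in order
        if len(batch) < capacity:
            return  # memory wasn't full, so nothing is left for another pass
```

Each call to `traverse` is a full walk of the metadata, so scrubbing n BPs costs about ceil(n / capacity) metadata passes. That repeated traversal is the cost the next paragraph sets out to avoid.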
One of the downsides of that approach is having to do multiple passes over the metadata if it doesn't all fit in memory (which it typically does not). In some circumstances this is worth it, but in others not so much. To improve on that, we would like to do just one pass over the metadata to find all the block pointers. Rather than storing the BPs sorted in memory, we would store them on disk, but only roughly sorted.

There are several ways we could do the sorting, which is one of the issues that makes this problem interesting. We could divide each top-level vdev into chunks (like metaslabs, but probably a different number of them) and, for each chunk, keep an on-disk list of the BPs in that chunk that need to be scrubbed/resilvered. When we find a BP, we would append it to the appropriate list. Once we have traversed all the metadata to find all the BPs, we would load one chunk's list of BPs into memory, sort it, and then issue the resilver I/Os in sorted order.

As an alternative, it might be better to accumulate as many BPs as fit in memory, sort them, and then write that sorted list to disk. Then remove those BPs from memory, start filling memory again, write that list, and so on. Finally, read all the sorted lists in parallel and merge them (the merge phase of a merge sort). This has the advantage that we do not need to append to lots of lists as we traverse the metadata. Instead we have to read from lots of lists as we do the scrubs, but this should be more efficient. We also don't have to determine beforehand how many chunks to divide each vdev into.

If you'd like to continue working on sorted scrub along these lines, let me know.

--matt

On Sat, Jul 9, 2016 at 7:10 AM, Gvozden Neskovic <nesko...@gmail.com> wrote:
> Dear OpenZFS developers,
>
> Since the SIMD RAID-Z code was merged into ZoL [1], I have started looking
> into the rest of the scrub/resilver code path. I've found some existing
> specs and ideas about how to make the process more friendly to rotational
> drives [2][3][4][5].
>
> What I've gathered from these is that scrub should be split into metadata
> and data traversal phases. As I'm new to ZFS, I've made a quick prototype
> that simulates a large elevator, using an AVL tree to sort blocks by DVA
> offset [6]. It's probably broken in more than a few ways, but it's just a
> quick hack to get a grasp of the code. The solution turned out similar to
> the 'ASYNC_DESTROY' feature, so I'm wondering whether this might be a
> direction to take.
>
> At this stage, I would appreciate any input on how to proceed with this
> project. If you're a core dev and would like to provide any kind of
> mentorship, or are willing to answer some questions from time to time,
> please let me know. Or, if there's a perfect solution for this just
> waiting to be implemented, even better. For starters, pointers like "read
> this article" or "make sure you understand this piece of code" would also
> be very helpful.
>
> Regards,
>
> [1] https://github.com/zfsonlinux/zfs/commit/ab9f4b0b824ab4cc64a4fa382c037f4154de12d6
> [2] https://blogs.oracle.com/roch/entry/sequential_resilvering
> [3] http://wiki.old.lustre.org/images/f/ff/Rebuild_performance-2009-06-15.pdf
> [4] https://blogs.oracle.com/ahrens/entry/new_scrub_code
> [5] http://open-zfs.org/wiki/Projects#Periodic_Data_Validation
> [6] https://github.com/ironMann/zfs/commit/9a2ec765d2afc38ec76393dd694216fae0221443
>
> *openzfs-developer* | Archives <https://www.listbox.com/member/archive/274414/=now> <https://www.listbox.com/member/archive/rss/274414/28015287-49e52ff8> | Modify <https://www.listbox.com/member/?&> Your Subscription <http://www.listbox.com>
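Matt's reply above describes two single-pass alternatives: per-chunk lists that are each sorted when loaded, or memory-sized sorted runs that are merged at scrub time. The minimal Python sketch below contrasts them under loud assumptions. In-memory Python lists stand in for the on-disk lists, a DVA is modeled as a hypothetical `(vdev_id, offset)` tuple, and `chunk_size` and `mem_limit` are illustrative parameters, not ZFS tunables. The merge step uses the standard library's `heapq.merge`.

```python
import heapq
from collections import defaultdict
from typing import Dict, Iterable, Iterator, List, Tuple

# Hypothetical model of a DVA as (vdev_id, offset); the Python lists below
# stand in for the on-disk BP lists in the proposal.
Dva = Tuple[int, int]


def chunked_lists(bps: Iterable[Dva], chunk_size: int) -> Iterator[Dva]:
    """Option 1: append each BP to its chunk's list, then sort chunk by chunk."""
    if chunk_size <= 0:
        raise ValueError("chunk_size must be positive")
    lists: Dict[Tuple[int, int], List[Dva]] = defaultdict(list)
    for vdev, offset in bps:  # the single metadata traversal
        # Append to the list for this (top-level vdev, chunk index).
        lists[(vdev, offset // chunk_size)].append((vdev, offset))
    for key in sorted(lists):  # visit chunks in on-disk order
        # Load one chunk's list, sort it, and issue its I/Os in order.
        yield from sorted(lists[key])


def sorted_runs_then_merge(bps: Iterable[Dva], mem_limit: int) -> Iterator[Dva]:
    """Option 2: spill memory-sized sorted runs, then merge all runs."""
    if mem_limit <= 0:
        raise ValueError("mem_limit must be positive")
    runs: List[List[Dva]] = []
    buf: List[Dva] = []
    for dva in bps:  # the single metadata traversal
        buf.append(dva)
        if len(buf) == mem_limit:
            runs.append(sorted(buf))  # "write the sorted list to disk"
            buf = []  # free memory and start filling it again
    if buf:
        runs.append(sorted(buf))
    # Read all runs in parallel; heapq.merge holds one head element per run.
    yield from heapq.merge(*runs)
```

Both functions walk `bps` once. The chunked version must pick `chunk_size` before the traversal starts, while the run-based version only needs to know how many BPs fit in memory, which matches the trade-off described in the reply.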