On Nov 17, 2013, at 2:36 PM, Matthew Ahrens <[email protected]> wrote:
> On Sun, Nov 17, 2013 at 1:42 PM, Steven Hartland <[email protected]> wrote:
>> Finally found some time today to port the work I did
>> on FreeBSD to improve ZFS N-way mirror read performance
>> by using location information.
>>
>> Webrev:
>> http://cr.illumos.org/~webrev/steveh/illumos-4334/
>
> Thanks for taking on this work, Steven. Here are some comments on the code:
>
> vdev_impl.h:
> Choose a different name for vq_lastoffset, so it is not confused with the
> existing vq_last_offset. I believe this is referring to the last *queued*
> i/o, as opposed to the last *issued* i/o. Why can't this be tracked in (i.e.
> set by) the vdev_queue layer?

This came up on the FreeBSD zfs-devel list, but was never adequately answered. Steven saw a performance difference when the offset was set at the vdev_queue layer, and I never found time to test this and track down why myself. I have no doubt that Steven saw what he saw, I just don't like unanswered questions. :-)

> can you explain what mm_preferred points to? looks like it starts with "off
> the end of the array", which seems like a questionable decision. Oh, I see
> it is off the end of the array but you allocate a little more space for it.
> That's pretty tricky / confusing. Is this measurably better performing than
> doing something straightforward like (a) allocating it in
> vdev_mirror_child_select(), or (b) walking mm_children again to find the nth
> child with mc_load == lowest_load? Another way to do this even more
> efficiently and (probably) more straightforwardly would be to start the loop
> on a random child and go through the loop twice. Then you don't need
> mm_preferred or mm_preferred_cnt at all; you can just go with the first (or
> last) child with the lowest load.

mm_preferred was my doing, in an attempt to simplify, via memoization, some of the logic in an early version of the patch that Steven proposed on the FreeBSD zfs-devel list. I believe that mc_load was obviated by that change.
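For what it's worth, Matthew's alternative could look roughly like the sketch below. This is not the actual vdev_mirror code; the struct and names (mirror_child_t, mc_load, mirror_child_select) are hypothetical stand-ins, and it uses a single wrap-around pass with a modular index rather than literally looping twice, which achieves the same coverage:

```c
/*
 * Hypothetical sketch: pick the lowest-load mirror child without an
 * mm_preferred array. Start at a (pseudo-)random child and walk all
 * children once, wrapping around. Ties are broken in favor of the
 * first child encountered, so the random starting point spreads reads
 * across equally loaded children.
 */
typedef struct mirror_child {
	int	mc_load;	/* pending-I/O load estimate (hypothetical) */
} mirror_child_t;

static int
mirror_child_select(const mirror_child_t *mc, int children, unsigned seed)
{
	int start = (int)(seed % (unsigned)children);
	int best = -1;
	int best_load = 0;

	for (int i = 0; i < children; i++) {
		int c = (start + i) % children;
		if (best == -1 || mc[c].mc_load < best_load) {
			best = c;
			best_load = mc[c].mc_load;
		}
	}
	return (best);
}
```

Since ties keep the first child seen after the random start, no extra bookkeeping (mm_preferred, mm_preferred_cnt, or a second mc_load scan) is needed to distribute reads among equally loaded children.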
At the time, I didn’t review why this extra randomization step was being taken. Looking at it now, I don’t see why it is necessary at all. At light load (i.e., only one command outstanding at a time), we’ll favor the first healthy device. But at any other time, the load code should do its job and create an even distribution. It’s hard for me to believe that this would cause premature wear-out of one device due to reads. If it did, that might be considered a feature, since you don’t want your SSDs to fail at exactly the same time! Hopefully this all means that mm_preferred* and mc_load can just go with no change in the result.

— Justin
_______________________________________________
developer mailing list
[email protected]
http://lists.open-zfs.org/mailman/listinfo/developer
