On Sun, Nov 17, 2013 at 5:00 PM, Steven Hartland <[email protected]>wrote:
> ----- Original Message ----- From: "Matthew Ahrens" <[email protected]> > > > > On Sun, Nov 17, 2013 at 1:42 PM, Steven Hartland <[email protected] >> >wrote: >> >> Finally found some time today to port the work I did >>> on FreeBSD to improve ZFS N-way mirror read performance >>> by using location information. >>> >>> Webrev: >>> http://cr.illumos.org/~webrev/steveh/illumos-4334/ >>> >> >> >> Thanks for taking on this work, Steven. Here are some comments on the >> code: >> >> vdev_impl.h: >> Choose a different name for vq_lastoffset, so it is not confused with the >> existing vq_last_offset. >> > > Will change to vq_last_queued_offset, is vq_last_offset new I don't seem > to recall it? > > > I believe this is referring to the last *queued* >> i/o, as opposed to the last *issued* i/o. Why can't this be tracked in >> (i.e. set by) the vdev_queue layer? >> > > Justin suggested just this but when I tested, it eliminated any gain from > the code. My theory on this is that the addition to queue itself happens > quite a bit later in the flow, possibly to late resulting in the next IO > being processed without the correct last offset information. > > Justin was going to have a dig on this one but I don't think he found the > time. > > > Please use an enum for vdev_rotation_state >> > > There's no such variable, are you refering to vdev_rotation_rate? yes > > If so this was initially a bool for non-rotating but it was deemed using > rotation_rate would be more helpful moving forward, as this value could > be used to make more intelligent decisions. Yes, changing it to a bool would probably be even better. Going forward, it will be easy to change this to a different type when necessary. > > > vdev_mirror.c: >> unfortunately I think we have to declare tunables as non-static, otherwise >> gcc can optimize them away. perhaps prefix the variable names with >> "zfs_mirror_"? >> > > These are tunables in FreeBSD and Linux does illumos have such things? > yes, they can be set by /etc/system or mdb. > > If cc optimises them away is that an issue? yes, because then they can't be tuned. > > > mc_load: I assume this is here for debugging only? could you add a >> comment >> to that effect?` >> > > If you mean move this from struct to local then yes I believe this is > something > that can now be done now we have the mm_preferred changes. > > > can you explain what mm_preferred points to? looks like it starts with >> "off the end of the array", which seems like a questionable decision. Oh, >> I see it is off the end of the array but you allocate a little more space >> for it. That's pretty trick / confusing. Is this measurably better >> performing than doing something straightforward like (a) allocating it in >> vdev_mirror_child_select(), or (b) walk mm_children again to find the nth >> child with mc_load==lowest_load? Another way to do this even more >> efficiently and (probably) straightforwardly would be to start the loop on >> a random child, and go through the loop twice. Then you don't need >> mm_preferred or mm_preferred_cnt at all, you can just go with the first >> (or >> last) child with the lowest load. >> > > This was something that Justin came up with to simplify the flow of an > earlier revision. > > > rotating_seek_offset: what's a "reduced rotating media seek increment"? >> Could you rewrite the comment to be something like: "If we issue an i/o >> which is more than rotating_seek_offset bytes from the previous i/o, and >> the vdev is VDEV_RATE_ROTATING, then the device's load metric will be >> increased by rotating_seek_offset. Otherwise it will be increased by >> non_rotating_seek_inc." >> > > That's not actually what it does. rotating_seek_offset is the maximum > offset > at which the reduced rotating media offset will be applied, which is > currently 1/2 of rotating_seek_inc, see line:170. > > > vdev_mirror_load: please add a comment above this function explaining what >> it is trying to do. i.e. calculate and return load on the vdev if a new >> io >> were to be added at zio_offset. calculation is based on X and Y, used for >> Z... Without the comment there is cognitive dissonance as this function >> does not load a vdev_mirror (annoying verbs that are also nouns!) >> > > Comment added. > > > 138: I don't understand this comment, or what code it is referring to; can >> you elaborate? I don't see any check for resilvering before returning >> UINT_MAX above. >> > > In a previous revision INT_MAX was returned if vdev_resliver_txg != 0 > as it was assumed that avoiding the device that was reslivering would > improve performance. Justin found this assumption didn't hold true, hence > the early return was removed and the comment added so others are aware of > this in case they thought of the same thing. So you'll remove the comment? > > 325: Oh, that's why we bother with the whole preferred_cnt stuff? I >> didn't >> realize that SSDs have a limited number of reads. Can you point me to >> some >> specs for that (how many reads before it wears out)? People on the >> internet seem to disagree: >> http://superuser.com/questions/440171/will-reading- >> data-cause-ssds-to-wear-out >> > > Its my understanding that the predominate factor in wear levelling is > indeed > writes, as they can result in NAND failure, however the read disturb effect > of flash memory could also come into play in extreme cases keeping this > extra > bit of randomisation: > http://en.wikipedia.org/wiki/Flash_memory#Read_disturb > > Regards > Steve > > ================================================ > This e.mail is private and confidential between Multiplay (UK) Ltd. and > the person or entity to whom it is addressed. In the event of misdirection, > the recipient is prohibited from using, copying, printing or otherwise > disseminating it or any information contained in it. > In the event of misdirection, illegible or incomplete transmission please > telephone +44 845 868 1337 > or return the E.mail to [email protected]. > >
_______________________________________________ developer mailing list [email protected] http://lists.open-zfs.org/mailman/listinfo/developer
