----- Original Message ----- From: "Matthew Ahrens" <[email protected]>

On Sun, Nov 17, 2013 at 1:42 PM, Steven Hartland <[email protected]>wrote:

Finally found some time today to port the work I did
on FreeBSD to improve ZFS N-way mirror read performance
by using location information.

Webrev:
http://cr.illumos.org/~webrev/steveh/illumos-4334/


Thanks for taking on this work, Steven.  Here are some comments on the code:

vdev_impl.h:
Choose a different name for vq_lastoffset, so it is not confused with the
existing vq_last_offset.

Will change to vq_last_queued_offset, is vq_last_offset new I don't seem
to recall it?

I believe this is referring to the last *queued*
i/o, as opposed to the last *issued* i/o.  Why can't this be tracked in
(i.e. set by) the vdev_queue layer?

Justin suggested just this but when I tested, it eliminated any gain from
the code. My theory on this is that the addition to queue itself happens quite a bit later in the flow, possibly to late resulting in the next IO
being processed without the correct last offset information.

Justin was going to have a dig on this one but I don't think he found the
time.

Please use an enum for vdev_rotation_state

There's no such variable, are you refering to vdev_rotation_rate?

If so this was initially a bool for non-rotating but it was deemed using
rotation_rate would be more helpful moving forward, as this value could
be used to make more intelligent decisions.

vdev_mirror.c:
unfortunately I think we have to declare tunables as non-static, otherwise
gcc can optimize them away.  perhaps prefix the variable names with
"zfs_mirror_"?

These are tunables in FreeBSD and Linux does illumos have such things?

If cc optimises them away is that an issue?

mc_load: I assume this is here for debugging only?  could you add a comment
to that effect?`

If you mean move this from struct to local then yes I believe this is something
that can now be done now we have the mm_preferred changes.

can you explain what mm_preferred points to?  looks like it starts with
"off the end of the array", which seems like a questionable decision.  Oh,
I see it is off the end of the array but you allocate a little more space
for it.  That's pretty trick / confusing.  Is this measurably better
performing than doing something straightforward like (a) allocating it in
vdev_mirror_child_select(), or (b) walk mm_children again to find the nth
child with mc_load==lowest_load?  Another way to do this even more
efficiently and (probably) straightforwardly would be to start the loop on
a random child, and go through the loop twice.  Then you don't need
mm_preferred or mm_preferred_cnt at all, you can just go with the first (or
last) child with the lowest load.

This was something that Justin came up with to simplify the flow of an
earlier revision.

rotating_seek_offset: what's a "reduced rotating media seek increment"?
Could you rewrite the comment to be something like: "If we issue an i/o
which is more than rotating_seek_offset bytes from the previous i/o, and
the vdev is VDEV_RATE_ROTATING, then the device's load metric will be
increased by rotating_seek_offset.  Otherwise it will be increased by
non_rotating_seek_inc."

That's not actually what it does. rotating_seek_offset is the maximum offset
at which the reduced rotating media offset will be applied, which is
currently 1/2 of rotating_seek_inc, see line:170.

vdev_mirror_load: please add a comment above this function explaining what
it is trying to do.  i.e. calculate and return load on the vdev if a new io
were to be added at zio_offset.  calculation is based on X and Y, used for
Z...  Without the comment there is cognitive dissonance as this function
does not load a vdev_mirror (annoying verbs that are also nouns!)

Comment added.

138: I don't understand this comment, or what code it is referring to; can
you elaborate?  I don't see any check for resilvering before returning
UINT_MAX above.

In a previous revision INT_MAX was returned if vdev_resliver_txg != 0
as it was assumed that avoiding the device that was reslivering would
improve performance. Justin found this assumption didn't hold true, hence
the early return was removed and the comment added so others are aware of
this in case they thought of the same thing.

325: Oh, that's why we bother with the whole preferred_cnt stuff?  I didn't
realize that SSDs have a limited number of reads.  Can you point me to some
specs for that (how many reads before it wears out)?  People on the
internet seem to disagree:
http://superuser.com/questions/440171/will-reading-data-cause-ssds-to-wear-out

Its my understanding that the predominate factor in wear levelling is indeed
writes, as they can result in NAND failure, however the read disturb effect
of flash memory could also come into play in extreme cases keeping this extra
bit of randomisation:
http://en.wikipedia.org/wiki/Flash_memory#Read_disturb

   Regards
   Steve

================================================
This e.mail is private and confidential between Multiplay (UK) Ltd. and the person or entity to whom it is addressed. In the event of misdirection, the recipient is prohibited from using, copying, printing or otherwise disseminating it or any information contained in it.
In the event of misdirection, illegible or incomplete transmission please 
telephone +44 845 868 1337
or return the E.mail to [email protected].

_______________________________________________
developer mailing list
[email protected]
http://lists.open-zfs.org/mailman/listinfo/developer

Reply via email to