Re: [OpenZFS Developer] [zfs] Review: 4334 Improve ZFS N-way mirrorread performance

Matthew Ahrens Sun, 17 Nov 2013 17:28:14 -0800

On Sun, Nov 17, 2013 at 5:00 PM, Steven Hartland <[email protected]>wrote:


> ----- Original Message ----- From: "Matthew Ahrens" <[email protected]>
>
>
>
>  On Sun, Nov 17, 2013 at 1:42 PM, Steven Hartland <[email protected]
>> >wrote:
>>
>>  Finally found some time today to port the work I did
>>> on FreeBSD to improve ZFS N-way mirror read performance
>>> by using location information.
>>>
>>> Webrev:
>>> http://cr.illumos.org/~webrev/steveh/illumos-4334/
>>>
>>
>>
>> Thanks for taking on this work, Steven.  Here are some comments on the
>> code:
>>
>> vdev_impl.h:
>> Choose a different name for vq_lastoffset, so it is not confused with the
>> existing vq_last_offset.
>>
>
> Will change to vq_last_queued_offset, is vq_last_offset new I don't seem
> to recall it?
>
>
>  I believe this is referring to the last *queued*
>> i/o, as opposed to the last *issued* i/o.  Why can't this be tracked in
>> (i.e. set by) the vdev_queue layer?
>>
>
> Justin suggested just this but when I tested, it eliminated any gain from
> the code. My theory on this is that the addition to queue itself happens
> quite a bit later in the flow, possibly to late resulting in the next IO
> being processed without the correct last offset information.
>
> Justin was going to have a dig on this one but I don't think he found the
> time.
>
>
>  Please use an enum for vdev_rotation_state
>>
>
> There's no such variable, are you refering to vdev_rotation_rate?


yes


>




> If so this was initially a bool for non-rotating but it was deemed using
> rotation_rate would be more helpful moving forward, as this value could
> be used to make more intelligent decisions.


Yes, changing it to a bool would probably be even better.  Going forward,
it will be easy to change this to a different type when necessary.


>
>
>  vdev_mirror.c:
>> unfortunately I think we have to declare tunables as non-static, otherwise
>> gcc can optimize them away.  perhaps prefix the variable names with
>> "zfs_mirror_"?
>>
>
> These are tunables in FreeBSD and Linux does illumos have such things?
>

yes, they can be set by /etc/system or mdb.


>
> If cc optimises them away is that an issue?


yes, because then they can't be tuned.


>
>
>  mc_load: I assume this is here for debugging only?  could you add a
>> comment
>> to that effect?`
>>
>
> If you mean move this from struct to local then yes I believe this is
> something
> that can now be done now we have the mm_preferred changes.
>
>
>  can you explain what mm_preferred points to?  looks like it starts with
>> "off the end of the array", which seems like a questionable decision.  Oh,
>> I see it is off the end of the array but you allocate a little more space
>> for it.  That's pretty trick / confusing.  Is this measurably better
>> performing than doing something straightforward like (a) allocating it in
>> vdev_mirror_child_select(), or (b) walk mm_children again to find the nth
>> child with mc_load==lowest_load?  Another way to do this even more
>> efficiently and (probably) straightforwardly would be to start the loop on
>> a random child, and go through the loop twice.  Then you don't need
>> mm_preferred or mm_preferred_cnt at all, you can just go with the first
>> (or
>> last) child with the lowest load.
>>
>
> This was something that Justin came up with to simplify the flow of an
> earlier revision.
>
>
>  rotating_seek_offset: what's a "reduced rotating media seek increment"?
>> Could you rewrite the comment to be something like: "If we issue an i/o
>> which is more than rotating_seek_offset bytes from the previous i/o, and
>> the vdev is VDEV_RATE_ROTATING, then the device's load metric will be
>> increased by rotating_seek_offset.  Otherwise it will be increased by
>> non_rotating_seek_inc."
>>
>
> That's not actually what it does. rotating_seek_offset is the maximum
> offset
> at which the reduced rotating media offset will be applied, which is
> currently 1/2 of rotating_seek_inc, see line:170.
>
>
>  vdev_mirror_load: please add a comment above this function explaining what
>> it is trying to do.  i.e. calculate and return load on the vdev if a new
>> io
>> were to be added at zio_offset.  calculation is based on X and Y, used for
>> Z...  Without the comment there is cognitive dissonance as this function
>> does not load a vdev_mirror (annoying verbs that are also nouns!)
>>
>
> Comment added.
>
>
>  138: I don't understand this comment, or what code it is referring to; can
>> you elaborate?  I don't see any check for resilvering before returning
>> UINT_MAX above.
>>
>
> In a previous revision INT_MAX was returned if vdev_resliver_txg != 0
> as it was assumed that avoiding the device that was reslivering would
> improve performance. Justin found this assumption didn't hold true, hence
> the early return was removed and the comment added so others are aware of
> this in case they thought of the same thing.


So you'll remove the comment?


>
>  325: Oh, that's why we bother with the whole preferred_cnt stuff?  I
>> didn't
>> realize that SSDs have a limited number of reads.  Can you point me to
>> some
>> specs for that (how many reads before it wears out)?  People on the
>> internet seem to disagree:
>> http://superuser.com/questions/440171/will-reading-
>> data-cause-ssds-to-wear-out
>>
>
> Its my understanding that the predominate factor in wear levelling is
> indeed
> writes, as they can result in NAND failure, however the read disturb effect
> of flash memory could also come into play in extreme cases keeping this
> extra
> bit of randomisation:
> http://en.wikipedia.org/wiki/Flash_memory#Read_disturb


>
>    Regards
>    Steve
>
> ================================================
> This e.mail is private and confidential between Multiplay (UK) Ltd. and
> the person or entity to whom it is addressed. In the event of misdirection,
> the recipient is prohibited from using, copying, printing or otherwise
> disseminating it or any information contained in it.
> In the event of misdirection, illegible or incomplete transmission please
> telephone +44 845 868 1337
> or return the E.mail to [email protected].
>
>

_______________________________________________
developer mailing list
[email protected]
http://lists.open-zfs.org/mailman/listinfo/developer

Re: [OpenZFS Developer] [zfs] Review: 4334 Improve ZFS N-way mirrorread performance

Reply via email to