On Oct 16, 2013, at 1:11 PM, Richard Yao <[email protected]> wrote:
> On 10/16/2013 04:01 PM, Prakash Surya wrote:
>> On Wed, Oct 16, 2013 at 11:55:10AM -0700, [email protected] wrote:
>>> In a nutshell Richard -- you're looking at this from a computer science
>>> theory perspective, without understanding the pragmatic issues. The
>>> pragmatic engineering solution is to keep the chains small, and optimize a
>>> solution for small chains. That's what Sašo has done. This is the
>>> difference between an engineer and a scientist. In this particular case,
>>> my KISS values definitely lead me more towards the engineering approach.
>>>
>>> Its true in the very worst case AVL trees will win, and have a much less
>>> bad worst case. But its also true that if Sašo's work at keeping the chain
>>> sizes is small is effective, then the AVL trees introduce substantial
>>> complexity and hurt performance in the typical case. The worst case should
>>> be sufficiently exceedingly rarer that we can simply discount it. If it
>>> isn't, then we need to go back and rethink our hash algorithm.
>>
>> I agree. We *need* to keep the chains short. Period. Thus, asymptotic
>> analysis doesn't really apply here.
>
> If we are going to be concerned about further tuning to keep the chains
> short, then I should say that I do not think the chains are short
> enough. I would expect the AVL trees to start winning above an average
> chain size of 4.
That's an easy test to check. I encourage you to do so.
The other question is what is the *average* chain size we're talking about
here. If its smaller than or close to 4, then there is simply no value in the
additional complexity. I think that you'll probably find that the AVL tree
check actually doesn't win until your closer to a chain length of 8, because of
the extra operations you need to perform to divide, etc. compiled with the
rebalancing that AVL trees perform, and the extra code complexity with
*modification* of the trees. It turns out that walking really short lists is
pretty darn cheap, and on average you'll only need to walk about the half the
tree length before finding a match.
>
> My point is that we could use AVL trees to handle collisions and not
> worry about chain length so much. I find letting AVL trees handle
> collisions preferable to:
>
> 1. Increasing the hash table to the point where a system not even using
> ZFS (but having it loaded) uses more space to store an empty hash table
> than it uses to store the ZFS source code.
Huh? modunload() is your friend in that case. :-) I'm not going to waste time
thinking about optimizing a kernel module's behavior for the case where it has
no consumers…
>
> 2. The complexity of implementing extendible hash tables.
Oh, extendible hash tables would be a terrible idea. I agree with you there.
>
> Alternatively, we could just stop worrying about chain length and go
> with what Saso has. It is clearly better than what we have now.
YES!!! Worry about the chain length when it is shown to be a problem or likely
to be a problem. What Sašo's current work does is make the chain lengths
small. So this is clearly a win.
- Garrett
_______________________________________________
developer mailing list
[email protected]
http://lists.open-zfs.org/mailman/listinfo/developer