On Oct 16, 2013, at 1:11 PM, Richard Yao <[email protected]> wrote:

> On 10/16/2013 04:01 PM, Prakash Surya wrote:
>> On Wed, Oct 16, 2013 at 11:55:10AM -0700, [email protected] wrote:
>>> In a nutshell Richard -- you're looking at this from a computer science 
>>> theory perspective, without understanding the pragmatic issues.  The 
>>> pragmatic engineering solution is to keep the chains small, and optimize a 
>>> solution for small chains.  That's what Sašo has done.   This is the 
>>> difference between an engineer and a scientist.  In this particular case, 
>>> my KISS values definitely lead me more towards the engineering approach. 
>>> 
>>> Its true in the very worst case AVL trees will win, and have a much less 
>>> bad worst case.  But its also true that if Sašo's work at keeping the chain 
>>> sizes is small is effective, then the AVL trees introduce substantial 
>>> complexity and hurt performance in the typical case.  The worst case should 
>>> be sufficiently exceedingly rarer that we can simply discount it.  If it 
>>> isn't, then we need to go back and rethink our hash algorithm.
>> 
>> I agree. We *need* to keep the chains short. Period. Thus, asymptotic
>> analysis doesn't really apply here.
> 
> If we are going to be concerned about further tuning to keep the chains
> short, then I should say that I do not think the chains are short
> enough. I would expect the AVL trees to start winning above an average
> chain size of 4.

That's an easy test to check.  I encourage you to do so.

The other question is what is the *average* chain size we're talking about 
here.  If its smaller than or close to 4, then there is simply no value in the 
additional complexity.  I think that you'll probably find that the AVL tree 
check actually doesn't win until your closer to a chain length of 8, because of 
the extra operations you need to perform to divide, etc.  compiled with the 
rebalancing that AVL trees perform, and the extra code complexity with 
*modification* of the trees.  It turns out that walking really short lists is 
pretty darn cheap, and on average you'll only need to walk about the half the 
tree length before finding a match.


> 
> My point is that we could use AVL trees to handle collisions and not
> worry about chain length so much. I find letting AVL trees handle
> collisions preferable to:
> 
> 1. Increasing the hash table to the point where a system not even using
> ZFS (but having it loaded) uses more space to store an empty hash table
> than it uses to store the ZFS source code.

Huh?  modunload() is your friend in that case. :-)  I'm not going to waste time 
thinking about optimizing a kernel module's behavior for the case where it has 
no consumers… 

> 
> 2. The complexity of implementing extendible hash tables.

Oh, extendible hash tables would be a terrible idea.  I agree with you there.

> 
> Alternatively, we could just stop worrying about chain length and go
> with what Saso has. It is clearly better than what we have now.

YES!!!  Worry about the chain length when it is shown to be a problem or likely 
to be a problem.  What Sašo's current work does is make the chain lengths 
small.  So this is clearly a win. 

        - Garrett

_______________________________________________
developer mailing list
[email protected]
http://lists.open-zfs.org/mailman/listinfo/developer

Reply via email to