On Oct 16, 2013, at 3:25 AM, Saso Kiselkov <[email protected]> wrote:
> On 10/16/13 6:04 AM, sanjeev bagewadi wrote:
>> Joining in late into this discussion....
>>
>> I would tend to agree with Richard. I think it is worth trying out AVL
>> for the entries in a bucket.
>> That should make the search O(log (n)) instead of O(n) where 'n' is the
>> chain length.
>
> Not quite. For AVL trees it's O(log(X)) where X is the number of buffers
> in the tree, but for hash tables it's O(Y) where Y is the chain length.
> As long as Y < log(X), then hash tables will be faster, and in my
> testing they are. What's worse, AVL trees are even more expensive in
> memory, since sizeof (avl_node_t) == 24, whereas hash tables only keep
> one arc_buf_hdr_t * per buffer (to point to the next one in the chain)
> plus one start arc_buf_hdr_t * in the hash table per chain.
Big O() notation falls down when the sizes involved are tiny. The whole point of
big O() is to understand how algorithms *scale* as input sizes grow, whereas the
work you've been doing is designed to keep the input size as small as possible.
AVL trees and other more sophisticated data structures won't help with tiny chains.
The cost to search a short linked list is sufficiently small that it almost
always beats a nominally better O(log(n)) solution. The trick is to keep
those lists short.
(You see this with sorting too, btw. It's why quicksort and similar algorithms
usually devolve into an insertion sort when the set size is small: even the
O(n^2) performance of an insertion sort is faster if the n is small.)
What is missing here is the constant factor that big O() hides behind the "n", and
that constant factor can be very different from one algorithm to the next. For
small values of n, that constant factor can *dominate*.
- Garrett
_______________________________________________
developer mailing list
[email protected]
http://lists.open-zfs.org/mailman/listinfo/developer