Just to be on the record on the list, I'd like to add one more thing I
wrote to Richard which didn't make it to the mailing list (Richard: I
updated some numbers after running some more calculations):
For instance, let's say you're trying to represent 1TB worth of 4k ARC
buffers, i.e. 268435456 buffers (doesn't matter if they sit in ARC or
L2ARC, the headers are still there).
Some important constants used below:
sizeof (void *) = 8
sizeof (avl_node_t) = 24
sizeof (avl_tree_t) = 40
Using an AVL tree you'll need to spend (best case):
2^40 / 4096 * sizeof (avl_node_t) = 6 GB
Using a linked list, it's going to be:
2^40 / 4096 * sizeof (void *) = 2 GB
Now, how large is the hash table going to be? The minimum amount of
physical DRAM to hold that many buffers is 96GB (268435456 buffers *
sizeof ((arc_buf_hdr_t) + sizeof (l2arc_buf_hdr_t)) comes in at ~74GB)
If you dedicate 0.1% to your hash table, that comes to:
Number of linked list "head" pointers that fit in:
96 * 2^20 / sizeof (void *) = 12582912
Number of AVL trees that fit in:
96 * 2^20 / sizeof (avl_tree_t) = 2516582
Thus the average linked list chain length will be 268435456 / 12582912 =
21.33, whereas the average tree size would be 268435456 / 2516582 =
106.66. So far, seemingly we'd be justified in saying that AVL trees
will be better, because O(n) for n=21.33 is greater than O(log(n)) for
n=106.66 ('echo "l(106.66)/l(2)" | bc -l' gives 6.73), so even
considering a fixed callback overhead on the AVL tree comparator, it
should still be somewhat faster.
BUT BUT BUT (!) - and here's the real kicker - we expended three times
as much memory in the first step to get this performance (6GB for trees
vs. 2 GB for linked lists). What if we instead chose to reinvest a part
of that overhead into the hash table and get shorter lists? What if,
instead of a 96MB (0.1% of physmem) we chose to go with a 384MB (0.4% of
physmem) hash table? That gives us 4x as many hash buckets and thus a
linked list chain length of on average 1/4 the original, i.e. 5.33. Now
that's shorter than the AVL tree case and at a fraction of the total
memory requirements!
The only difference really is that the fixed memory cost of AVL trees
when you're not using them can be made very small (e.g. 96MB in the
above example, vs. 384MB for hash tables). But if you *do* use it a lot
(as one would expect on a high-volume machine), the variable memory cost
of AVL tree nodes quickly overtakes the total cost of linked lists and
it doesn't even give you any extra performance.
Cheers,
--
Saso
_______________________________________________
developer mailing list
[email protected]
http://lists.open-zfs.org/mailman/listinfo/developer