On 10/16/13 4:30 AM, Richard Yao wrote:
> 
> 
> On Oct 15, 2013, at 9:57 PM, Saso Kiselkov <[email protected]> wrote:
> 
>> On 10/16/13 2:42 AM, Prakash Surya wrote:
>>>
>>> So, if this simply comes down to a hash collision issue, can't we
>>> take this a bit further? Can we make the hash size completely
>>> dynamic? Instead of using another heuristic, can we grow (and
>>> shrink?) the hash table as the workload demands? So if the max
>>> chain depth reaches a threshold, we increase the number of hash
>>> buckets (e.g. double it).
>>>
>>> Of course the details of performing the grow (i.e. rehash) operation
>>> would need to be worked out so it doesn't affect performance,
>>> consistency, etc. But from a theoretical standpoint, moving it to be
>>> sized dynamically seems like a much better solution, IMO.
>>
>> Several problems with this approach:
>>
>> 1) unpredictability - when do we trigger it and by what criteria?
>> 2) the extreme expense of rehashing everything (easily dozens of
>>   seconds of one CPU pegged at 100% while everything else grinds
>>   to a halt as the ARC is inaccessible)
>> 3) hard-to-diagnose latency spikes due to problem #2
>>
>> The easiest approach is just to give the admin a knob which they can
>> twiddle at boot and then leave alone. If performance is really a
>> problem, they can schedule downtime. Doing it at runtime has its
>> problems and doing it automatically is dangerous.
> 
> Another possibility is to switch from linked lists to AVL trees. That
> would make collisions less expensive.

Tried it; it's slower. For instance, on a 50GB SSD with LZ4 compression
I can house 6,400,000 buffers with a 50/50 split by volume between 128k
and 8k buffers. Here's roughly how deep a single AVL tree over all of
them would be (log2 of the buffer count):

$ echo 'l(6.4 * 10^6) / l(2)' | bc -l
22.60964047443681173958

Assuming instead that we want one AVL tree per hash lock (256 locks by
default), the expected tree depth would be:

$ echo 'l(6.4 * 10^6 / 256) / l(2)' | bc -l
14.60964047443681173949

By comparison, a machine with 8GB of total physical memory and 1MB of
hash table for every 1GB of RAM (i.e. an 8MB hash table, or 2^20
8-byte bucket pointers) would have an average hash chain length of:

$ echo 'scale=2 ; (6.4 * 10^6) / 2^20' | bc
6.10
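
To make the sizing rule concrete, here's a minimal userland sketch of
the arithmetic (the names and the power-of-two rounding are just
illustrative assumptions, not the actual arc.c code):

#include <stdint.h>
#include <stdio.h>

int
main(void)
{
	uint64_t physmem = 8ULL << 30;		/* 8GB of RAM, as in the example */
	uint64_t ht_bytes = physmem / 1024;	/* 1MB of table per 1GB -> 8MB */
	uint64_t nbuckets = 1;

	/* one 8-byte bucket pointer per slot, rounded up to a power of two */
	while (nbuckets * sizeof (void *) < ht_bytes)
		nbuckets <<= 1;			/* lands on 2^20 */

	printf("buckets: %llu, avg chain length: %.2f\n",
	    (unsigned long long)nbuckets, 6.4e6 / (double)nbuckets);
	return (0);
}

This prints 1048576 buckets and an average chain length of 6.10,
matching the bc calculation above.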

In my testing these estimates roughly correspond to the actual
performance numbers:

Configuration                    Rebuild time
Old hash table (128k per 1GB)    ~50s
256x AVL trees                   ~20s
New hash table (1MB per 1GB)     ~10s
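
For reference, here's a minimal sketch of the two per-bucket lookup
strategies the table compares -- walking a short hash chain vs.
descending a per-lock balanced tree. The structures and names are made
up for illustration; this is not the actual ARC code:

#include <stdint.h>
#include <string.h>

typedef struct blkkey { uint64_t bk_dva[2]; uint64_t bk_birth; } blkkey_t;

/* Chained hash table: walk a short singly-linked list in one bucket. */
typedef struct hbuf {
	blkkey_t	hb_key;
	struct hbuf	*hb_next;
} hbuf_t;

static hbuf_t *
chain_lookup(hbuf_t *bucket, const blkkey_t *key)
{
	for (hbuf_t *b = bucket; b != NULL; b = b->hb_next) {
		/* ~6 links on average with the 1MB-per-1GB table above */
		if (memcmp(&b->hb_key, key, sizeof (*key)) == 0)
			return (b);
	}
	return (NULL);
}

/* One balanced tree per hash lock: descend ~log2(n / 256) levels. */
typedef struct tbuf {
	blkkey_t	tb_key;
	struct tbuf	*tb_child[2];	/* 0 = left, 1 = right */
} tbuf_t;

static tbuf_t *
tree_lookup(tbuf_t *root, const blkkey_t *key)
{
	for (tbuf_t *t = root; t != NULL; ) {
		int c = memcmp(key, &t->tb_key, sizeof (*key));
		if (c == 0)
			return (t);
		/* ~14-15 levels with 6.4M buffers spread over 256 trees */
		t = t->tb_child[c > 0];
	}
	return (NULL);
}

Each step is a pointer dereference plus a key comparison in both cases,
but the per-lock trees do more than twice as many of them per lookup,
which lines up with the rebuild times above.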

-- 
Saso
