On Wed, Oct 16, 2013 at 02:57:08AM +0100, Saso Kiselkov wrote:
> On 10/16/13 2:42 AM, Prakash Surya wrote:
> > OK, that is where I assumed the speed up was coming from (shorter
> > chains leading to faster lookups).
> > 
> > I also assumed there would be a "bucket lock" that needs to be acquired
> > during this lookup similar to the dbuf hash (which would affect
> > concurrency), but I guess that isn't the case here (I haven't studied
> > the arc hash code as well as I have the dbuf hash code).
> 
> It's mostly the same. Dbuf would also benefit a bit from restructuring,
> perhaps.
> 
> > So, if this simply comes down to a hash collision issue, can't we try
> > and take this a bit further.. Can we make the hash size be completely
> > dynamic? Instead of using another heuristic, can we grow (and shrink?)
> > the hash as the workload demands? So if the max chain depth reaches a
> > threshold, we increase the number of hash buckets (e.g. double it).
> > 
> > Of course the details of performing the grow (i.e. rehash) operation
> > would need to be worked out so it doesn't affect performance,
> > consistency, etc.. But from a theoretical stand point, moving it to be
> > sized dynamically seems like a much better solution, IMO.
> 
> Several problems with this approach:
> 
> 1) unpredictability - when do you trigger it and by what criteria?

This would need to be ironed out, but I was thinking that if the max
chain length grows above some threshold X, we grow the hash. The value
of X could be a tunable, with a default determined by testing
representative workloads.

> 2) the extreme expense of rehashing everything (easily dozens of
>    seconds of one CPU pegged at 100% while everything else grinds
>    to a halt as the ARC is inaccessible)

Without doing anything clever, yes. You would have to lock the entire
hash table, allocate the new table, and then move everything over.

I was hoping there might be a clever way to do this in the background,
so as not to affect performance "too much".

Thinking out loud here, but maybe something incremental. For example, a
background thread rehashes things one bucket at a time, and maintains an
index recording which buckets have already been migrated. So:

    read -> hash to index i of old/current table
    if (bg_thread_is_rehashing && i < rehash_index)
        rehash buffer to get index j of new table
        look for buffer in new table using index j
    else
        look for buffer in old/current table using index i

And the background thread would lock each bucket in turn, rehash its
buffers into the new table, increment the "rehash_index", unlock the
bucket, and repeat until all buckets are rehashed.

Granted, I haven't given this much thought, but if it's possible to do
in an unobtrusive way, I think it's worth pursuing.

> 3) hard to diagnose latency spikes due to problem #2

It would have to be done so as not to introduce these latency spikes.

> 
> The easiest approach is just to give the admin a knob which they can
> twiddle at boot and then leave alone. If performance is really a
> problem, they can schedule downtime. Doing it at runtime has its
> problems and doing it automatically is dangerous.

I agree that the easiest approach is to statically size the hash table,
but that doesn't really solve the problem; we're just kicking the can
down the road, IMO.

If the completely dynamic approach isn't tractable, why split the table
into a 2D array? Why not just increase its size and keep it a 1D array?

-- 
Cheers, Prakash

> 
> Cheers,
> -- 
> Saso
> _______________________________________________
> developer mailing list
> [email protected]
> http://lists.open-zfs.org/mailman/listinfo/developer