On Jul 6, 2012, at 9:58 PM, Joerg Sonnenberger wrote:

> On Fri, Jul 06, 2012 at 05:31:14PM -0000, Howard Hinnant wrote:
>> Author: hhinnant
>> Date: Fri Jul  6 12:31:14 2012
>> New Revision: 159836
>> 
>> URL: http://llvm.org/viewvc/llvm-project?rev=159836&view=rev
>> Log:
>> This commit establishes a new bucket_count policy in the unordered
>> containers:  The policy now allows a power-of-2 number of buckets to
>> be requested (and that request honored) by the client.
>> And if the number of buckets is set to a power of 2, then the constraint
>> of the hash to the number of buckets uses & instead of %.
> 
> Does the standard place any constraints in this area?

No.

> So with this
> change, power-of-two sized hash tables still have a conditional in the
> hot path.

Yes.  My observation is that the cost of this conditional is about 4 machine 
cycles on Intel.  In real-world use the branch takes the same path over and 
over, so the hardware branch predictor becomes a very effective optimization.  
I.e. it is rare for a hash table to switch from a prime to a power-of-2 number 
of buckets (or vice versa), compared to the number of operations that constrain 
the hash to the number of buckets.
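
For concreteness, the constraining step looks roughly like this (a sketch with
illustrative names, not the actual libc++ source):

    #include <cstddef>

    // Map a hash value to a bucket index.  When the bucket count is a
    // power of 2 (exactly one bit set), a mask suffices; otherwise fall
    // back to modulo.  This test is the conditional on the hot path.
    inline std::size_t
    constrain_hash(std::size_t h, std::size_t bucket_count)
    {
        if ((bucket_count & (bucket_count - 1)) == 0)
            return h & (bucket_count - 1);   // power of 2: cheap mask
        return h % bucket_count;             // prime count: modulo
    }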

> The only justification for using non-power-of-two hashes seems
> to be extremely poor user-provided hash functions, which sounds like a
> lame excuse.

I wouldn't go that far at this time.  There are at least two dimensions to the 
quality of a hash function:

1. The extent to which it spreads entropy over all of the bits.
2. Speed.

Prior to this commit I observed test cases that took advantage of both of these 
dimensions.  And maximizing both of them at once is not a solved problem with 
current technology.
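
As an illustration of the tension between the two (a sketch; the multiplier is
just an example mixing constant):

    #include <cstddef>
    #include <cstdint>

    // Dimension 2 (speed): an identity-style hash is as fast as possible,
    // but does nothing to spread the key's entropy across the bits of the
    // result, so the low-order bits may carry all of the variation.
    inline std::size_t fast_hash(std::uint64_t key)
    {
        return static_cast<std::size_t>(key);
    }

    // Dimension 1 (spreading entropy): a multiplicative mix pushes entropy
    // into the high-order bits as well, at the cost of an extra multiply.
    inline std::size_t mixing_hash(std::uint64_t key)
    {
        return static_cast<std::size_t>(key * 0x9E3779B97F4A7C15ull);
    }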

> The complexity of computing primes basically just goes back
> to the argument that linear series with increments smaller than the
> prime itself are evenly distributed.

The need to compute a prime is amortized over the life of an unordered 
container, much like the need to allocate a new capacity is amortized over the 
life of a vector.  Not only is it not a driving factor, but significant 
optimization has gone into this effort, such that the ratio of 
cost-to-compute-prime to number-of-buckets actually drops dramatically as the 
number of buckets grows.  I.e. yes, you might spend more to compute a prime 
large enough for a million buckets than for a hundred.  But the cost per 
bucket will be far less for the million-bucket solution than for the 
hundred-bucket solution.  This is a solved problem, and not a research topic 
(unless the client intervenes and demands linear bucket growth instead of 
geometric growth - disastrous for both vectors and unordered containers).
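
A sketch of the amortization argument, assuming geometric growth (next_prime
below is a deliberately naive stand-in for libc++'s optimized routine):

    #include <cstddef>

    // Hypothetical helper: smallest prime >= n, by trial division.  The
    // real routine is far more optimized; this exists only to make the
    // amortization argument concrete.
    std::size_t next_prime(std::size_t n)
    {
        if (n <= 2)
            return 2;
        if (n % 2 == 0)
            ++n;
        for (;; n += 2)
        {
            bool prime = true;
            for (std::size_t d = 3; d * d <= n; d += 2)
                if (n % d == 0) { prime = false; break; }
            if (prime)
                return n;
        }
    }

    // On rehash, grow geometrically and then round up to a prime.  The
    // prime search runs only O(log n) times over the container's life, so
    // its cost per bucket shrinks as the table grows, just as a vector's
    // reallocation cost is amortized by geometric capacity growth.
    std::size_t grow_bucket_count(std::size_t current)
    {
        return next_prime(2 * current + 1);
    }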

> The same or similar properties can
> be obtained using double-width-multiplication-and-mask universal hash
> functions (hash(x) = hi(dw(x) * T), where dw casts x to a type twice the
> size and hi gives the upper half). They are no more expensive to compute
> than the double multiplication version of the modulo and significantly
> cheaper than using modulo directly on most (all?) platforms. So: why not
> kill the complexity and require power-of-two bucket counts and clients
> to use a sane hash function?
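
For concreteness, I read that suggestion as roughly the following, shown here
for a 32-bit hash (a sketch; the odd constant T is just an example):

    #include <cstdint>

    // Multiply and keep the high half: widen the key to double width,
    // multiply by a fixed odd constant T, and take the upper half.  With
    // an odd T, every bit of the key influences the upper half of the
    // product, which is what makes it tolerable to reduce the result with
    // a power-of-2 mask.
    inline std::uint32_t multiply_shift_hash(std::uint32_t x)
    {
        const std::uint64_t T = 0x9E3779B97F4A7C15ull;  // example constant
        return static_cast<std::uint32_t>(
            (static_cast<std::uint64_t>(x) * T) >> 32);
    }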

My current experience is that a truly general-purpose hash container cannot 
assume much about the collection of keys, nor about the function with which 
those keys are hashed.  This is especially true when I don't even know the type 
of the keys or the algorithm behind the hash function.  And yet casual users 
will demand decent (non-quadratic) performance.

Admittedly, my clients continually evolve, becoming more educated and more 
clever with every year.  There may come a day when I can be assured that my 
clients will just assume a power-of-2 hash container and will have the tools to 
deal with that.  But that is not what I see today.

What I see today is that it is common for my clients to produce a combination 
of hash function and container values that results in significant collisions 
if the number of buckets is not flexible, even when there is explicit evidence 
that they have attempted to avoid such a situation.  And then I get performance 
bug reports against the libc++ unordered containers.  This is not to say that 
my clients don't know what they're doing.  It is to say that there is a wide 
variety of unordered container implementations available today, and it takes 
quite sophisticated analysis to guarantee consistent performance over all of 
them.

The current libc++ philosophy is to be forgiving in this department, but to 
also expose the sharper tools if you want to try to use them.  That is, I'm 
trying to increase the viable options for the client, not decrease them.  The 
worst performance problems arise when the client attempts to set the number of 
buckets to one value and the libc++ implementation insists on another.  And 
that is what is being improved upon here.
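
A minimal usage sketch of the option this opens up (assuming libc++ with this
commit; the element type and counts are arbitrary):

    #include <cassert>
    #include <unordered_set>

    int main()
    {
        std::unordered_set<int> s;

        // Request a power-of-2 bucket count.  With this change the request
        // is honored exactly, and the hash is constrained with & rather
        // than %.
        s.rehash(1024);
        assert(s.bucket_count() == 1024);

        // Request a non-power-of-2 count and the implementation still
        // rounds up to a prime, preserving the forgiving default behavior.
        s.rehash(1000);
        assert(s.bucket_count() >= 1000);
    }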

I.e. hash containers need a manual transmission.  An automatic is fine for 
casual users.  But we've got to have a way to override the automatic.  4-speed 
is barely adequate.  5 or 6 speed is much better.  8 or 10 speed would be 
awesome!

Howard
