Re: getHash inconsistency

Sean Kelly Thu, 22 Mar 2012 11:41:49 -0700

On Mar 20, 2012, at 10:10 PM, H. S. Teoh wrote:

> On Wed, Mar 21, 2012 at 03:55:49PM +1100, Daniel Murphy wrote:
>> "H. S. Teoh" <[email protected]> wrote in message 
>> news:[email protected]...
> [...]
>>> Then the question is, what should be the fix?
>>> 
>>> Currently, char[] and string have getHash defined in
>>> rt.typeinfo.ti_Ad.TypeInfo_Aa, which appears to be a custom hashing
>>> algorithm, whereas const(char)[] uses TypeInfo_Array.getHash, which
>>> uses rt.util.hash.hashOf. Which one should be used?
> [...]
>> 
>> I assume the custom hashing routine is better/faster?  If so, that
>> one.  I'd assume that getHash not working the same for const strings
>> is an oversight. 
> [...]
> 
> Here's the current hashing code for char[] and string:
> 
>        foreach (char c; s)
>            hash = hash * 11 + c;
> 
> For const(char)[], it's rt.util.hash.hashOf, which is Paul Hsieh's
> SuperFastHash algorithm with very good hash distribution properties. It
> does seem to involve a lot more operations that the simple loop above,
> though; so I assume the above simple loop was chosen because hashing
> strings are a very common operation and, for the most part, only need a
> simple hash function.
> 
> So I'm kinda leaning towards SuperFastHash, but worried about whether it
> will cause performance degradation, in which case we should stick with
> the simple loop.


The thing with hashes is that you usually gain more by getting a good 
distribution than you lose by computing the hash.  Our AA implementation should 
probably store the computed hash value per node if it isn't already though, so 
we can compare against that before doing opEquals when looking up a node, and 
so it's available when rehashing.

Re: getHash inconsistency

Reply via email to