RE: [RT]: Calculating the cache key

Geoff Howard Fri, 24 May 2002 10:44:32 -0700

Carsten -

This test is great -- let me make sure I understand your results though:


> 
> The longer I think about the key generation, I feel that we cannot use
> hashed values at least as long as we use "long" for the hash value.
> 
> Why? The answer is simple, a long has 2^64 possibilities. Often the
> information that is hashed is a file name or a URI. Now, if we
> only count lower case characters for the file name, we have 26
> possibilities for each position, so there are 26^[length of 
> information]
> possible values and this is very quick much more possibilities
> than the hashed value offers. So there must be collisions!
> And a cache based on possible collisions is not usable.
> 
> To come to a conlusion I did a simple performance test:
> - Creating 10000 random keys (strings) where each character is a value
> between 0 and 9
> - Hashing the 10000 values
> - Putting the strings into the cache (a HashMap)
> - Putting the Hashed values into another cache (a HashMap)
> - Searching for all values in the string cache
I guess you mean for each value (in a loop)?

> - Searching for all values in the hashed cache (this requires 
> of course
> rehashing the values)
> 
> And this is the result:
> Start key generation
> Created 10000 strings with average length 169
OK, 
> Start hashing
> Hashed: 120 ms
120 for all 10,000 -> The hashing function takes .012 ms per call on average

> Creating cache with Strings
> Created: 60 ms
60 ms for all 10,000 -> .006 ms on average
> Creating cache with longs
> Created: 30 ms
same, .003 ms for longs (faster)
> Testing strings
> Strings: 10 ms
Object = mapWithStrings.get(KeyString) -> 10 ms for all 10000? -> .001 ms
per lookup average? (wow)
> Testing hashed strings
> Hashed strings: 131 ms
Object = mapWithHashes.get(hashCreate(KeyString)) -> 131 ms for all -> .0131
ms per lookup average.

> 
> So, looking up the hashed information is slower as the values 
> have to be
> rehashed. But
> putting strings into the cache is slower.
But putting strings in only takes twice the time -- taking them out is ten
times faster and happens more often (right?)

> From the test above it seems that using strings is still a 
> little bit faster
> than
> using hashed values.
> Or did I oversee something?
I had been assuming that if the keys were not objects (strings) but integers
that there could be a better datastructure than HashMap.  I haven't had much
time to think about it today but will tonight.  Any ideas?

Geoff

---------------------------------------------------------------------
To unsubscribe, e-mail: [EMAIL PROTECTED]
For additional commands, email: [EMAIL PROTECTED]

RE: [RT]: Calculating the cache key

Reply via email to