Geoff Howard wrote: > > > Yes, sorry looking back my comment wasn't clear. What I was > catching on to > was that each element of the array stored an integer which seems to be > storing two different values - one in its 2 most significant bytes for the > component, one in its two least significant bytes for the unique > key within > that component space - in your example a filename. > So, in the example above, the integer stored in p[1] is > 0xfb021431 and is an > XSLTTransformer key. The integer stored in p[2] is 0xfb02542e, also an > XSLTTransformer key, but with a different resource. If I understood you > right, > p[1]>>>16 = p[2]>>>16 = 0x0000fb02 > which is a unique hash meaning "XSLTTransformer" component. And > p[1] & 0xFFFF = 0x00001431 > which is a unique hash meaning "layout.xsl". >
The longer I think about the key generation, I feel that we cannot use hashed values at least as long as we use "long" for the hash value. Why? The answer is simple, a long has 2^64 possibilities. Often the information that is hashed is a file name or a URI. Now, if we only count lower case characters for the file name, we have 26 possibilities for each position, so there are 26^[length of information] possible values and this is very quick much more possibilities than the hashed value offers. So there must be collisions! And a cache based on possible collisions is not usable. To come to a conlusion I did a simple performance test: - Creating 10000 random keys (strings) where each character is a value between 0 and 9 - Hashing the 10000 values - Putting the strings into the cache (a HashMap) - Putting the Hashed values into another cache (a HashMap) - Searching for all values in the string cache - Searching for all values in the hashed cache (this requires of course rehashing the values) And this is the result: Start key generation Created 10000 strings with average length 169 Start hashing Hashed: 120 ms Creating cache with Strings Created: 60 ms Creating cache with longs Created: 30 ms Testing strings Strings: 10 ms Testing hashed strings Hashed strings: 131 ms So, looking up the hashed information is slower as the values have to be rehashed. But putting strings into the cache is slower. >From the test above it seems that using strings is still a little bit faster than using hashed values. Or did I oversee something? Could we extend this test in any way? Carsten --------------------------------------------------------------------- To unsubscribe, e-mail: [EMAIL PROTECTED] For additional commands, email: [EMAIL PROTECTED]