tommyettinger commented on issue #1409:
URL: 
https://github.com/apache/incubator-fury/issues/1409#issuecomment-2028576591

   OK, I've been thinking about this for a while, and tinkering a lot with 
different options. I was surprised to find that at least one version of cuckoo 
hashing really can perform quite well (though not as dramatically better on 
containsKey() performance), as long as it never encounters two different keys 
with the same exact hashCode(). I modified an existing Apache v2-licensed 
cuckoo hash map, reducing allocations as much as possible, and the performance 
gains are considerable on some operations:
   
   ```
   POPULATE
   IdentityCuckooMap  : 198676.832 ns/op
   JDK IdentityHashMap: 508930.787 ns/op
   CONTAINS
   IdentityCuckooMap  :   1023.728 ns/op
   JDK IdentityHashMap:   1418.562 ns/op
   COPY
   IdentityCuckooMap  :  28171.380 ns/op
   JDK IdentityHashMap: 302847.095 ns/op
   ITERATE
   IdentityCuckooMap  :  27954.503 ns/op
   JDK IdentityHashMap: 149939.553 ns/op
   ```
   
   The catch here is that if a cuckoo hash "fails," it can enter an infinite 
loop or allocate memory forever, etc. Very bad worst-case properties. However, 
the internals of this particular cuckoo-hashed map and the existing 
FuryObjectMap are quite similar, and I'm working on an approach that will 
"flip" from using cuckoo hashing as long as it is viable, to using linear 
probing as FuryObjectMap currently does, if fully-colliding keys are present 
(or some other situation would force a rehash). I'm not sure what kind of speed 
penalty the flipping will impose; it is possible branch prediction will figure 
out that fully-colliding keys are extremely rare, and that would mean the cost 
would be very low relative to the above numbers. But, the code is larger for 
each method, which might be a problem for inlining. I'll have to benchmark and 
see... I know the JDK HashMap can also change its implementation to use a 
TreeMap-like Red-Black Tree if it encounters excessive collisions on Comparable 
k
 eys, so this isn't an unprecedented model.
   
   I'm testing the FlipMap code in [my jdkgdxds 
repo](https://github.com/tommyettinger/jdkgdxds/blob/master/src/test/java/com/github/tommyettinger/ds/test/FlipMap.java)
 for correctness, and I'll be benchmarking it using what should be similar code 
in [my assorted-benchmarks 
repo](https://github.com/tommyettinger/assorted-benchmarks/blob/master/jmh/src/main/java/de/heidelberg/pvs/container_bench/FlipMap.java).


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]


---------------------------------------------------------------------
To unsubscribe, e-mail: [email protected]
For additional commands, e-mail: [email protected]

Reply via email to