tommyettinger commented on issue #1274: URL: https://github.com/apache/incubator-fury/issues/1274#issuecomment-1997259888
Hi @chaokunyang , very nice work on Fury! There are a handful of things I can think of that might speed up maps in your case.

An open-addressing, linear-probing table like this needs a low enough collision rate to be as fast as it can get. If your hashCode() results are known to be fairly-high-quality random numbers, such as identity hash codes, the map doesn't need to mix the hashCode() at all, and `place()` can use `item.hashCode() & mask` instead of the `long` multiplication and shift. That's only a tiny speedup, but place() may need to be called often.

Counterintuitively, one place where a hash table of `Class` objects might have trouble is how Classes are hashed -- they use `Object.hashCode()`, so they use the identity hash code. Identity hash codes have been a strangely slow area in some of my code, and I try to avoid them, but that's harder to do with `Class`. [This article was interesting about System.identityHashCode()](https://varoa.net/jvm/java/openjdk/biased-locking/2017/01/30/hashCode.html); it looks like any call to it interferes with "biased locking," and the locking can be an issue even in single-threaded situations. I'm wondering if slowdown in the map could come from the first call to `hashCode()` on each Class (or any other type that uses hashCode() from Object) having to do thread-related cautionary work, rather than from the map code itself.

As an aside, the mixing in `place()` might be able to be simpler, because Fury doesn't target as many platforms as this code did in its original form (in [libGDX](https://libgdx.com)). Some ideas for mixing without needing 64-bit multiplication: `final int hash = item.hashCode(); return (hash >>> shift) ^ (hash & mask);`, or some [reversible bitwise operation](https://marc-b-reynolds.github.io/math/2017/10/13/XorRotate.html) followed by masking.
That could be `final int hash = item.hashCode(); return (hash ^ (hash << 5 | hash >>> 27) ^ (hash << 24 | hash >>> 8)) & mask;`, or anything else that's fast enough and gets suitable mixing for your key type. Identity hash codes probably only need masking, with no mixing at all.

Alternate Map implementations could absolutely do better. You're right that SwissTable doesn't have a good implementation in Java; maybe one isn't possible. I think Robin Hood hashing might be something to look into. Cuckoo hashing is what my code replaced in Kryo, and before that in libGDX, and it's somewhat of a precursor to Robin Hood and Hopscotch hashing. I wouldn't recommend Cuckoo hashing unless you're certain you've tested all the corner cases -- the papers that describe it gloss over the fact that it can "fail" without describing just how *bad* a failure can be... In libGDX, we had a proof of concept that sent about 40 4-character Strings into a Cuckoo-based map, and that forced the implementation to attempt to use more table entries than Java can actually fit in one array (it would have used an array with more than 2 to the 30 items if it could). There are some possible workarounds for Cuckoo hashing, but they would also need to be tested very thoroughly for their worst-case behavior.

Robin Hood hashing is what Rust used before it switched to hashbrown, which I think is a SwissTable variant, though I'm not that familiar with Rust. Robin Hood hashing is in most cases very fast and offers good expected-case performance, but one operation was a disaster in at least some versions of Rust: it was very slow to copy one map into another. [This blog post discusses that issue in great detail](https://www.tumblr.com/accidentallyquadratic/153545455987/rust-hash-iteration-reinsertion); it's a good read. I think that bug might not surface if a Robin Hood hash table didn't change which member of a hash family it uses between instances.
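To make those int-only options concrete, here's a minimal sketch of three `place()` variants for a power-of-two table. The `PlaceVariants` class, its field names, and the `shift = Integer.numberOfLeadingZeros(mask)` setup are my own assumptions for illustration, not Fury's actual fields:

```java
// Sketch of three place() variants for a power-of-two open-addressing table.
// Assumes tableSize is a power of two, mask = tableSize - 1, and shift is
// chosen so that (hash >>> shift) yields exactly the top log2(tableSize) bits.
public class PlaceVariants {
    final int mask;   // tableSize - 1
    final int shift;  // folds the high bits down into the masked range

    PlaceVariants(int tableSize) {
        mask = tableSize - 1;
        shift = Integer.numberOfLeadingZeros(mask);
    }

    // When hashCode() is already well-mixed (e.g. an identity hash code),
    // masking alone should be enough.
    int placeMaskOnly(Object item) {
        return item.hashCode() & mask;
    }

    // Cheap fold: XOR the top bits of the hash with the masked low bits.
    int placeFold(Object item) {
        final int hash = item.hashCode();
        return (hash >>> shift) ^ (hash & mask);
    }

    // Reversible xor-rotate mix (two left rotations), then mask.
    int placeXorRotate(Object item) {
        final int hash = item.hashCode();
        return (hash ^ (hash << 5 | hash >>> 27) ^ (hash << 24 | hash >>> 8)) & mask;
    }

    public static void main(String[] args) {
        PlaceVariants p = new PlaceVariants(1 << 6);
        // Each variant maps any key to an index in [0, tableSize).
        System.out.println(p.placeMaskOnly("Fury"));
        System.out.println(p.placeFold("Fury"));
        System.out.println(p.placeXorRotate("Fury"));
    }
}
```

All three stay within `[0, tableSize)`; which one mixes well enough depends entirely on the quality of the key type's hashCode().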
FuryObjectMap doesn't actually use a hash family currently, so that might be an interesting avenue.

Lastly, the simplest option: adjusting the default load factor might make a big difference. One of my several subclasses of ObjectObjectMap in different places used a specific type of key every time, `Coord`, which is a 2D point on an int grid. I found that using the [Cantor pairing function](https://en.wikipedia.org/wiki/Pairing_function#Cantor_pairing_function) as a hashCode() worked very well in that case, but only when the load factor was 0.5f or lower (the default is higher than that). The reason probably has to do with the data I was hashing: mostly square grids where I wanted to track a value with each Coord, in order, and Cantor won't produce any duplicate results for points in a right-isosceles-triangle shape, with the size of the triangle determined by how many bits of the hashCode() were used. A load factor of 0.5f meant an extra bit that wouldn't have been used otherwise, and that essentially doubled the side lengths and made the whole square area covered without any collisions. A lower load factor speeds up several operations by lowering the likelihood of a collision, but it also slows down iteration, since empty table slots need to be iterated past more often.

I hope I didn't get too off-track here... I'd try the load factor first, though it may require some tinkering.

-- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment.
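Coming back to the Cantor pairing idea above: here's a small self-contained sketch of why the extra table bit matters. The `Coord` class and grid/table sizes are illustrative, not from any real codebase; it counts distinct masked hash slots for an n×n grid at two power-of-two table sizes (roughly what load factors 0.8 and 0.5 would give for 676 keys):

```java
import java.util.HashSet;
import java.util.Set;

// Hypothetical Coord key: a 2D point on an int grid, hashed with the
// Cantor pairing function, which is unique for all non-negative (x, y).
final class Coord {
    final int x, y;
    Coord(int x, int y) { this.x = x; this.y = y; }

    @Override public boolean equals(Object o) {
        return o instanceof Coord && ((Coord) o).x == x && ((Coord) o).y == y;
    }

    @Override public int hashCode() {
        return (x + y) * (x + y + 1) / 2 + y;  // Cantor pairing
    }
}

public class CantorDemo {
    // Count distinct masked hash values for an n x n grid of Coords,
    // given a power-of-two table size (mask = tableSize - 1).
    static int distinctSlots(int n, int tableSize) {
        Set<Integer> slots = new HashSet<>();
        for (int x = 0; x < n; x++)
            for (int y = 0; y < n; y++)
                slots.add(new Coord(x, y).hashCode() & (tableSize - 1));
        return slots.size();
    }

    public static void main(String[] args) {
        // A 26x26 grid has 676 keys; the largest Cantor value is 1300.
        // A 1024-slot table masks off the values above 1023, causing
        // collisions; doubling to 2048 slots keeps every key distinct.
        System.out.println(distinctSlots(26, 1024));
        System.out.println(distinctSlots(26, 2048)); // 676: collision-free
    }
}
```

The doubled table exposes one more bit of the hash, which is exactly the "extra bit" effect described above: the whole square grid fits without any masked values wrapping onto each other.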
