> Thanks for that advise; I trust your judgement! I'm not being judgmental here, I speak from experience. I also though it didn't make sense to rehash int keys when I was writing HPPC but life quickly taught me the opposite. The problem was that hash values took slots modulo 2^n and, for example, we had keys which were constructed as:
primaryValue << 32 | secondaryValue which then conflict for nearly _all_ secondary values, regardless of the primaryValue. Also, rehashing in my experiments didn't really hurt performance -- it was all dwarfed by other aspects in which hashmaps were used (and I still suspect modern CPUs wait for memory accesses there so you could calculate 10x more in the mean time and nothing would happen). > Do you have an opinion on copying IntHashSet directly to Lucene-core or > some other approach? Up to you, really. There are a few factors to consider: 1) if you know the distribution of your keys you can use specialized hashing/ bucketing that will be matched to your data, 2) if you need juc compliance (iterators and such) Sebastiano Vigna's fastutil has a nice set of interfaces that allows both boxed value iterators and primitive iterators (without the need of going to boxed types). fastutil's implementations are nearly identical to HPPC's and the performance is similar (with differences at noise levels I'd say). 3) you need to decide whether to use open hashing or linear hashing. Both have pros and cons. fastutil and HPPC use linear hashing because it's more cache friendly -- an older version of both libraries will have code for double hashing if you want to rip it. Dawid --------------------------------------------------------------------- To unsubscribe, e-mail: [email protected] For additional commands, e-mail: [email protected]
