Hi,

`Doug suggested to try another variant of resolving hash collisions in`

`open-addressing table - using a secondary hash for increment value of`

`consecutive probes. Since Java Objects provide only one hashCode()`

`function, "secondary" hash is computed from primary hash by`

`xorShift(hashCode), for example:`

r ^= r << 13; r ^= r >>> 17; r ^= r << 5;

`Such hash is then made "odd" by or-ing with 1 and used as probe`

`increment (multiplied by 2 when table uses two slots per entry such as`

`in our case).`

Adding this strategy to the mix in the simulator: http://cr.openjdk.java.net/~plevart/misc/HibrydHashtable/OpenAddressingProbeSequence.java

`Gives promising results with MethodType or Object keys (ClassValue keys`

`are perfect and don't need this anyway):`

http://cr.openjdk.java.net/~plevart/misc/HibrydHashtable/lpht_MethodType_probe_sequence.txt

`The simulation also provides results of probe length distribution for`

`linked-nodes table such as ConcurrentHashMap to compare.`

`Secondary hash as probe stride gives the best average probe length among`

`open-addressing strategies and is not dependent very much on the`

`insertion order (sorted vs. unsorted) of key's hashes, but it does not`

`give best worst-case lookup performance (max. probe length). Worst-case`

`lookup among open-addressing tables tried still belongs to quadratic`

`probing in combination with sorted insertion order.`

`I created a secondary-hash probing implementation`

`(LinearProbeHashtable3, cyan color in the graph) and compared its`

`MethodType keys lookup performance with other variants created before`

`and CHM:`

http://cr.openjdk.java.net/~plevart/misc/HibrydHashtable/lpht_MethodType_bench.pdf

`Unlike what simulator tells us about average probe length, in practice`

`(on my i7-4771 CPU), secondary-hash probing does not excel. I think that`

`it suffers from lack of locality of reference - it is not very nice to`

`CPU cache. It might have been a better option in the past when CPU`

`caches were not so sophisticated. Quadratic probing with sorted`

`insertion seems to hit the sweet spot between optimizing the locality of`

`reference and average probe length.`

`"What is a HybridHashtable?", you might ask, since it beats all others`

`in the benchmark...`

Here's the implementation: http://cr.openjdk.java.net/~plevart/misc/HibrydHashtable/lpht_MethodType_bench_src.tgz

`The above simulator shows that for MethodType or Object keys, average`

`probe length in linked-nodes tables such as CHM is half the average`

`probe length of quadratic probing open-addressing table (~0.25 vs.`

`~0.5), but quadratic probing still beats linked-nodes table because it`

`has better locality of reference. What simulator also shows is that ~80%`

`of linked-nodes buckets contains a single entry (probe-length[0]`

`histogram). HybridHashtable looks like open-addressing table for ~80% of`

`entries and like linked-nodes table for the rest of them:`

* A lone entry (K1, V1) in hybrid table: * * +-- (K1.hashCode() * 2) % table.length * | * table v * ---------+----+----+------------- * | K1 | V1 | * ---------+----+----+------------- * * When a second entry (K2, V2) is inserted that happens to hash to the * same "bucket", above situation is expanded into: * * table * ---------+----+----+------------- * | K1 | * | * ---------+----+----+------------- * | * v Node * +----+----+----+ * | V1 | K2 | V2 | * +----+----+----+ * * Similarly, when a third entry is inserted, we get: * * table * ---------+----+----+------------- * | K1 | * | * ---------+----+----+------------- * | * v Node * +----+----+----+ * | V1 | K2 | * | * +----+----+----+ * | * v Node * +----+----+----+ * | V2 | K3 | V3 | * +----+----+----+ * * ...and so on.

`Such arrangement combines the benefits of locality of reference inherent`

`to open-addressing tables with minimal average number of false probes`

`which is a property of linked-nodes tables. It seems that in practice,`

`it beats other strategies tried here when lookup performance is in question.`

Regards, Peter On 06/13/2016 07:18 PM, Peter Levart wrote:

Hi,I explored various strategies to minimize worst-case lookupperformance for MethodType keys in LinearProbeHashtable. One idea isfrom the "Hopscotch hashing" algorithm [1] which tries to optimizeplacement of keys by moving them around at each insertion or deletion.While a concurrent Hopscotch hashtable is possible, it requiresadditional "metadata" about buckets which complicates it and does notmake it practical for implementing in Java until Java gets value typesand arrays of them. The simplest idea until then is to optimizeplacement of keys when the table is rehashed. Normally when table isrehashed the old table is scanned and entries from it inserted intonew table. To achieve similar effect to "Hopscotch hashing", the orderin which keys are taken from the old table and inserted into new tableis changed. Keys are ordered by increasing bucket index as it would becomputed for the key in the new table. Inserting in this orderminimizes the worst-case lookup performance. Doing this when rehashingand not at every insertion or deletion guarantees that at least halfof keys are optimally placed.Another strategy to minimize worst-case lookup performance is to usequadratic probe sequence instead of linear probe sequence. Normally,when looking up a key, slots in the table are probed in the followingsequence (seq = 0, 1, 2 ...):index(seq) = (hashCode + seq) % tableLength Quadratic probing uses the following probe sequence: index(seq) = (hashCode + seq * (seq + 1) / 2) % tableLengthThose two strategies can be combined. Here's a simulation of usingthose two strategies in an open-addressing hashtable:http://cr.openjdk.java.net/~plevart/misc/LinearProbeHashTable/lpht_MethodType_probe_sequence.txtUsing those strategies does not affect the average length of probingsequence much (length of 0 means that the key was found at its homelocation, length of 1 means that one non-equal key was probed beforefinding the equal one, etc ...), but worst-case lookup performance isgreatly impacted. Combining both strategies minimizes the worst-caselookup performance.Benchmarking using those strategies reveals the average lookupperformance is consistently better than using CHM:http://cr.openjdk.java.net/~plevart/misc/LinearProbeHashTable/lpht_MethodType_bench.pdfThe last trick to make this happen is stolen from CHM. The methodtype's key is a WeakReference<MethodType> which caches the hashCode ofMethodType. By using cached hashCode in the key's equals()implementation as a means of optimization, we achieve similar effectthat CHM achieves when it caches key's hashCode(s) in its Entry objects.Here's the source of above benchmark:http://cr.openjdk.java.net/~plevart/misc/LinearProbeHashTable/lpht_MethodType_bench_src.tgz3 variations of LinearProbeHashtable are compared with CHM: LinearProbeHashtable - the plain one from webrev.04.4LinearProbeHashtable1 - using sorting of keys when rehashing tooptimize their placementLinearProbeHashtable2 - combines sorting of keys with quadraticprobe sequenceI think LinearProbeHashtable2 could be used in MethodType interningwithout fear of degrading lookup performance.Regards, Peter [1] https://en.wikipedia.org/wiki/Hopscotch_hashing

_______________________________________________ mlvm-dev mailing list mlvm-dev@openjdk.java.net http://mail.openjdk.java.net/mailman/listinfo/mlvm-dev