Re: Faster HashMap implementation

Doug Lea Tue, 30 Jun 2009 15:09:21 -0700


Inspired by the combination of Alex's efforts and ongoing
work on concurrent caches, I put together a version
of HashMap (called HashMapV2 for now) with a snapshot at
  http://gee.cs.oswego.edu/dl/wwwtmp/HashMapV2.java
(This retains the openjdk GPL header etc.)


There are a few remaining things to consider
before taking this too seriously as a replacement.
I'll be out for 10 days starting tomorrow so probably
won't get a chance to do so soon.

It does seem to be the best plan of attack for
open-table scheme that people keep wanting (because of
fewer cache misses etc). I think it hits all of the
other issues and concerns I listed in mail a few weeks ago.
It looks very good on various performance tests, including some
I need to add/update in our "loops" tests.

(I wish the concurrent cache version for j.u.c
was in this good a state...)

Pasted below is some of the internal documentation:


     * Overview:
     *
     * The main table uses open-addressing, with [key, value] pairs in
     * alternating elements of "kvTable" array, plus a side-array
     * "hashes" holding hashCodes of keys (plus status bit).  If uses
     * hash preconditioning for first probe, and pseudo-random probing
     * from there. This gives a very good approximation of "uniform
     * hashing" (see for example CLR ch12) across various key types.
     *
     * Key removal sometimes requires replacing keys with ABSENT
     * markers.  We minimize need for these by recording (via UNIQUE
     * bit, see below) whether we can instead safely null out entry.
     * Markers are cleared out on put when there are too many of them,
     * as triggered by decrementing "threshold" on adding marker.
     *
     * In steady state, total footprint (ignoring non-dynamic fields)
     * for open tables is less than chained, even with the side array
     * of hashes. At default load factors, the total numbers words is
     * about:
     *                  steady state              initial
     *                  min     ave     max    (default cap)
     * Open:           4.00N   6.00N   8.00N    24
     * Chained:        7.33N   8.00N   8.67N    16
     *
     * To conserve space for unused maps, arrays are left unallocated
     * upon construction and nulled upon clearing, but with the target
     * allocation size stored in negated form in threshold field,
     * which conveniently requires a (size > threshold) check on
     * insertion anyway.  Additionally, to ensure that tables do not
     * fill up with deletion markers, the threshold is decremented
     * each time an element is replaced with a ABSENT marker.  This
     * forces resize occurring on next insertion to then replace table
     * with one without markers.
     *
     * Maps of size >= CEILING_LENGTH hold overflowing entries in a
     * ternary bitwise trie (see nested class HashTrie). This provides
     * bounded handling without running out of array indices, at the
     * price of about 4X average operation costs and 2.5X per-item
     * space overhead.  The trie is also used for Null keys, which
     * simplifies and speeds up most other table operations. Because
     * operations for Maps allowing nulls don't compose in simple ways
     * (null return values do not reliably mean absent), the HashTrie
     * class is not itself a Map, and so not useful outside this
     * class.
     *

Re: Faster HashMap implementation

Reply via email to