Inspired by the combination of Alex's efforts and ongoing work on concurrent caches, I put together a version of HashMap (called HashMapV2 for now) with a snapshot at http://gee.cs.oswego.edu/dl/wwwtmp/HashMapV2.java (This retains the openjdk GPL header etc.)
There are a few remaining things to consider before taking this too seriously as a replacement. I'll be out for 10 days starting tomorrow so probably won't get a chance to do so soon. It does seem to be the best plan of attack for open-table scheme that people keep wanting (because of fewer cache misses etc). I think it hits all of the other issues and concerns I listed in mail a few weeks ago. It looks very good on various performance tests, including some I need to add/update in our "loops" tests. (I wish the concurrent cache version for j.u.c was in this good a state...) Pasted below is some of the internal documentation: * Overview: * * The main table uses open-addressing, with [key, value] pairs in * alternating elements of "kvTable" array, plus a side-array * "hashes" holding hashCodes of keys (plus status bit). If uses * hash preconditioning for first probe, and pseudo-random probing * from there. This gives a very good approximation of "uniform * hashing" (see for example CLR ch12) across various key types. * * Key removal sometimes requires replacing keys with ABSENT * markers. We minimize need for these by recording (via UNIQUE * bit, see below) whether we can instead safely null out entry. * Markers are cleared out on put when there are too many of them, * as triggered by decrementing "threshold" on adding marker. * * In steady state, total footprint (ignoring non-dynamic fields) * for open tables is less than chained, even with the side array * of hashes. At default load factors, the total numbers words is * about: * steady state initial * min ave max (default cap) * Open: 4.00N 6.00N 8.00N 24 * Chained: 7.33N 8.00N 8.67N 16 * * To conserve space for unused maps, arrays are left unallocated * upon construction and nulled upon clearing, but with the target * allocation size stored in negated form in threshold field, * which conveniently requires a (size > threshold) check on * insertion anyway. Additionally, to ensure that tables do not * fill up with deletion markers, the threshold is decremented * each time an element is replaced with a ABSENT marker. This * forces resize occurring on next insertion to then replace table * with one without markers. * * Maps of size >= CEILING_LENGTH hold overflowing entries in a * ternary bitwise trie (see nested class HashTrie). This provides * bounded handling without running out of array indices, at the * price of about 4X average operation costs and 2.5X per-item * space overhead. The trie is also used for Null keys, which * simplifies and speeds up most other table operations. Because * operations for Maps allowing nulls don't compose in simple ways * (null return values do not reliably mean absent), the HashTrie * class is not itself a Map, and so not useful outside this * class. *