Alex Yakovlev wrote:
To have single-indirect access we need to read
key/value objects with the first array read.

Backing up a little, there are five main forces
in hash table design. These are fresh in my mind because
I've been trying to put together CustomConcurrentHashMap,
the last planned JSR166y JDK7 component, that handles
weak/soft refs, custom equality comparisons, eviction, etc.
(Concurrency adds further forces omitted here.)

1. Memory stalls
   - Cache misses due to randomization of indexing can easily
     cost hundreds of cycles. This is made worse if there are
     multiple levels of indirection each with poor locality.
     (Multiple levels with good locality are much less of a
     problem, which is why ConcurrentHashMap is pretty fast
     despite adding one). And this is made even worse when
     there are too many computations before processor can even
     do a prefetch, which is one reason why bit-spreading functions
     require careful speed vs uniformity tradeoff analysis.
2. Method Dispatching
   - Both Object.equals and Object.hashCode are often
     "megamorphic", i.e., cannot be inlined because they
     are overridden in too many classes. In some usages,
     this accounts for the main performance bottleneck.
     So caching hashcodes, which usually also allows skipping
     calls to equals for different objects, has a noticeable effect.
     Also, it is sometimes possible to bias the structure of code
     to encourage compilers to do more per-call inlining,
     which helps a lot. Note that these effects can be difficult
     to test. Test programs that include multiple key types
     (like our DenseMapMicrobenchmark) are a little more
     accurate indicators of megamorphism effects, but
     even here, these kinds of microbenchmarks will often
     cause compilers to do deeper inlining than they would in
     most actual usages.
3. Collision handling
   - Collisions lead to some sort of slower scanning. Especially
     given the highly variable quality of user-supplied
     hashCodes, you'd like to localize the impact of collisions
     to avoid per-map slowdowns. Linear-probes and related techniques
     don't fare very well on these grounds.
4. Biasing for expected operation mixes
   - I once checked that collection (and in particular HashMap/Hashtable)
     usage approximately obeys the usual 4:1 read vs write ratio
     seen in just about every aspect of computing. In fact it seems
     a bit more extreme -- something like 84% read (get() or iterate)
     15% add (put() and variants) and 1% delete (remove, clear). So
     making reads fast is clearly the highest priority.
5. Footprint
   - Many Java frameworks (especially EE) can create hundreds of
     thousands of collections, many of them very small, but
     also some huge ones. This leads to second-order bloat
     effects, like high TLB miss rates, and much slower GC. For
     some details see papers by Gary Sevitsky and colleagues at
http://domino.research.ibm.com/comm/research_people.nsf/pages/sevitsky.pubs.html
     This is a difficult issue with hash tables since the whole
     idea is to preallocate space to allow random indexing. And
     because resizing is not cheap, you don't want to be overly
     incremental about it.


And we cannot store collisons in pre-allocated
arrays - to read it from the same array with keys/values
it needs to be an object. Thus writing speed on
my original map will always be faster.


Right. My thought here is to use an alternative structure
for collisions. So, given capacity C, you'd have:
  elements[] -- 2*C of key/value
  hashes[] -- 1*C of hash + bookkeeping bits
  collisions -- a dynamic bitwise trie or somesuch
Compared to current HashMap...
  + It allows parallel prefetches on get; of both
    elements[2*h] and hashes[h], so should reduce memory stalls.
  - It triples the unloaded footprint. You can get most of
    this back in default-constructor usages by not allocating
    arrays, but instead using only collisions tree until size
    reaches say 4, and then using a larger (say 64) initial capacity
    once you know you aren't a tiny map.
  = Collisions may or may not be a little more expensive to resolve
  - Iterators are likely slower and definitely a lot messier
  - The trie nodes needed here would require 2 or 3 more
    pointer fields than do list-based nodes, so the incremental
    footprint grows faster.
  - Resizing is slower and more complicated since the collisions
    may or may not be moved to main table.

Without fleshing this out, I'm not sure whether this hits the right
tradeoffs.

-Doug



Reply via email to