+1
-Ulf
Am 18.03.2010 14:22, schrieb Osvaldo Doederlein:
The oldest collection classes were designed for the needs of J2SE 1.2,
a full decade ago. This was discussed before, IIRC there was some
reply from Josh agreeing that some speeed-vs-size tradeoffs made last
decade should be revisited today. The extra runtime size/bloat that a
specialized HashSet implementation would cost, was reasonably
significant in 1999 but completely irrelevant in 2010. I mean, HashSet
is a HUGELY important collection, and the benefit of any optimization
of its implementation would spread to many APIs and applications.
And the problem is not only the extra value field, there is also
overhead from the extra indirection (plus extra polymorphic call) from
the HashSet object to the internal HashMap object. This overhead may
sometimes be sufficient to block inlining and devirtualization, so
it's a potentially bigger cost than just a single extra memory load
(which is easily hoisted out of loops etc.). Look at this code inside
HashSet for a much worse cost:
public Iterator<E> iterator() {
return map.keySet().iterator();
}
Yeah we pay the cost of building the internal HashMap's key-set (which
is lazily-built), just to iterate the freaking HashSet. (Notice that
differently from HashMap, a Set is a true Collection that we can
iterate directly without any view-collection of keys/values/entries.)
IMHO all this adds evidence that the current HashSet implementation is
a significant performance bug. We need a brand-new impl that does the
hashing internally, without relying on HashMap, without any unused
fields, extra indirections, or surprising costs like that for
iterator(). I guess it would be relatively simple to copy-paste
HashMap's code, cut stuff until just a Set of keys is left, and merge
in the most specific pieces of HashSet (basically just
readObject()/writeObject()).