... and please consider
Bug 6812862 <http://bugs.sun.com/bugdatabase/view_bug.do?bug_id=6812862> - provide customizable hash() algorithm in HashMap for speed tuning
again and too for HashSet.

-Ulf


Am 18.03.2010 14:54, schrieb Ulf Zibis:
+1

-Ulf

Am 18.03.2010 14:22, schrieb Osvaldo Doederlein:

The oldest collection classes were designed for the needs of J2SE 1.2, a full decade ago. This was discussed before, IIRC there was some reply from Josh agreeing that some speeed-vs-size tradeoffs made last decade should be revisited today. The extra runtime size/bloat that a specialized HashSet implementation would cost, was reasonably significant in 1999 but completely irrelevant in 2010. I mean, HashSet is a HUGELY important collection, and the benefit of any optimization of its implementation would spread to many APIs and applications.

And the problem is not only the extra value field, there is also overhead from the extra indirection (plus extra polymorphic call) from the HashSet object to the internal HashMap object. This overhead may sometimes be sufficient to block inlining and devirtualization, so it's a potentially bigger cost than just a single extra memory load (which is easily hoisted out of loops etc.). Look at this code inside HashSet for a much worse cost:

    public Iterator<E> iterator() {
        return map.keySet().iterator();
    }

Yeah we pay the cost of building the internal HashMap's key-set (which is lazily-built), just to iterate the freaking HashSet. (Notice that differently from HashMap, a Set is a true Collection that we can iterate directly without any view-collection of keys/values/entries.)

IMHO all this adds evidence that the current HashSet implementation is a significant performance bug. We need a brand-new impl that does the hashing internally, without relying on HashMap, without any unused fields, extra indirections, or surprising costs like that for iterator(). I guess it would be relatively simple to copy-paste HashMap's code, cut stuff until just a Set of keys is left, and merge in the most specific pieces of HashSet (basically just readObject()/writeObject()).



Reply via email to