Re: Do Set implementations waste memory?

Ulf Zibis Thu, 18 Mar 2010 06:54:57 -0700

+1

-Ulf


On 18.03.2010 14:22, Osvaldo Doederlein wrote:

The oldest collection classes were designed for the needs of J2SE 1.2, a full decade ago. This was discussed before; IIRC there was some reply from Josh agreeing that some speed-vs-size tradeoffs made last decade should be revisited today. The extra runtime size/bloat that a specialized HashSet implementation would cost was reasonably significant in 1999 but completely irrelevant in 2010. I mean, HashSet is a HUGELY important collection, and the benefit of any optimization of its implementation would spread to many APIs and applications.
And the problem is not only the extra value field; there is also overhead from the extra indirection (plus extra polymorphic call) from the HashSet object to the internal HashMap object. This overhead may sometimes be sufficient to block inlining and devirtualization, so it's a potentially bigger cost than just a single extra memory load (which is easily hoisted out of loops etc.). Look at this code inside HashSet for a much worse cost:
    public Iterator<E> iterator() {
        return map.keySet().iterator();
    }
Yeah, we pay the cost of building the internal HashMap's key-set (which is lazily built), just to iterate the freaking HashSet. (Notice that, unlike HashMap, a Set is a true Collection that we can iterate directly without any view-collection of keys/values/entries.)
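To make the delegation pattern concrete, here is a simplified sketch of a map-backed set. It is not the JDK source: the class name MapBackedSet is made up for illustration, and only the general shape (a shared dummy value per entry, every call forwarded to an inner HashMap, iteration through keySet()) follows what is described above.

```java
import java.util.AbstractSet;
import java.util.HashMap;
import java.util.Iterator;

// Simplified illustration of a map-backed set; NOT the actual JDK code.
// MapBackedSet is a hypothetical name used only for this sketch.
final class MapBackedSet<E> extends AbstractSet<E> {
    // Shared dummy value: every map entry carries a reference to it,
    // which is the "extra value field" per element.
    private static final Object PRESENT = new Object();

    // Every operation below goes through this second object.
    private final HashMap<E, Object> map = new HashMap<>();

    @Override public boolean add(E e)          { return map.put(e, PRESENT) == null; }
    @Override public boolean contains(Object o) { return map.containsKey(o); }
    @Override public boolean remove(Object o)   { return map.remove(o) == PRESENT; }
    @Override public int size()                 { return map.size(); }

    // Iteration goes through the map's key-set view rather than the table itself.
    @Override public Iterator<E> iterator()     { return map.keySet().iterator(); }
}
```

Each element therefore costs a full map entry (key plus value reference), and each call crosses from the set object to the map object before any hashing happens.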
IMHO all this adds evidence that the current HashSet implementation is a significant performance bug. We need a brand-new impl that does the hashing internally, without relying on HashMap, without any unused fields, extra indirections, or surprising costs like that for iterator(). I guess it would be relatively simple to copy-paste HashMap's code, cut stuff until just a Set of keys is left, and merge in the most specific pieces of HashSet (basically just readObject()/writeObject()).
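As a rough idea of what such a standalone implementation could look like, here is a minimal sketch of a chained hash set that hashes internally. Everything in it is an assumption made for illustration: the name CompactHashSet, the initial capacity of 16, the 3/4 load factor, and the bit-spreading step. It leaves out serialization, fail-fast iteration, and Iterator.remove(), so it is nowhere near a drop-in replacement.

```java
import java.util.AbstractSet;
import java.util.Iterator;
import java.util.NoSuchElementException;

// Hypothetical sketch: keys are stored directly in the set's own table,
// with no backing HashMap and no per-entry value reference.
final class CompactHashSet<E> extends AbstractSet<E> {
    // One node per element: cached hash, key, next-in-bucket. No value field.
    private static final class Node<E> {
        final int hash;
        final E key;
        Node<E> next;
        Node(int hash, E key, Node<E> next) { this.hash = hash; this.key = key; this.next = next; }
    }

    private Node<E>[] table = newTable(16); // length is always a power of two
    private int size;

    @SuppressWarnings("unchecked")
    private static <E> Node<E>[] newTable(int n) { return (Node<E>[]) new Node[n]; }

    // Mix high bits into low bits, since the bucket index uses only the low bits.
    private static int hash(Object o) {
        int h = (o == null) ? 0 : o.hashCode();
        return h ^ (h >>> 16);
    }

    private static boolean same(Object o, Object key) {
        return o == null ? key == null : o.equals(key);
    }

    @Override public int size() { return size; }

    @Override public boolean contains(Object o) {
        int h = hash(o);
        for (Node<E> n = table[h & (table.length - 1)]; n != null; n = n.next)
            if (n.hash == h && same(o, n.key)) return true;
        return false;
    }

    @Override public boolean add(E e) {
        if (contains(e)) return false;
        if (size >= table.length * 3 / 4) resize();
        int h = hash(e), i = h & (table.length - 1);
        table[i] = new Node<>(h, e, table[i]); // push onto the bucket's chain
        size++;
        return true;
    }

    @Override public boolean remove(Object o) {
        int h = hash(o), i = h & (table.length - 1);
        Node<E> prev = null;
        for (Node<E> n = table[i]; n != null; prev = n, n = n.next) {
            if (n.hash == h && same(o, n.key)) {
                if (prev == null) table[i] = n.next; else prev.next = n.next;
                size--;
                return true;
            }
        }
        return false;
    }

    // Double the table and relink the existing nodes into their new buckets.
    private void resize() {
        Node<E>[] old = table;
        table = newTable(old.length * 2);
        for (Node<E> head : old) {
            Node<E> n = head;
            while (n != null) {
                Node<E> following = n.next;
                int i = n.hash & (table.length - 1);
                n.next = table[i];
                table[i] = n;
                n = following;
            }
        }
    }

    // Walks the table directly: no key-set view object in between.
    // Read-only and not fail-fast in this sketch.
    @Override public Iterator<E> iterator() {
        return new Iterator<E>() {
            private int index;                        // next bucket to scan
            private Node<E> cursor = advance(null);   // next node to return, or null

            private Node<E> advance(Node<E> from) {
                Node<E> n = (from == null) ? null : from.next;
                while (n == null && index < table.length) n = table[index++];
                return n;
            }
            @Override public boolean hasNext() { return cursor != null; }
            @Override public E next() {
                if (cursor == null) throw new NoSuchElementException();
                E key = cursor.key;
                cursor = advance(cursor);
                return key;
            }
        };
    }
}
```

Compared with the map-backed layout, each element here costs one node with three fields instead of four, and contains()/add()/iterator() run against the set's own table with no second object to dereference. Whether that shows up in practice would need measuring.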
