Tomas Bartalos created GROOVY-7448:
--------------------------------------

             Summary: AbstractConcurrentMap performing rehash() on every insert
                 Key: GROOVY-7448
                 URL: https://issues.apache.org/jira/browse/GROOVY-7448
             Project: Groovy
          Issue Type: Bug
          Components: groovy-jdk
    Affects Versions: 2.4.3
            Reporter: Tomas Bartalos
            Assignee: Guillaume Laforge


The problem happens in inner class 
org.codehaus.groovy.util.AbstractConcurrentMap.Segment.put() method (line 105):

When the current count is greater then map's threshold a rehash() happens. Now 
the tricky part is, that the Map holds soft references to objects and rehash() 
validates those references. So when GC discards soft references (e.isValid() == 
false), the resulting segment doesn't expand (as it is assumed in the put() 
method).

in the last line of rehash() the Segmen't internal counter is updated count = 
newCount (which is the number of undiscarded (e.isValid() == true) references 
and can be smaller than previous count as described above)

after rehash() is done, the put() method continues, however the buggy part is, 
that it disregards the previous setting of the internal count and it sets the 
previous count+1 value in every case on lines 124, 143 and 159 of 
AbstractConcurrentMapBase class.
Since the rehash() is called only from put() method, the internal counter is 
not synchronized with real elements count contained in underlying array and 
rehash() is called on all subsequent calls of put() method.

The following steps are happening:

1) Map state: threshold = 786432; count=786432
2) New element is inserted into the map: count = 786433; threshold = 786432
3) since new count would be greater than threshold rehash() happens
rehash() finds out, that most of the objects were garbage collected, thus it 
doesn't increase the size of the Segment, however anyway it copies all the 
objects from one table to another (System.arrayCopy()). (question: why this is 
done, in this case the old array with repaired invalid elements whould be 
sufficient ...? )
4) rehash() sets the internal count to new value, which is smaller, because 
many objects were garbage collected (soft references), lets say: count = 486 000
5) The put() continues, disregards the count = 486 000 and sets the count to 
count = 786433
6) When another element is inserted, the count is still greater than threshold 
so the rehash happens again
>From now on every element added to map will trigger a rehash(), which has a 
>huge performance impact.

When this happens in multithreaded environment, all the other threads are 
waiting (parked) for lock() until the rehash() and put() is done (and then the 
next one is doing the rehash() again and so on). You can imagine what 
performance impact this is.

Proposed solution:

Update the c variable after rehash is done. Between lines 105 and 106 of add:
104: if (c++ > threshold) {
105:                rehash();
106:                c = count + 1
107:}

c = count + 1



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

Reply via email to