[ 
https://issues.apache.org/jira/browse/GROOVY-7448?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Tomas Bartalos updated GROOVY-7448:
-----------------------------------
    Description: 
The problem happens in inner class 
org.codehaus.groovy.util.AbstractConcurrentMap.Segment.put() method (line 105):

When the current count is greater then map's threshold a rehash() happens. Now 
the tricky part is, that the Map holds soft references to objects and rehash() 
validates those references. So when GC discards soft references (e.isValid() == 
false), the resulting segment doesn't expand (as it is assumed in the put() 
method).

in the last line of rehash() the Segmen't internal counter is updated count = 
newCount (which is the number of undiscarded (e.isValid() == true) references 
and can be smaller than previous count as described above)

after rehash() is done, the put() method continues, however the buggy part is, 
that it disregards the previous setting of the internal count and it sets the 
previous count+1 value in every case on lines 124, 143 and 159 of 
AbstractConcurrentMapBase class.
Since the rehash() is called only from put() method, the internal counter is 
not synchronized with real elements count contained in underlying array and 
rehash() is called on all subsequent calls of put() method.

The following steps are happening:

1) Map state: threshold = 786432; count=786432
2) New element is inserted into the map: count = 786433; threshold = 786432
3) since new count would be greater than threshold rehash() happens
rehash() finds out, that most of the objects were garbage collected, thus it 
doesn't increase the size of the Segment, however anyway it copies all the 
objects from one table to another (System.arrayCopy()). (question: why this is 
done, in this case the old array with repaired invalid elements whould be 
sufficient ...? )
4) rehash() sets the internal count to new value, which is smaller, because 
many objects were garbage collected (soft references), lets say: count = 486 000
5) The put() continues, disregards the count = 486 000 and sets the count to 
count = 786433
6) When another element is inserted, the count is still greater than threshold 
so the rehash happens again
>From now on every element added to map will trigger a rehash(), which has a 
>huge performance impact.

When this happens in multithreaded environment, all the other threads are 
waiting (parked) for lock() until the rehash() and put() is done (and then the 
next one is doing the rehash() again and so on). You can imagine what 
performance impact this is.

Proposed solution:

Update the c variable after rehash is done. Between lines 105 and 106 of 
org.codehaus.groovy.util.AbstractConcurrentMap  add:
104: if (c++ > threshold) {

105:                rehash();

106:                c = count + 1

107:}

  was:
The problem happens in inner class 
org.codehaus.groovy.util.AbstractConcurrentMap.Segment.put() method (line 105):

When the current count is greater then map's threshold a rehash() happens. Now 
the tricky part is, that the Map holds soft references to objects and rehash() 
validates those references. So when GC discards soft references (e.isValid() == 
false), the resulting segment doesn't expand (as it is assumed in the put() 
method).

in the last line of rehash() the Segmen't internal counter is updated count = 
newCount (which is the number of undiscarded (e.isValid() == true) references 
and can be smaller than previous count as described above)

after rehash() is done, the put() method continues, however the buggy part is, 
that it disregards the previous setting of the internal count and it sets the 
previous count+1 value in every case on lines 124, 143 and 159 of 
AbstractConcurrentMapBase class.
Since the rehash() is called only from put() method, the internal counter is 
not synchronized with real elements count contained in underlying array and 
rehash() is called on all subsequent calls of put() method.

The following steps are happening:

1) Map state: threshold = 786432; count=786432
2) New element is inserted into the map: count = 786433; threshold = 786432
3) since new count would be greater than threshold rehash() happens
rehash() finds out, that most of the objects were garbage collected, thus it 
doesn't increase the size of the Segment, however anyway it copies all the 
objects from one table to another (System.arrayCopy()). (question: why this is 
done, in this case the old array with repaired invalid elements whould be 
sufficient ...? )
4) rehash() sets the internal count to new value, which is smaller, because 
many objects were garbage collected (soft references), lets say: count = 486 000
5) The put() continues, disregards the count = 486 000 and sets the count to 
count = 786433
6) When another element is inserted, the count is still greater than threshold 
so the rehash happens again
>From now on every element added to map will trigger a rehash(), which has a 
>huge performance impact.

When this happens in multithreaded environment, all the other threads are 
waiting (parked) for lock() until the rehash() and put() is done (and then the 
next one is doing the rehash() again and so on). You can imagine what 
performance impact this is.

Proposed solution:

Update the c variable after rehash is done. Between lines 105 and 106 of 
org.codehaus.groovy.util.AbstractConcurrentMap  add:
104: if (c++ > threshold) {
105:                rehash();
106:                c = count + 1
107:}

c = count + 1


> AbstractConcurrentMap performing rehash() on every insert
> ---------------------------------------------------------
>
>                 Key: GROOVY-7448
>                 URL: https://issues.apache.org/jira/browse/GROOVY-7448
>             Project: Groovy
>          Issue Type: Bug
>          Components: groovy-jdk
>    Affects Versions: 2.4.3
>            Reporter: Tomas Bartalos
>            Assignee: Guillaume Laforge
>
> The problem happens in inner class 
> org.codehaus.groovy.util.AbstractConcurrentMap.Segment.put() method (line 
> 105):
> When the current count is greater then map's threshold a rehash() happens. 
> Now the tricky part is, that the Map holds soft references to objects and 
> rehash() validates those references. So when GC discards soft references 
> (e.isValid() == false), the resulting segment doesn't expand (as it is 
> assumed in the put() method).
> in the last line of rehash() the Segmen't internal counter is updated count = 
> newCount (which is the number of undiscarded (e.isValid() == true) references 
> and can be smaller than previous count as described above)
> after rehash() is done, the put() method continues, however the buggy part 
> is, that it disregards the previous setting of the internal count and it sets 
> the previous count+1 value in every case on lines 124, 143 and 159 of 
> AbstractConcurrentMapBase class.
> Since the rehash() is called only from put() method, the internal counter is 
> not synchronized with real elements count contained in underlying array and 
> rehash() is called on all subsequent calls of put() method.
> The following steps are happening:
> 1) Map state: threshold = 786432; count=786432
> 2) New element is inserted into the map: count = 786433; threshold = 786432
> 3) since new count would be greater than threshold rehash() happens
> rehash() finds out, that most of the objects were garbage collected, thus it 
> doesn't increase the size of the Segment, however anyway it copies all the 
> objects from one table to another (System.arrayCopy()). (question: why this 
> is done, in this case the old array with repaired invalid elements whould be 
> sufficient ...? )
> 4) rehash() sets the internal count to new value, which is smaller, because 
> many objects were garbage collected (soft references), lets say: count = 486 
> 000
> 5) The put() continues, disregards the count = 486 000 and sets the count to 
> count = 786433
> 6) When another element is inserted, the count is still greater than 
> threshold so the rehash happens again
> From now on every element added to map will trigger a rehash(), which has a 
> huge performance impact.
> When this happens in multithreaded environment, all the other threads are 
> waiting (parked) for lock() until the rehash() and put() is done (and then 
> the next one is doing the rehash() again and so on). You can imagine what 
> performance impact this is.
> Proposed solution:
> Update the c variable after rehash is done. Between lines 105 and 106 of 
> org.codehaus.groovy.util.AbstractConcurrentMap  add:
> 104: if (c++ > threshold) {
> 105:                rehash();
> 106:                c = count + 1
> 107:}



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

Reply via email to