[ 
https://issues.apache.org/jira/browse/GROOVY-7448?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14569269#comment-14569269
 ] 

Tomas Bartalos commented on GROOVY-7448:
----------------------------------------

Thx, I've described my original problem 
[here|http://stackoverflow.com/questions/30598302/huge-performance-drop-in-cassandra-orm-after-million-records].
 There is a dirty workaround howto invoke rehash() on its own. I'll be glad to 
make the pull request.

> AbstractConcurrentMap performing rehash() on every insert
> ---------------------------------------------------------
>
>                 Key: GROOVY-7448
>                 URL: https://issues.apache.org/jira/browse/GROOVY-7448
>             Project: Groovy
>          Issue Type: Bug
>          Components: groovy-jdk
>    Affects Versions: 2.4.3
>            Reporter: Tomas Bartalos
>            Assignee: Guillaume Laforge
>
> The problem happens in inner class 
> org.codehaus.groovy.util.AbstractConcurrentMap.Segment.put() method (line 
> 105):
> When the current count is greater then map's threshold a rehash() happens. 
> Now the tricky part is, that the Map holds soft references to objects and 
> rehash() validates those references. So when GC discards soft references 
> (e.isValid() == false), the resulting segment doesn't expand (as it is 
> assumed in the put() method).
> in the last line of rehash() the Segmen't internal counter is updated count = 
> newCount (which is the number of undiscarded (e.isValid() == true) references 
> and can be smaller than previous count as described above)
> after rehash() is done, the put() method continues, however the buggy part 
> is, that it disregards the previous setting of the internal count and it sets 
> the previous count+1 value in every case on lines 124, 143 and 159 of 
> AbstractConcurrentMapBase class.
> Since the rehash() is called only from put() method, the internal counter is 
> not synchronized with real elements count contained in underlying array and 
> rehash() is called on all subsequent calls of put() method.
> The following steps are happening:
> 1) Map state: threshold = 786432; count=786432
> 2) New element is inserted into the map: count = 786433; threshold = 786432
> 3) since new count would be greater than threshold rehash() happens
> rehash() finds out, that most of the objects were garbage collected, thus it 
> doesn't increase the size of the Segment, however anyway it copies all the 
> objects from one table to another (System.arrayCopy()). (question: why this 
> is done, in this case the old array with repaired invalid elements whould be 
> sufficient ...? )
> 4) rehash() sets the internal count to new value, which is smaller, because 
> many objects were garbage collected (soft references), lets say: count = 486 
> 000
> 5) The put() continues, disregards the count = 486 000 and sets the count to 
> count = 786433
> 6) When another element is inserted, the count is still greater than 
> threshold so the rehash happens again
> From now on every element added to map will trigger a rehash(), which has a 
> huge performance impact.
> When this happens in multithreaded environment, all the other threads are 
> waiting (parked) for lock() until the rehash() and put() is done (and then 
> the next one is doing the rehash() again and so on). You can imagine what 
> performance impact this is.
> Proposed solution:
> Update the c variable after rehash is done. Between lines 105 and 106 of 
> org.codehaus.groovy.util.AbstractConcurrentMap  add:
> 104: if (c++ > threshold) {
> 105:                rehash();
> 106:                c = count + 1
> 107:}



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

Reply via email to