[ https://issues.apache.org/jira/browse/GROOVY-7448?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14569269#comment-14569269 ]
Tomas Bartalos commented on GROOVY-7448:
----------------------------------------

Thanks, I've described my original problem [here|http://stackoverflow.com/questions/30598302/huge-performance-drop-in-cassandra-orm-after-million-records]. There is a dirty workaround: invoking rehash() manually. I'll be glad to make the pull request.

> AbstractConcurrentMap performing rehash() on every insert
> ---------------------------------------------------------
>
>                 Key: GROOVY-7448
>                 URL: https://issues.apache.org/jira/browse/GROOVY-7448
>             Project: Groovy
>          Issue Type: Bug
>          Components: groovy-jdk
>    Affects Versions: 2.4.3
>            Reporter: Tomas Bartalos
>            Assignee: Guillaume Laforge
>
> The problem happens in the put() method of the inner class
> org.codehaus.groovy.util.AbstractConcurrentMap.Segment (line 105):
> when the current count is greater than the map's threshold, a rehash() happens.
> The tricky part is that the map holds soft references to its objects and
> rehash() validates those references. So when the GC has discarded soft
> references (e.isValid() == false), the resulting segment does not expand
> (as the put() method assumes it will).
> In the last line of rehash() the Segment's internal counter is updated with
> count = newCount, the number of undiscarded references (e.isValid() == true),
> which can be smaller than the previous count as described above.
> After rehash() is done, the put() method continues. The buggy part is that it
> disregards the count just set by rehash() and stores the previous count + 1
> in every case, on lines 124, 143 and 159 of the AbstractConcurrentMapBase class.
> Since rehash() is called only from the put() method, the internal counter is
> never synchronized with the real number of elements in the underlying array,
> and rehash() is triggered on every subsequent call of put().
> The following steps happen:
> 1) Map state: threshold = 786432; count = 786432
> 2) A new element is inserted into the map: count = 786433; threshold = 786432
> 3) Since the new count is greater than the threshold, rehash() happens.
> rehash() finds out that most of the objects were garbage collected, so it does
> not increase the size of the Segment; nevertheless it still copies all the
> surviving objects from one table to another (System.arraycopy()). (Question:
> why is this done? In this case the old array, with the invalid elements
> repaired, would be sufficient.)
> 4) rehash() sets the internal count to the new, smaller value, because many
> objects (soft references) were garbage collected; let's say count = 486000.
> 5) put() continues, disregards count = 486000 and sets count = 786433.
> 6) When another element is inserted, the count is still greater than the
> threshold, so the rehash happens again.
> From now on, every element added to the map triggers a rehash(), which has a
> huge performance impact.
> When this happens in a multithreaded environment, all the other threads wait
> (parked) on lock() until the rehash() and put() are done (and then the next
> thread does the rehash() again, and so on). You can imagine the performance
> impact.
> Proposed solution:
> Update the c variable after rehash() is done. Between lines 105 and 106 of
> org.codehaus.groovy.util.AbstractConcurrentMap add:
> 104: if (c++ > threshold) {
> 105:     rehash();
> 106:     c = count + 1;
> 107: }
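For illustration, here is a minimal, self-contained sketch of the counter drift described above; it is not the actual Groovy source. SimplifiedSegment, its liveAfterGc field and the put(boolean) flag are invented stand-ins, and the soft-reference handling is reduced to a fixed number of survivors.

{code:java}
// Minimal sketch (assumed names, not the Groovy sources): models only the counter
// bookkeeping of AbstractConcurrentMap.Segment.put(), not the hash table itself.
public class SegmentCounterDriftDemo {

    static class SimplifiedSegment {
        int count;              // number of elements the segment believes it holds
        final int threshold;    // rehash trigger
        final int liveAfterGc;  // entries that survive soft-reference validation in rehash()

        SimplifiedSegment(int count, int threshold, int liveAfterGc) {
            this.count = count;
            this.threshold = threshold;
            this.liveAfterGc = liveAfterGc;
        }

        // rehash() re-validates the soft references and ends with count = newCount,
        // which can be far below the previous count once the GC has been busy.
        void rehash() {
            count = liveAfterGc;
        }

        /** Mimics the control flow of put(); returns true if this call triggered a rehash(). */
        boolean put(boolean applyProposedFix) {
            int c = count;
            boolean rehashed = false;
            if (c++ > threshold) {      // the check at line 105
                rehash();               // may shrink count when soft refs were collected
                rehashed = true;
                if (applyProposedFix) {
                    c = count + 1;      // proposed fix: pick up the post-rehash count
                }
            }
            // ... the element would be inserted into the table here ...
            count = c;                  // without the fix, the stale pre-rehash value wins
            return rehashed;
        }
    }

    public static void main(String[] args) {
        // Numbers taken from the steps above: threshold 786432, ~486000 entries survive GC.
        for (boolean fixed : new boolean[]{false, true}) {
            System.out.println(fixed ? "--- with proposed fix ---" : "--- current behaviour ---");
            SimplifiedSegment segment = new SimplifiedSegment(786_432, 786_432, 486_000);
            for (int i = 1; i <= 4; i++) {
                System.out.println("put #" + i + " triggered rehash: " + segment.put(fixed));
            }
        }
    }
}
{code}

With these numbers the unfixed variant reports a rehash on every put after the first overflow, while the variant that resynchronizes c with count + 1 rehashes once and then stays below the threshold until the segment genuinely fills up again.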