[ https://issues.apache.org/jira/browse/SOLR-8922?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15219066#comment-15219066 ]
Jeff Wartes commented on SOLR-8922:
-----------------------------------

Not yet.

The major risk area would be the new ExpandingIntArray class, but it looked reasonable. It expands along powers of two, and although the add() and copyTo() calls are certainly more work than simple array assignment/retrieval, it all still looks like pretty simple stuff: mostly a few ArrayList calls and some simple numeric comparisons.

I'm more worried about bugs in there than about performance. I don't know how well [~steff1193] tested this, although I got the impression he was using it in production at the time.

There may be better approaches, but this one was handy, and I'm excited enough that I'm going to be doing a production test. I'll have more info in a day or two.

As a side note, I got a similar garbage-related improvement on an earlier test by simply hard-coding the smallSetSize to 100000; the expanding-arrays approach only bought me another 3%. But of course, that 100000 is very dependent on the index and query set, so I didn't want to offer it as a general case.

> DocSetCollector can allocate massive garbage on large indexes
> -------------------------------------------------------------
>
>                 Key: SOLR-8922
>                 URL: https://issues.apache.org/jira/browse/SOLR-8922
>             Project: Solr
>          Issue Type: Improvement
>            Reporter: Jeff Wartes
>         Attachments: SOLR-8922.patch
>
>
> After reaching a point of diminishing returns tuning the GC collector, I
> decided to take a look at where the garbage was coming from. To my surprise,
> it turned out that for my index and query set, almost 60% of the garbage was
> coming from this single line:
> https://github.com/apache/lucene-solr/blob/94c04237cce44cac1e40e1b8b6ee6a6addc001a5/solr/core/src/java/org/apache/solr/search/DocSetCollector.java#L49
> This is due to the simple fact that I have 86M documents in my shards.
> Allocating a scratch array big enough to track a result set 1/64th of my
> index (1.3M) is also almost certainly excessive, considering my 99.9th
> percentile hit count is less than 56k.
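
To make the trade-off discussed above concrete, here is a minimal sketch of an int collector that grows in power-of-two chunks instead of allocating one scratch array up front. It is not the ExpandingIntArray class from the attached patch: the class name ExpandingIntArraySketch, the starting chunk size of 16, and the allocatedSlots() helper are all assumptions made for illustration. The figures in main() (86,000,000 documents, 56,000 hits) are round stand-ins for the "86M" and "less than 56k" numbers quoted in the issue.

```java
import java.util.ArrayList;
import java.util.List;

/** Hypothetical sketch; NOT the ExpandingIntArray class from the SOLR-8922 patch. */
final class ExpandingIntArraySketch {
    private static final int FIRST_CHUNK = 16; // assumed starting size

    private final List<int[]> chunks = new ArrayList<>();
    private int[] current;     // chunk currently being filled
    private int usedInCurrent; // slots used in `current`
    private int size;          // total number of values stored

    /** Appends a value; when the current chunk is full, allocates one twice as large. */
    void add(int value) {
        if (current == null || usedInCurrent == current.length) {
            int next = (current == null) ? FIRST_CHUNK : current.length * 2;
            current = new int[next];
            chunks.add(current);
            usedInCurrent = 0;
        }
        current[usedInCurrent++] = value;
        size++;
    }

    int size() {
        return size;
    }

    /** Copies all values, in insertion order, into dest starting at destPos. */
    void copyTo(int[] dest, int destPos) {
        int remaining = size;
        for (int[] chunk : chunks) {
            // Only the last chunk can be partially filled.
            int n = Math.min(chunk.length, remaining);
            System.arraycopy(chunk, 0, dest, destPos, n);
            destPos += n;
            remaining -= n;
        }
    }

    /** Total int slots allocated so far across all chunks. */
    int allocatedSlots() {
        int total = 0;
        for (int[] chunk : chunks) {
            total += chunk.length;
        }
        return total;
    }

    public static void main(String[] args) {
        int maxDoc = 86_000_000;   // assumed round figure for "86M documents"
        int upFront = maxDoc >> 6; // 1/64th of the index, allocated per collector
        ExpandingIntArraySketch hits = new ExpandingIntArraySketch();
        for (int doc = 0; doc < 56_000; doc++) { // assumed high-percentile hit count
            hits.add(doc);
        }
        System.out.println("up-front scratch slots: " + upFront);
        System.out.println("expanding array slots:  " + hits.allocatedSlots());
    }
}
```

The extra work in add() is one bounds comparison per call plus an occasional chunk allocation, and copyTo() is one System.arraycopy per chunk, which matches the "a few ArrayList calls and some simple numeric comparisons" characterization above. Under these assumptions the allocation scales with the hit count rather than with the index size.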