[ 
https://issues.apache.org/jira/browse/SOLR-8922?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15219066#comment-15219066
 ] 

Jeff Wartes commented on SOLR-8922:
-----------------------------------

Not yet. The major risk area would be the new ExpandingIntArray class, but it 
looked reasonable. It expands along powers of two, and although the add() and 
copyTo() calls are certainly more work than a simple array assignment or 
retrieval, it still all looks like pretty simple stuff: mostly a few ArrayList 
calls and some simple numeric comparisons. 
I'm more worried about bugs in there than about performance. I don't know how 
well [~steff1193] tested this, although I got the impression he was using it in 
production at the time.
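For anyone who hasn't read the patch, here's a minimal sketch of the power-of-two expansion idea as I understand it. This is NOT the actual ExpandingIntArray from the attachment; the class and method names here are hypothetical, and the real implementation may differ in chunk sizing and bookkeeping:

```java
import java.util.ArrayList;
import java.util.List;

/** Hypothetical sketch of a power-of-two expanding int buffer.
 *  Not the actual ExpandingIntArray from SOLR-8922.patch. */
class ExpandingIntBuffer {
  private final List<int[]> chunks = new ArrayList<>();
  private int[] current;       // chunk currently being filled
  private int indexInCurrent;  // next free slot in current
  private int size;            // total ints added

  void add(int value) {
    if (current == null || indexInCurrent == current.length) {
      // Each new chunk doubles capacity: 2, 4, 8, 16, ...
      // so we only ever allocate roughly what the result set needs.
      int nextLen = (current == null) ? 2 : current.length * 2;
      current = new int[nextLen];
      chunks.add(current);
      indexInCurrent = 0;
    }
    current[indexInCurrent++] = value;
    size++;
  }

  int size() { return size; }

  /** Copy all added values into dest, in insertion order. */
  void copyTo(int[] dest) {
    int pos = 0;
    for (int[] chunk : chunks) {
      int n = Math.min(chunk.length, size - pos);
      System.arraycopy(chunk, 0, dest, pos, n);
      pos += n;
    }
  }
}
```

The point being: per add() you pay a bounds check and occasionally an allocation, instead of paying one giant maxDoc-sized allocation up front whether the query needs it or not.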

There may be better approaches, but this one was handy and I'm excited enough 
about it that I'm going to run a production test. I'll have more info in a day 
or two.

As a side note, I got a similar garbage-related improvement on an earlier test 
by simply hard-coding the smallSetSize to 100000 - the expanding-arrays 
approach only bought me another 3%. But of course, that 100000 is very index 
and query set dependent, so I didn't want to offer it as a general case.
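To put rough numbers on why either change helps: assuming the default scratch size is maxDoc/64 (as the issue description describes) and 4 bytes per int, the per-query allocation at my shard size looks something like this (figures are back-of-envelope, not measurements):

```java
// Back-of-envelope math for the numbers in this thread. Assumptions:
// default scratch size is maxDoc/64, 4 bytes per int.
public class ScratchSizeMath {
  public static void main(String[] args) {
    long maxDoc = 86_000_000L;            // ~86M docs per shard, as above
    long defaultInts = maxDoc / 64;       // ~1.34M ints per query
    long defaultBytes = defaultInts * 4L; // ~5.4 MB allocated per query
    long fixedBytes = 100_000L * 4L;      // ~0.4 MB with smallSetSize=100000
    System.out.println(defaultBytes + " bytes vs " + fixedBytes + " bytes");
  }
}
```

At a 99.9th-percentile hit count under 56k, even the hard-coded 100000 is generous, which is why most of the garbage win comes from just not allocating the maxDoc/64 array.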

> DocSetCollector can allocate massive garbage on large indexes
> -------------------------------------------------------------
>
>                 Key: SOLR-8922
>                 URL: https://issues.apache.org/jira/browse/SOLR-8922
>             Project: Solr
>          Issue Type: Improvement
>            Reporter: Jeff Wartes
>         Attachments: SOLR-8922.patch
>
>
> After reaching a point of diminishing returns tuning the GC collector, I 
> decided to take a look at where the garbage was coming from. To my surprise, 
> it turned out that for my index and query set, almost 60% of the garbage was 
> coming from this single line:
> https://github.com/apache/lucene-solr/blob/94c04237cce44cac1e40e1b8b6ee6a6addc001a5/solr/core/src/java/org/apache/solr/search/DocSetCollector.java#L49
> This is due to the simple fact that I have 86M documents in my shards. 
> Allocating a scratch array big enough to track a result set 1/64th of my 
> index (1.3M) is also almost certainly excessive, considering my 99.9th 
> percentile hit count is less than 56k.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
