[
https://issues.apache.org/jira/browse/SOLR-8922?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15233876#comment-15233876
]
David Smiley commented on SOLR-8922:
------------------------------------
Wow; I'm (pleasantly) surprised to see such a general performance increase; I
thought this was just about saving memory. Why is it faster? Less GC time?
I'm confused by the benchmark and/or I don't understand the setup.
bq. 20% chance of a document missing the value for a field.
Put another way, do you mean any given term has an 80% chance of being in the
doc?
I'm confused about why the number of terms in the field has anything to do
with the performance of this patch. Perhaps what you've done in your benchmark
is have the fields with more terms result in any given term matching fewer
documents? I think it would be far clearer to report the performance increase
over a varying number of docs counted into the doc set; however many terms are
in the field doesn't really matter in and of itself (I think). Couldn't you
have done all of this in one field and just chosen your 50 term queries from
terms that have the same(ish) document frequency? That frequency could be
expressed as a percentage of the total docs, making the numbers more generally
interpretable.
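A rough sketch of what that term selection could look like in plain Lucene
(the class and method names here are mine and purely illustrative, not
anything in Solr's test code): walk the terms of the field per segment and
keep only terms whose docFreq falls inside a target selectivity band.
{code:java}
import java.io.IOException;
import java.util.ArrayList;
import java.util.List;

import org.apache.lucene.index.IndexReader;
import org.apache.lucene.index.LeafReaderContext;
import org.apache.lucene.index.Terms;
import org.apache.lucene.index.TermsEnum;
import org.apache.lucene.util.BytesRef;

public class TermPicker {
  /**
   * Collects up to 'limit' terms of 'field' whose docFreq, as a fraction of
   * the segment's maxDoc, falls between minFrac and maxFrac (e.g. 0.009..0.011
   * for terms matching roughly 1% of docs).
   */
  public static List<String> pickTerms(IndexReader reader, String field,
      double minFrac, double maxFrac, int limit) throws IOException {
    List<String> picked = new ArrayList<>();
    for (LeafReaderContext ctx : reader.leaves()) {
      int segMaxDoc = ctx.reader().maxDoc();
      Terms terms = ctx.reader().terms(field);
      if (terms == null) continue;
      TermsEnum te = terms.iterator();
      for (BytesRef term = te.next();
           term != null && picked.size() < limit;
           term = te.next()) {
        // docFreq is per-segment here; for benchmark term selection that's a
        // good enough proxy for index-wide selectivity
        double frac = (double) te.docFreq() / segMaxDoc;
        if (frac >= minFrac && frac <= maxFrac) {
          picked.add(term.utf8ToString());
        }
      }
      if (picked.size() >= limit) break;
    }
    return picked;
  }
}
{code}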
> DocSetCollector can allocate massive garbage on large indexes
> -------------------------------------------------------------
>
> Key: SOLR-8922
> URL: https://issues.apache.org/jira/browse/SOLR-8922
> Project: Solr
> Issue Type: Improvement
> Reporter: Jeff Wartes
> Assignee: Yonik Seeley
> Attachments: SOLR-8922.patch, SOLR-8922.patch
>
>
> After reaching a point of diminishing returns tuning the GC collector, I
> decided to take a look at where the garbage was coming from. To my surprise,
> it turned out that for my index and query set, almost 60% of the garbage was
> coming from this single line:
> https://github.com/apache/lucene-solr/blob/94c04237cce44cac1e40e1b8b6ee6a6addc001a5/solr/core/src/java/org/apache/solr/search/DocSetCollector.java#L49
> This is due to the simple fact that I have 86M documents in my shards.
> Allocating a scratch array big enough to track a result set 1/64th the size
> of my index (1.3M entries) is also almost certainly excessive, considering my
> 99.9th percentile hit count is less than 56k.
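One way to avoid that up-front allocation is to grow the small-set buffer in
doubling chunks so memory tracks the actual hit count rather than maxDoc/64.
A minimal sketch of that chunked-growth idea (names and chunk sizes are
illustrative, not the actual SOLR-8922 patch):
{code:java}
import java.util.ArrayList;
import java.util.List;

class ExpandingIntBuffer {
  private static final int FIRST_CHUNK = 1024;
  private final List<int[]> chunks = new ArrayList<>();
  private int[] current;      // chunk currently being filled
  private int upto;           // next free slot in 'current'
  private int size;           // total number of ints stored

  void add(int docId) {
    if (current == null || upto == current.length) {
      // double the chunk size each time, so total allocation stays
      // proportional to the number of collected docs
      int nextSize = (current == null) ? FIRST_CHUNK : current.length * 2;
      current = new int[nextSize];
      chunks.add(current);
      upto = 0;
    }
    current[upto++] = docId;
    size++;
  }

  int size() {
    return size;
  }

  /** Copies the collected doc ids into one contiguous array. */
  int[] toArray() {
    int[] out = new int[size];
    int pos = 0;
    for (int[] chunk : chunks) {
      int len = Math.min(chunk.length, size - pos);
      System.arraycopy(chunk, 0, out, pos, len);
      pos += len;
    }
    return out;
  }
}
{code}
On an 86M-doc shard with a 99.9th percentile hit count under 56k, this would
allocate a handful of chunks totalling on the order of 64k ints per query
instead of a ~1.3M-int scratch array every time.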