[jira] Updated: (SOLR-2205) Grouping performance improvements

Martijn van Groningen (JIRA) Thu, 28 Oct 2010 11:16:46 -0700

     [ https://issues.apache.org/jira/browse/SOLR-2205?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]


Martijn van Groningen updated SOLR-2205:
----------------------------------------

    Attachment: SOLR-2205.patch

I originally wrote this code against the pre-flex code base and have now 
ported it to trunk. Someone should definitely review whether the changes I 
made for the port are correct. 

I tested this patch on my local machine. A search (q=*:*) on an index holding 
10M documents took around 300 ms, whereas the same query without the code 
changes took around 2.8 seconds. That is roughly 9 times faster. These numbers 
are for a basic search, so no facets, highlighting, etc.

I found that the following piece of code takes a relatively long time to 
execute (it becomes noticeable once it is executed millions and millions of 
times):
{code}
filler.fillValue(doc);
groupMap.get(mval);
{code} 

This fragment is used in the TopGroupCollector and Phase2GroupCollector. I put 
some code in front of it to cheaply exclude documents that are not competitive. 
In both classes this check is cheaper than executing the fragment above.
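
To illustrate the idea, here is a minimal sketch, not the code from the patch. 
It assumes groups are ranked by the score of their best document; the class 
TopGroupsSketch, its fields and the String[] group lookup are hypothetical 
stand-ins for the real collectors and for the filler.fillValue(doc) / 
groupMap.get(mval) fragment.

```java
import java.util.HashMap;
import java.util.Map;

/** Hypothetical sketch: tracks the best score per group for the top N groups. */
class TopGroupsSketch {
    private final int topN;
    private final Map<String, Float> bestScoreByGroup = new HashMap<>();
    // Lowest "best score" among the groups currently kept; only meaningful when full.
    private float lowestTopScore = Float.NEGATIVE_INFINITY;
    // Counts how often the expensive path is taken (for illustration only).
    int expensiveLookups = 0;

    TopGroupsSketch(int topN) {
        this.topN = topN;
    }

    Map<String, Float> bestScores() {
        return bestScoreByGroup;
    }

    void collect(String[] groupValues, int doc, float score) {
        // Cheap pre-check: once N groups are kept, a document that does not beat
        // the weakest kept group can neither add a group nor improve an existing one.
        if (bestScoreByGroup.size() == topN && score <= lowestTopScore) {
            return;
        }

        // Expensive path: stands in for filler.fillValue(doc); groupMap.get(mval);
        expensiveLookups++;
        String group = groupValues[doc];
        Float best = bestScoreByGroup.get(group);
        if (best != null && score <= best) {
            return; // known group, but this document does not improve it
        }
        if (best == null && bestScoreByGroup.size() == topN) {
            evictLowest(); // make room for the new, more competitive group
        }
        bestScoreByGroup.put(group, score);
        recomputeLowest();
    }

    private void evictLowest() {
        String worst = null;
        for (Map.Entry<String, Float> e : bestScoreByGroup.entrySet()) {
            if (worst == null || e.getValue() < bestScoreByGroup.get(worst)) {
                worst = e.getKey();
            }
        }
        bestScoreByGroup.remove(worst);
    }

    private void recomputeLowest() {
        float min = Float.POSITIVE_INFINITY;
        for (float s : bestScoreByGroup.values()) {
            min = Math.min(min, s);
        }
        lowestTopScore = min;
    }

    public static void main(String[] args) {
        TopGroupsSketch collector = new TopGroupsSketch(2);
        String[] groups = {"a", "b", "c", "a", "b"};
        float[] scores = {5f, 4f, 1f, 3f, 6f};
        for (int doc = 0; doc < scores.length; doc++) {
            collector.collect(groups, doc, scores[doc]);
        }
        System.out.println("expensive lookups: " + collector.expensiveLookups);
        System.out.println("top groups: " + collector.bestScores());
    }
}
```

In this toy run two of the five documents are rejected by the float comparison 
alone, so the group lookup is never executed for them. On a large index most 
documents are non-competitive once the top groups have filled up, which is 
where a saving of this kind would come from.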

The code I initially wrote only needed to support grouping on a field. Since 
the trunk also supports grouping by function query, I had to change the ported 
code to support that as well: I added two methods to DocValues and implemented 
them in three subclasses. I don't know whether this particular change is good, 
but it works. It would be really helpful if someone could give feedback on 
this particular change.

> Grouping performance improvements
> ---------------------------------
>
>                 Key: SOLR-2205
>                 URL: https://issues.apache.org/jira/browse/SOLR-2205
>             Project: Solr
>          Issue Type: Sub-task
>          Components: search
>    Affects Versions: 4.0
>            Reporter: Martijn van Groningen
>             Fix For: 4.0
>
>         Attachments: SOLR-2205.patch
>
>
> This issue is dedicated to the performance of the grouping functionality.
> I've noticed that the code does not perform well on large indexes. Doing a 
> search (q=*:*) with grouping on an index of around 5M documents took around 
> one second on my local development machine. We had to support grouping on an 
> index that holds around 50M documents per machine, so we made some changes 
> and were able to happily serve that amount of documents. Patch will follow 
> soon.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.


---------------------------------------------------------------------
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org
