[ https://issues.apache.org/jira/browse/SOLR-2205?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Martijn van Groningen updated SOLR-2205:
----------------------------------------

    Attachment: SOLR-2205.patch

The code I initially wrote was on the pre-flex code base, so I took that code and made it work on trunk. Someone should definitely check whether all the changes I made are the right ones.

I tested this patch on my local machine. A search (q=*:*) on an index that holds 10M documents took around 300 ms, whereas the same query without the code changes took around 2.8 seconds. That is roughly 9 times faster. These numbers are for a basic search, so no faceting, highlighting, etc.

I found that the following piece of code took a relatively large amount of time to execute (when it is executed millions and millions of times, you start to notice):
{code}
filler.fillValue(doc);
groupMap.get(mval);
{code}
This fragment is used in the TopGroupCollector and the Phase2GroupCollector. I put some code in front of it to cheaply exclude documents that are not competitive. In both classes this check is cheaper than executing the fragment above.

Since I ported the code from the pre-flex code base, I needed to make some changes to it, including support for grouping by function. The code I initially wrote only needed to support grouping on a field. Since trunk also supports grouping by function query, I added two methods to DocValues and implemented them in three subclasses. I don't know whether this particular change is good, but it works. It would be really helpful if someone could give feedback on this particular change.

> Grouping performance improvements
> ---------------------------------
>
>                 Key: SOLR-2205
>                 URL: https://issues.apache.org/jira/browse/SOLR-2205
>             Project: Solr
>          Issue Type: Sub-task
>          Components: search
>    Affects Versions: 4.0
>            Reporter: Martijn van Groningen
>             Fix For: 4.0
>
>         Attachments: SOLR-2205.patch
>
>
> This issue is dedicated to the performance of the grouping functionality.
> I've noticed that the code does not perform well on large indexes. Doing a
> search (q=*:*) with grouping on an index of around 5M documents took around
> one second on my local development machine. We had to support grouping on an
> index that holds around 50M documents per machine, so we made some changes
> and were able to happily serve that amount of documents. A patch will follow
> soon.
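The comment above describes putting a cheap check in front of `filler.fillValue(doc); groupMap.get(mval);` so that non-competitive documents never reach that fragment. The patch itself is not reproduced here, so the following is only a minimal, self-contained Java sketch of that idea under simplifying assumptions: groups are ranked solely by the best score of any document in them (the real collectors rank by an arbitrary Sort), and the class `CompetitiveGroupCollector`, the `groupKeyOf` function and the `lookups` counter are hypothetical names standing in for the group-value lookup and the map probe.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.TreeSet;
import java.util.function.IntFunction;

/** Hypothetical sketch, not Solr's TopGroupCollector: keeps the top N groups by best score. */
public class CompetitiveGroupCollector {

    static final class Group {
        final String key;
        float bestScore;

        Group(String key, float bestScore) {
            this.key = key;
            this.bestScore = bestScore;
        }
    }

    private final int topN;
    private final IntFunction<String> groupKeyOf; // stands in for filler.fillValue(doc)
    private final Map<String, Group> groupMap = new HashMap<>();
    // Weakest group first; ties broken by key so the ordering is total.
    private final TreeSet<Group> ranked = new TreeSet<>(
            Comparator.comparingDouble((Group g) -> g.bestScore).thenComparing(g -> g.key));
    int lookups = 0; // how many documents took the expensive path

    public CompetitiveGroupCollector(int topN, IntFunction<String> groupKeyOf) {
        if (topN < 1) throw new IllegalArgumentException("topN must be >= 1");
        this.topN = topN;
        this.groupKeyOf = groupKeyOf;
    }

    public void collect(int doc, float score) {
        // Cheap pre-check: once N groups are held, a document scoring no higher than
        // the weakest group can neither enter the top N nor raise any held group's
        // best score, so its group value is never resolved and the map never probed.
        if (ranked.size() == topN && score <= ranked.first().bestScore) {
            return;
        }
        lookups++;
        String key = groupKeyOf.apply(doc);
        Group g = groupMap.get(key);
        if (g != null) {
            if (score > g.bestScore) {
                ranked.remove(g); // remove before mutating so the TreeSet stays consistent
                g.bestScore = score;
                ranked.add(g);
            }
            return;
        }
        if (ranked.size() == topN) {
            // The new group beats the weakest one, so evict it.
            Group evicted = ranked.pollFirst();
            groupMap.remove(evicted.key);
        }
        g = new Group(key, score);
        ranked.add(g);
        groupMap.put(key, g);
    }

    /** Group keys, best group first. */
    public List<String> topGroups() {
        List<String> keys = new ArrayList<>();
        for (Group g : ranked.descendingSet()) keys.add(g.key);
        return keys;
    }

    public static void main(String[] args) {
        String[] groupOfDoc = {"a", "b", "a", "c", "b", "d", "c", "a"};
        float[] scores = {5f, 4f, 1f, 3f, 0.5f, 2f, 6f, 0.2f};
        CompetitiveGroupCollector c = new CompetitiveGroupCollector(2, doc -> groupOfDoc[doc]);
        for (int doc = 0; doc < scores.length; doc++) c.collect(doc, scores[doc]);
        System.out.println(c.topGroups() + " lookups=" + c.lookups);
    }
}
```

Running `main` prints `[c, a] lookups=3`: only three of the eight documents reach the lookup, yet the result matches a brute-force ranking of the groups by best score (c=6, a=5, b=4, d=2). The savings grow with index size, because once the top groups settle, most documents fail the single comparison against the weakest group.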