The "Field Collapsing" patch is dead. "Search Grouping" is a different suite of techniques that the committers are willing to commit. Note that the Field Collapsing issue has been open for 3+ years and nothing was ever committed: the Solr committers who care all hate it.
8G is not a big index. 450G is a big index. 1.5 billion docs is a big index. The greybeards won't touch a structural change that doesn't work across a wide range of use cases. The Field Collapsing patches never scaled.

On Fri, Oct 15, 2010 at 5:42 AM, Marc Sturlese (JIRA) <[email protected]> wrote:
>
>     [ https://issues.apache.org/jira/browse/SOLR-1311?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12921328#action_12921328 ]
>
> Marc Sturlese commented on SOLR-1311:
> -------------------------------------
>
> Well, I said it cannot be integrated as a plugin because it hacks
> DocListAndSetNC and DocListNC. These two functions can only be altered by
> altering the SolrIndexSearcher.java class.
> The pseudo-field-collapse sort is not included in the current field
> collapsing, but current field collapsing seems to perform much better than it
> used to (I don't think as well as this patch, but the current feature is much
> more complete than my patch).
> I suppose I can close it.
>
>> pseudo-field-collapsing
>> -----------------------
>>
>>                 Key: SOLR-1311
>>                 URL: https://issues.apache.org/jira/browse/SOLR-1311
>>             Project: Solr
>>          Issue Type: New Feature
>>          Components: search
>>    Affects Versions: 1.4
>>            Reporter: Marc Sturlese
>>             Fix For: Next
>>
>>         Attachments: SOLR-1311-pseudo-field-collapsing.patch
>>
>>
>> I am trying to develop a new way of doing field collapsing based on the
>> adjacent field collapsing algorithm. I have started developing it because I
>> am experiencing performance problems with the field collapsing patch with a
>> big index (8G).
>> The algorithm does adjacent-pseudo-field collapsing. It does collapsing on
>> the first X documents. Instead of making the collapsed docs disappear, the
>> algorithm will send them to a given position of the relevance results list.
>> The reason I only do collapsing in the first X documents is that if I have,
>> for example, 600000 results and I am showing 10 results per page, I really
>> don't need to do collapsing on page 30000, or even on page 3000. Doing
>> this I am noticing dramatically better performance. The problem is I
>> couldn't find a way to plug the algorithm in as a component and keep good
>> performance. I had to hack a few classes in SolrIndexSearcher.java.
>> This patch is just experimental and for testing purposes. In case someone
>> finds it interesting, it would be good to find a way to integrate it in a
>> better way than it is at the moment.
>> Advice is more than welcome.
>>
>> Functionality:
>> In solrconfig.xml we specify the pseudo-collapsing parameters:
>>      <str name="plus.considerMoreDocs">true</str>
>>      <str name="plus.considerHowMany">3000</str>
>>      <str name="plus.considerField">name</str>
>> (at the moment there's no threshold or any of the other parameters that
>> exist in the current collapse-field patch)
>> plus.considerMoreDocs enables pseudo-collapsing
>> plus.considerHowMany sets the number of result documents to which we want
>> to apply the algorithm
>> plus.considerField is the field to do pseudo-collapsing on
>> If the number of results is lower than plus.considerHowMany, the algorithm
>> will be applied to all the results.
>> Let's say there is a query with 600000 results and we've set considerHowMany
>> to 3000 (and we already have the docs sorted by relevance).
>> What adjacent-pseudo-collapse does is, if the 2nd doc has to be collapsed, it
>> will be sent to position 2999 of the relevance results array. If the 3rd has
>> to be collapsed too, it will go to position 2998, and so on.
>> The algorithm is not applied when a sortspec is set or plus.considerMoreDocs
>> is set to false. Neither is it applied when using
>> MoreLikeThisRequestHandler.
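
A minimal Python sketch of the reordering described above, for illustration only. It is not the patch's code (the patch is Java inside SolrIndexSearcher). The function name pseudo_collapse and the dict-based docs are hypothetical, and it assumes "has to be collapsed" means a doc has the same field value as the doc immediately before it in the relevance-sorted list.

```python
_NO_VALUE = object()  # sentinel: never equal to a real field value


def pseudo_collapse(docs, consider_how_many, field="name"):
    """Reorder the first `consider_how_many` relevance-sorted docs.

    Assumption: a doc is "collapsed" when its `field` value equals that of
    the doc directly before it. Collapsed docs are not dropped; they are
    pushed to the end of the window, the first one landing in the last slot,
    the next one in the slot before it, and so on. Docs past the window are
    left untouched.
    """
    window = docs[:consider_how_many]
    rest = docs[consider_how_many:]

    kept, collapsed = [], []
    previous = _NO_VALUE
    for doc in window:
        value = doc[field]
        if value == previous:
            collapsed.append(doc)  # adjacent duplicate: demote it
        else:
            kept.append(doc)
        previous = value

    # First collapsed doc goes to the last window position, so reverse.
    return kept + collapsed[::-1] + rest


if __name__ == "__main__":
    values = [3, 3, 4, 7, 6, 6, 5, 1, 2]
    docs = [{"id": f"doc{i}", "name": v} for i, v in enumerate(values, 1)]
    for how_many in (5, 9):
        print(how_many, [d["id"] for d in pseudo_collapse(docs, how_many)])
```

With a window of 5, only doc2 moves (to position 5); doc6 stays where it is because it falls outside the window. If the result set is smaller than the window, the slice simply covers all docs, which matches the "applied to all the results" rule.
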
>> Example with a query of 9 results:
>> Results sorted by relevance without pseudo-collapse-algorithm:
>> doc1 - collapse_field_value 3
>> doc2 - collapse_field_value 3
>> doc3 - collapse_field_value 4
>> doc4 - collapse_field_value 7
>> doc5 - collapse_field_value 6
>> doc6 - collapse_field_value 6
>> doc7 - collapse_field_value 5
>> doc8 - collapse_field_value 1
>> doc9 - collapse_field_value 2
>> Results pseudo-collapsed with plus.considerHowMany = 5:
>> doc1 - collapse_field_value 3
>> doc3 - collapse_field_value 4
>> doc4 - collapse_field_value 7
>> doc5 - collapse_field_value 6
>> doc2 - collapse_field_value 3*
>> doc6 - collapse_field_value 6
>> doc7 - collapse_field_value 5
>> doc8 - collapse_field_value 1
>> doc9 - collapse_field_value 2
>> Results pseudo-collapsed with plus.considerHowMany = 9:
>> doc1 - collapse_field_value 3
>> doc3 - collapse_field_value 4
>> doc4 - collapse_field_value 7
>> doc5 - collapse_field_value 6
>> doc7 - collapse_field_value 5
>> doc8 - collapse_field_value 1
>> doc9 - collapse_field_value 2
>> doc6 - collapse_field_value 6*
>> doc2 - collapse_field_value 3*
>> *pseudo-collapsed documents
>
> --
> This message is automatically generated by JIRA.
> -
> You can reply to this email to add a comment to the issue online.
>
>
> ---------------------------------------------------------------------
> To unsubscribe, e-mail: [email protected]
> For additional commands, e-mail: [email protected]
>
>

--
Lance Norskog
[email protected]
