Re: [jira] Commented: (SOLR-1311) pseudo-field-collapsing

Lance Norskog Sat, 16 Oct 2010 14:32:26 -0700

The "Field Collapsing" patch is dead. "Search Grouping" is a different
suite of techniques that the committers are willing to commit. Note
that the Field Collapsing issue has been open for 3+ years and nothing
was ever committed: the Solr committers who care all hate it.


8G is not a big index. 450G is a big index. 1.5 billion docs is a big
index. The greybeards won't touch a structural change that doesn't
work across a wide range of use cases. The Field Collapsing patches
never scaled.

On Fri, Oct 15, 2010 at 5:42 AM, Marc Sturlese (JIRA) <[email protected]> wrote:
>
>    [ https://issues.apache.org/jira/browse/SOLR-1311?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12921328#action_12921328 ]
>
> Marc Sturlese commented on SOLR-1311:
> -------------------------------------
>
> Well, I said it cannot be integrated as a plugin because it hacks
> DocListAndSetNC and DocListNC. These two methods can only be changed by
> modifying the SolrIndexSearcher.java class.
> The pseudo-field-collapse sort is not included in the current field
> collapsing, but the current field collapsing seems to perform much better
> than it used to (not as well as this patch, I think, but the current
> feature is much more complete than my patch).
> I suppose I can close it.
>
>> pseudo-field-collapsing
>> -----------------------
>>
>>                 Key: SOLR-1311
>>                 URL: https://issues.apache.org/jira/browse/SOLR-1311
>>             Project: Solr
>>          Issue Type: New Feature
>>          Components: search
>>    Affects Versions: 1.4
>>            Reporter: Marc Sturlese
>>             Fix For: Next
>>
>>         Attachments: SOLR-1311-pseudo-field-collapsing.patch
>>
>>
>> I am trying to develop a new way of doing field collapsing based on the
>> adjacent field collapsing algorithm. I started developing it because I
>> am experiencing performance problems with the field collapsing patch on a
>> big index (8G).
>> The algorithm does adjacent pseudo-field collapsing. It collapses only
>> the first X documents. Instead of making the collapsed docs disappear, the
>> algorithm sends them to a given position in the relevance results list.
>> The reason I collapse only the first X documents is that if I have, for
>> example, 600000 results and I am showing 10 results per page, I really
>> don't need to do collapsing on page 30000, or even on page 3000. Doing
>> this I am noticing dramatically better performance. The problem is that I
>> couldn't find a way to plug the algorithm in as a component and keep good
>> performance. I had to hack a few classes in SolrIndexSearcher.java.
>> This patch is just experimental and for testing purposes. If someone
>> finds it interesting, it would be good to find a way to integrate it
>> better than it is at the moment.
>> Advice is more than welcome.
>>
>> Functionality:
>> In solrconfig.xml we specify the pseudo-collapsing parameters:
>>      <str name="plus.considerMoreDocs">true</str>
>>      <str name="plus.considerHowMany">3000</str>
>>      <str name="plus.considerField">name</str>
>> (at the moment there is no threshold or any of the other parameters that
>> exist in the current collapse-field patch)
>> plus.considerMoreDocs enables pseudo-collapsing
>> plus.considerHowMany sets the number of result documents to which we want
>> to apply the algorithm
>> plus.considerField is the field to pseudo-collapse on
>> If the number of results is lower than plus.considerHowMany, the algorithm
>> is applied to all the results.
>> Let's say there is a query with 600000 results and we've set considerHowMany
>> to 3000 (and we already have the docs sorted by relevance).
>> What adjacent-pseudo-collapse does is this: if the 2nd doc has to be
>> collapsed, it is sent to position 2999 of the relevance results array. If
>> the 3rd has to be collapsed too, it goes to position 2998, and so on.
>> The algorithm is not applied when a sortspec is set or when
>> plus.considerMoreDocs is set to false. Nor is it applied when using the
>> MoreLikeThisRequestHandler.
>> Example with a query of 9 results:
>> Results sorted by relevance without pseudo-collapse-algorithm:
>> doc1 - collapse_field_value 3
>> doc2 - collapse_field_value 3
>> doc3 - collapse_field_value 4
>> doc4 - collapse_field_value 7
>> doc5 - collapse_field_value 6
>> doc6 - collapse_field_value 6
>> doc7 - collapse_field_value 5
>> doc8 - collapse_field_value 1
>> doc9 - collapse_field_value 2
>> Results pseudo-collapsed with plus.considerHowMany = 5
>> doc1 - collapse_field_value 3
>> doc3 - collapse_field_value 4
>> doc4 - collapse_field_value 7
>> doc5 - collapse_field_value 6
>> doc2 - collapse_field_value 3*
>> doc6 - collapse_field_value 6
>> doc7 - collapse_field_value 5
>> doc8 - collapse_field_value 1
>> doc9 - collapse_field_value 2
>> Results pseudo-collapsed with plus.considerHowMany = 9
>> doc1 - collapse_field_value 3
>> doc3 - collapse_field_value 4
>> doc4 - collapse_field_value 7
>> doc5 - collapse_field_value 6
>> doc7 - collapse_field_value 5
>> doc8 - collapse_field_value 1
>> doc9 - collapse_field_value 2
>> doc6 - collapse_field_value 6*
>> doc2 - collapse_field_value 3*
>> *pseudo-collapsed documents
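
The reordering in the 9-result example above can be reproduced with a small standalone sketch. This is not the code from the attached patch (which modifies SolrIndexSearcher.java); it is an illustrative Python model, and the names pseudo_collapse and RESULTS are hypothetical. It assumes that a document counts as "to be collapsed" when its field value equals that of the document immediately before it in relevance order, which is consistent with the example but is not spelled out in the issue text.

```python
# Illustrative model only; the real patch is Java inside SolrIndexSearcher.
# Assumption: a doc is collapsed when its field value matches the value of
# the doc directly before it in the relevance-sorted list.

# (doc id, collapse field value), already sorted by relevance.
RESULTS = [
    ("doc1", 3), ("doc2", 3), ("doc3", 4), ("doc4", 7), ("doc5", 6),
    ("doc6", 6), ("doc7", 5), ("doc8", 1), ("doc9", 2),
]


def pseudo_collapse(docs, consider_how_many):
    """Move adjacent duplicates to the end of the first N results."""
    window = docs[:consider_how_many]   # only these docs are examined
    tail = docs[consider_how_many:]     # everything else is left untouched

    kept, collapsed = [], []
    previous_value = object()           # sentinel that equals no field value
    for doc in window:
        if doc[1] == previous_value:
            collapsed.append(doc)       # adjacent duplicate: push it down
        else:
            kept.append(doc)
        previous_value = doc[1]

    # The first collapsed doc lands in the last slot of the window, the
    # second in the slot before it, and so on: hence the reversal.
    return kept + collapsed[::-1] + tail


if __name__ == "__main__":
    for how_many in (5, 9):
        print(how_many, [doc_id for doc_id, _ in pseudo_collapse(RESULTS, how_many)])
```

With a window of 5, only doc2 is moved (doc6 falls outside the window); with a window of 9, doc6 and doc2 end up in the last two positions, matching the listings above.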
>
> --
> This message is automatically generated by JIRA.
> -
> You can reply to this email to add a comment to the issue online.
>
>
> ---------------------------------------------------------------------
> To unsubscribe, e-mail: [email protected]
> For additional commands, e-mail: [email protected]
>
>



-- 
Lance Norskog
[email protected]

