[jira] [Commented] (LUCENE-3097) Post grouping faceting

Michael McCandless (JIRA) Mon, 23 May 2011 15:52:31 -0700

    [ 
https://issues.apache.org/jira/browse/LUCENE-3097?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13038297#comment-13038297
 ]


Michael McCandless commented on LUCENE-3097:
--------------------------------------------


Patch looks good Martijn!  A few small things:

  * I think create() needs to be fixed to handle other SortField
    types?  Eg, INT, FLOAT?

  * I think you need to hold the docBase from each setNextReader and
    re-base your docs stored in the GroupHead?  Because when you
    retrieve them in the end you return them as top-level docIDs.
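
The re-basing point can be sketched in plain Java. This is a minimal illustration, not the patch's code and not the Lucene Collector API: the class RebasingHeadCollector and its methods are hypothetical stand-ins. It assumes each segment reports its docBase before its documents are collected and that collected doc IDs are segment-relative.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical stand-in for a per-segment collector; not the Lucene API.
class RebasingHeadCollector {
    private int docBase; // offset of the current segment in top-level doc ID space
    private final List<Integer> heads = new ArrayList<Integer>();

    // Plays the role of setNextReader: called once per segment with its offset.
    void setNextReader(int docBase) {
        this.docBase = docBase;
    }

    // Receives a segment-relative doc ID and stores the re-based, top-level ID.
    void collect(int segmentDoc) {
        heads.add(docBase + segmentDoc);
    }

    List<Integer> topLevelDocIds() {
        return heads;
    }

    public static void main(String[] args) {
        RebasingHeadCollector c = new RebasingHeadCollector();
        c.setNextReader(0);   // first segment starts at top-level doc 0
        c.collect(7);
        c.setNextReader(100); // second segment starts at top-level doc 100
        c.collect(7);
        System.out.println(c.topLevelDocIds()); // prints [7, 107]
    }
}
```

Without the docBase offset, both documents would be stored as 7 and the second segment's hit would be indistinguishable from the first.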

This would really benefit from the random test in TestGrouping :)

This can indeed help with post-facet counting, but I think only on
fields whose value is constant within the group?  (Ie, because we pick
only the "head" doc, as long as the head doc is guaranteed to have the
same value for field X, it's safe to use that doc to represent the
entire group for facet counting).

Once docs within one group can have different values for field X then we
need a different approach for counting their facets...


> Post grouping faceting
> ----------------------
>
>                 Key: LUCENE-3097
>                 URL: https://issues.apache.org/jira/browse/LUCENE-3097
>             Project: Lucene - Java
>          Issue Type: New Feature
>          Components: modules/grouping
>            Reporter: Martijn van Groningen
>            Assignee: Martijn van Groningen
>            Priority: Minor
>             Fix For: 3.2, 4.0
>
>         Attachments: LUCENE-3097.patch
>
>
> This issue focuses on implementing post grouping faceting.
> * How to handle multivalued fields. What field value to show with the facet.
> * What the facet counts should be based on
> ** Facet counts can be based on the normal documents. Ungrouped counts. 
> ** Facet counts can be based on the groups. Grouped counts.
> ** Facet counts can be based on the combination of group value and facet 
> value. Matrix counts.   
> And probably more implementation options.
> The first two methods are implemented in the SOLR-236 patch. For the first 
> option it calculates a DocSet based on the individual documents from the 
> query result. For the second option it calculates a DocSet for all the most 
> relevant documents of a group. Once the DocSet is computed the FacetComponent 
> and StatsComponent use the DocSet to create facets and statistics.
> The last option (matrix counts) is a bit more complex. I think it is best 
> explained with an example. Let's say we search on travel offers:
> ||hotel||departure_airport||duration||
> |Hotel a|AMS|5|
> |Hotel a|DUS|10|
> |Hotel b|AMS|5|
> |Hotel b|AMS|10|
> If we group by hotel and have a facet for airport, most end users expect 
> (in my experience, of course) the following airport facet:
> AMS: 2
> DUS: 1
> The above result can't be achieved by the first two methods. You either get 
> counts AMS:3 and DUS:1 or 1 for both airports.
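
The three counting modes in the quoted description can be compared on the four travel offers with a small plain-Java sketch. Everything here is illustrative: FacetCountSketch and its methods are hypothetical names, not part of Lucene or Solr, and the choice of head documents (rows 1 and 2) is an assumed relevance ranking picked so that the grouped counts come out as 1 for both airports, as in the description.

```java
import java.util.HashSet;
import java.util.Map;
import java.util.Set;
import java.util.TreeMap;

class FacetCountSketch {
    // Each row is {hotel, departure_airport}, taken from the table above.
    static final String[][] OFFERS = {
        {"Hotel a", "AMS"},
        {"Hotel a", "DUS"},
        {"Hotel b", "AMS"},
        {"Hotel b", "AMS"}
    };

    private static void increment(Map<String, Integer> counts, String key) {
        Integer old = counts.get(key);
        counts.put(key, old == null ? 1 : old + 1);
    }

    // Ungrouped counts: every matching document contributes to its facet value.
    static Map<String, Integer> ungrouped(String[][] docs) {
        Map<String, Integer> counts = new TreeMap<String, Integer>();
        for (String[] doc : docs) {
            increment(counts, doc[1]);
        }
        return counts;
    }

    // Grouped counts: only the head document of each group contributes.
    // headDocs holds row indexes of the heads (assumed to be chosen elsewhere).
    static Map<String, Integer> groupHeads(String[][] docs, int[] headDocs) {
        Map<String, Integer> counts = new TreeMap<String, Integer>();
        for (int doc : headDocs) {
            increment(counts, docs[doc][1]);
        }
        return counts;
    }

    // Matrix counts: each distinct (group value, facet value) pair counts once.
    static Map<String, Integer> matrix(String[][] docs) {
        Map<String, Integer> counts = new TreeMap<String, Integer>();
        Set<String> seenPairs = new HashSet<String>();
        for (String[] doc : docs) {
            if (seenPairs.add(doc[0] + "|" + doc[1])) {
                increment(counts, doc[1]);
            }
        }
        return counts;
    }

    public static void main(String[] args) {
        System.out.println(ungrouped(OFFERS));                    // {AMS=3, DUS=1}
        System.out.println(groupHeads(OFFERS, new int[] {1, 2})); // {AMS=1, DUS=1}
        System.out.println(matrix(OFFERS));                       // {AMS=2, DUS=1}
    }
}
```

The head-document result depends entirely on which document represents each group, which is why it is only reliable for fields whose value is constant within a group.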

