[ 
https://issues.apache.org/jira/browse/LUCENE-5428?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13888449#comment-13888449
 ] 

Shai Erera commented on LUCENE-5428:
------------------------------------

Mike and I discussed this in the past, but I cannot find the discussion now; 
perhaps it was on chat. The idea was the same as your patch: add an 
abstraction layer for how facets are counted (and BTW, not just for SortedSet, 
but for the Taxonomy path too). For example, I'm working with a team which 
seems to have the exact same problem as yours: they have a few million 
categories, yet sometimes they need to count only one (or very few) of them, 
and they still have to incur the cost of allocating the big FacetArrays.

The discussion happened in parallel to our attempts to abstract the taxonomy 
arrays API, on LUCENE-5316. We were forced to back off from that idea though, 
because faceted search kept getting slower, to our disappointment.

For now, I advised the other team to write their own FacetsAggregator (Facets 
in the new API). I'm all for exploring a FacetsCounter API abstraction here, 
just noting that you have an option already, which is to implement your own 
Facets (yes, and maybe duplicate code...).
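
To make the idea of a counting abstraction concrete, here is a minimal sketch 
in plain Java. It is not the attached patch and not Lucene API: the names 
OrdinalCounter, ArrayOrdinalCounter and HashOrdinalCounter are hypothetical, 
and ordinals are assumed to be plain ints. It only contrasts a dense array 
sized by all possible ordinals with a hash-based counter whose memory grows 
with the ordinals actually seen.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical abstraction for illustration only; not part of Lucene.
interface OrdinalCounter {
  void increment(int ord);

  int get(int ord);
}

// Dense counting: one int slot per possible ordinal, allocated up front.
final class ArrayOrdinalCounter implements OrdinalCounter {
  private final int[] counts;

  ArrayOrdinalCounter(int numOrdinals) {
    this.counts = new int[numOrdinals];
  }

  @Override
  public void increment(int ord) {
    counts[ord]++;
  }

  @Override
  public int get(int ord) {
    return counts[ord];
  }
}

// Sparse counting: entries exist only for ordinals that were incremented.
final class HashOrdinalCounter implements OrdinalCounter {
  private final Map<Integer, Integer> counts = new HashMap<>();

  @Override
  public void increment(int ord) {
    counts.merge(ord, 1, Integer::sum);
  }

  @Override
  public int get(int ord) {
    return counts.getOrDefault(ord, 0);
  }
}
```

A caller would pick the implementation per query, e.g. the hash variant when 
only a handful of categories are requested. Note that HashMap<Integer, Integer> 
boxes every key and value, so a real implementation would more likely use a 
primitive int-to-int hash; whether either variant avoids the slowdown 
mentioned above would have to be benchmarked.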

> Make Faceting counting array overridable
> ----------------------------------------
>
>                 Key: LUCENE-5428
>                 URL: https://issues.apache.org/jira/browse/LUCENE-5428
>             Project: Lucene - Core
>          Issue Type: Improvement
>          Components: modules/facet
>    Affects Versions: 4.6.1
>            Reporter: John Wang
>         Attachments: facetcounter.patch
>
>
> In SortedSetDocValuesFacetCounts, the count array is allocated as an int[] 
> whose size is the total number of values across all facets, and it is 
> allocated per query.
> When the number of values is large, a large amount of garbage may be 
> created. Furthermore, the size of the array depends on the number of 
> possible values, not on the number of values in the facet fields that are 
> actually being accumulated. E.g. if FacetSearchParam indicates counting only 
> one field with 2 values, we still create the array for all values 
> across all fields.
> This patch makes the count array abstract to allow for
> 1) caching
> 2) hash counting - which can choose to count only the needed fields.
> This patch can be further enhanced to create a FacetCounter per segment, per 
> field, by passing in the ordinal map.



--
This message was sent by Atlassian JIRA
(v6.1.5#6160)
