[ https://issues.apache.org/jira/browse/LUCENE-5428?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13888449#comment-13888449 ]
Shai Erera commented on LUCENE-5428:
------------------------------------
Mike and I discussed this in the past, but I cannot find the discussion now;
perhaps it was on chat. The idea was the same as your patch: add an
abstraction layer over how facets are counted (and BTW, not just for SortedSet,
but for the Taxonomy path too). For example, I'm working with a team that seems
to have exactly the same problem as you: they have a few million categories,
yet sometimes they need to count only one (or very few) of them, and still
have to incur the cost of allocating the big FacetArrays.
The discussion happened in parallel with our attempts to abstract the taxonomy
arrays API in LUCENE-5316. We were forced to back off from that idea, though,
because faceted search kept getting slower with it, to our disappointment.
For now, I advised the other team to write their own FacetsAggregator (Facets
in the new API). I'm all for exploring a FacetsCounter API abstraction here,
just noting that you have an option already, which is to implement your own
Facets (yes, and maybe duplicate code...).
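To make the trade-off concrete, here is a minimal sketch of what such a
counting abstraction might look like. All names below (OrdinalCounter,
DenseOrdinalCounter, SparseOrdinalCounter) are hypothetical; this is not the
attached patch and not an existing Lucene API, just plain Java illustrating
dense array counting versus hash counting under that assumption.

```java
import java.util.HashMap;
import java.util.Map;

/** Hypothetical counting abstraction; not part of Lucene or of facetcounter.patch. */
interface OrdinalCounter {
  /** Adds one to the count of the given ordinal. */
  void increment(int ord);

  /** Returns the current count of the given ordinal (0 if never counted). */
  int get(int ord);
}

/** Dense strategy: one int slot per possible ordinal, allocated up front. */
final class DenseOrdinalCounter implements OrdinalCounter {
  private final int[] counts;

  DenseOrdinalCounter(int totalOrdinals) {
    // Cost is proportional to the number of possible values,
    // regardless of how many are actually counted.
    this.counts = new int[totalOrdinals];
  }

  @Override
  public void increment(int ord) {
    counts[ord]++;
  }

  @Override
  public int get(int ord) {
    return counts[ord];
  }
}

/** Sparse strategy: memory grows only with the distinct ordinals actually seen. */
final class SparseOrdinalCounter implements OrdinalCounter {
  private final Map<Integer, Integer> counts = new HashMap<>();

  @Override
  public void increment(int ord) {
    counts.merge(ord, 1, Integer::sum);
  }

  @Override
  public int get(int ord) {
    return counts.getOrDefault(ord, 0);
  }

  /** Number of distinct ordinals counted so far. */
  int size() {
    return counts.size();
  }
}
```

The sparse variant pays boxing and hashing overhead on every increment, so it
is only likely to win when few ordinals are touched per query; which one is
faster in practice would have to be measured.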
> Make Faceting counting array overridable
> ----------------------------------------
>
> Key: LUCENE-5428
> URL: https://issues.apache.org/jira/browse/LUCENE-5428
> Project: Lucene - Core
> Issue Type: Improvement
> Components: modules/facet
> Affects Versions: 4.6.1
> Reporter: John Wang
> Attachments: facetcounter.patch
>
>
> In SortedSetDocValuesFacetCounts, the count array is allocated as an int[]
> whose size is the total number of values across all facets, and it is
> allocated per query.
> When the number of values is large, a large amount of garbage may be
> created. Furthermore, the size of the array depends on the number of
> possible values, not on the number of values in the facet fields that are
> actually being accumulated. E.g. if FacetSearchParam indicates counting only
> one field with 2 values, we still create the array for all values
> across all fields.
> This patch makes the count array abstract to allow for
> 1) caching
> 2) hash counting - which can choose to count only the needed fields.
> This patch can be further enhanced to create a FacetCounter per segment, per
> field, by passing in the ordinal map.