[
https://issues.apache.org/jira/browse/LUCENE-5428?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13888682#comment-13888682
]
Shai Erera commented on LUCENE-5428:
------------------------------------
bq. Do you mean the method call overhead it would create?
Exactly! On LUCENE-5316 we tried to replace direct array access with a
method call that accesses the array internally, and the slowdowns were
significant. The purpose there, as here, was to allow for a more
efficient representation of the arrays (e.g. compressed or map-based), but
losing 60% on some queries seemed too much. Also, we didn't see an improvement
from any of the changes. So ... it's kind of hard to justify such a change in
general.
Basically, our MO is to make sure the abstraction doesn't hurt (much) the
current implementations (i.e. FacetArrays). If that's the case, I'll +1 adding
the abstraction. With that behind us we're freer to explore whatever
representation we feel like for the aggregated values (e.g. a map). But if the
abstraction itself loses 10+%, that's a bad sign, because at the end of the
day most apps don't run extreme edge cases where they want to count only 1-2
categories, so they shouldn't suffer a big slowdown. The expert
apps should be able to optimize their case; sometimes that unfortunately also
means duplicating code...
So let's first measure how much we lose by the abstraction itself. BTW, if
there's a better representation for the counts that improves overall
performance (with the abstraction), then that's of course a win/win!
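To make the comparison concrete, here is a minimal sketch of the two code paths being weighed: counting straight into an int[] versus counting through an accessor. The names OrdinalCounts, ArrayOrdinalCounts and CountingPaths are hypothetical and exist only for illustration; they are not Lucene APIs and are not taken from the attached patch.

```java
// Hypothetical abstraction over the per-query count storage (not a Lucene API).
interface OrdinalCounts {
    void increment(int ord);
    int get(int ord);
}

// Array-backed implementation that mirrors the existing int[] approach.
final class ArrayOrdinalCounts implements OrdinalCounts {
    private final int[] counts;

    ArrayOrdinalCounts(int numOrds) {
        this.counts = new int[numOrds];
    }

    @Override
    public void increment(int ord) {
        counts[ord]++;
    }

    @Override
    public int get(int ord) {
        return counts[ord];
    }
}

final class CountingPaths {
    private CountingPaths() {}

    // Baseline: the hot loop touches the array directly.
    static int[] countDirect(int[] ords, int numOrds) {
        int[] counts = new int[numOrds];
        for (int ord : ords) {
            counts[ord]++;
        }
        return counts;
    }

    // Same loop, but every update goes through the interface method.
    static OrdinalCounts countViaAbstraction(int[] ords, OrdinalCounts counts) {
        for (int ord : ords) {
            counts.increment(ord);
        }
        return counts;
    }
}
```

Both paths produce the same counts; the open question is only the cost of the extra call. Timing the two methods on realistic ordinal streams, ideally with a harness such as JMH rather than a hand-rolled loop, would show how much the abstraction by itself loses.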
> Make Faceting counting array overridable
> ----------------------------------------
>
> Key: LUCENE-5428
> URL: https://issues.apache.org/jira/browse/LUCENE-5428
> Project: Lucene - Core
> Issue Type: Improvement
> Components: modules/facet
> Affects Versions: 4.6.1
> Reporter: John Wang
> Attachments: facetcounter.patch
>
>
> In SortedSetDocValuesFacetCounts, the count array is allocated as an int[]
> sized to the total number of values across all facets, and it is allocated
> per query.
> When the number of values is large, a large amount of garbage may be
> created. Furthermore, the size of the array depends on the number of
> possible values, not on the number of values in the facet fields that
> are actually being accumulated. E.g. if FacetSearchParam indicates counting
> only one field with 2 values, we still create the array for all values
> across all fields.
> This patch makes the count array abstract to allow for
> 1) caching
> 2) hash counting - which can choose to count only the needed fields.
> This patch can be further enhanced to create a FacetCounter per segment, per
> field, by passing in the ordinal map.
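
The hash-counting idea from the description can be sketched roughly as follows. This is only an illustration: SparseOrdinalCounts is a hypothetical name, not a class from the patch, and the sketch assumes that the ordinals of the requested field form one contiguous range [startOrd, endOrd].

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical sparse counter (not part of Lucene or of the attached patch).
// Assumption: the requested field's ordinals are the contiguous range
// [startOrd, endOrd], so anything outside that range can be skipped.
final class SparseOrdinalCounts {
    private final int startOrd; // first ordinal of the requested field, inclusive
    private final int endOrd;   // last ordinal of the requested field, inclusive
    private final Map<Integer, Integer> counts = new HashMap<>();

    SparseOrdinalCounts(int startOrd, int endOrd) {
        this.startOrd = startOrd;
        this.endOrd = endOrd;
    }

    // Counts the ordinal only if it belongs to the requested field.
    void increment(int ord) {
        if (ord < startOrd || ord > endOrd) {
            return;
        }
        counts.merge(ord, 1, Integer::sum);
    }

    // Returns 0 for ordinals that were never counted.
    int get(int ord) {
        return counts.getOrDefault(ord, 0);
    }

    // Number of distinct ordinals actually stored.
    int size() {
        return counts.size();
    }
}
```

Memory here grows with the number of distinct ordinals seen in the requested field rather than with the total number of values across all fields. The trade-off is boxing and hashing on every update, which is the kind of per-hit overhead that would need to be measured against the plain int[].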