[jira] [Commented] (LUCENE-5428) Make Faceting counting array overridable

Shai Erera (JIRA) Sat, 01 Feb 2014 11:26:29 -0800

    [ 
https://issues.apache.org/jira/browse/LUCENE-5428?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13888682#comment-13888682
 ]


Shai Erera commented on LUCENE-5428:
------------------------------------

bq. Do you mean the method call overhead it would create?

Exactly! On LUCENE-5316 we tried to replace direct array access with a 
method call that accesses the array internally, and the slowdowns were 
significant. The purpose there, as here, was to allow for a more efficient 
representation of the arrays (whether compressed / map-based), but losing 
60% on some queries seemed too much. Also, we didn't see an improvement 
with any of the changes. So ... it's kind of hard to justify such a change 
in general.
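
For readers following along, the two access styles being compared can be sketched roughly as below. This is only an illustration, not the LUCENE-5316 code: OrdinalCounts, ArrayOrdinalCounts and CountingStyles are hypothetical names, and the sketch makes no claim about how large the overhead is.

```java
// Hypothetical names throughout; this is not Lucene's API.
interface OrdinalCounts {
    void increment(int ord);
    int get(int ord);
}

// Array-backed implementation hidden behind the interface.
final class ArrayOrdinalCounts implements OrdinalCounts {
    private final int[] counts;

    ArrayOrdinalCounts(int numOrds) {
        counts = new int[numOrds];
    }

    @Override
    public void increment(int ord) {
        counts[ord]++;
    }

    @Override
    public int get(int ord) {
        return counts[ord];
    }
}

final class CountingStyles {
    // Style 1: the hot loop touches the int[] directly.
    static int[] countDirect(int[] docOrds, int numOrds) {
        int[] counts = new int[numOrds];
        for (int ord : docOrds) {
            counts[ord]++;
        }
        return counts;
    }

    // Style 2: every increment goes through an interface method.
    static OrdinalCounts countViaAbstraction(int[] docOrds, OrdinalCounts counts) {
        for (int ord : docOrds) {
            counts.increment(ord);
        }
        return counts;
    }
}
```

Both styles produce the same counts; what differs is whether the JIT manages to inline the call in the hot loop, which can only be settled by benchmarking (for example with a harness such as JMH).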

Basically, our MO is to make sure the abstraction doesn't hurt (much) the 
current implementations (i.e. FacetArrays). If that's the case, I'm +1 on 
adding the abstraction. With that behind us we're freer to explore whatever 
representation we feel like for the aggregated values (e.g. map). But if the 
abstraction itself loses like 10+%, that's a bad sign because at the end of the 
day, most apps don't run extreme edge cases where they want to count 1-2 
categories only, so they shouldn't suffer from a great slowdown. The expert 
apps should be able to optimize their case; sometimes that unfortunately also 
means duplicating code...

So let's first measure how much we lose by the abstraction itself. BTW, if 
there's a better representation for the counts that overall improves 
performance (with the abstraction), then that's of course a win/win!

> Make Faceting counting array overridable
> ----------------------------------------
>
>                 Key: LUCENE-5428
>                 URL: https://issues.apache.org/jira/browse/LUCENE-5428
>             Project: Lucene - Core
>          Issue Type: Improvement
>          Components: modules/facet
>    Affects Versions: 4.6.1
>            Reporter: John Wang
>         Attachments: facetcounter.patch
>
>
> In SortedSetDocValuesFacetCounts, the count array is allocated per query as 
> an int[] whose size is the total number of values across all facets.
> When the number of values is large, a large amount of garbage may be 
> created. Furthermore, the size of the array depends on the number of 
> possible values, not on the number of values in the facet fields that are 
> actually being accumulated. E.g. if FacetSearchParam indicates counting only 
> one field with 2 values, we still create the array for all values 
> across all fields.
> This patch makes the count array abstract to allow for
> 1) caching
> 2) hash counting - which can choose to count only the needed fields.
> This patch can be further enhanced to create a FacetCounter per segment, per 
> field, by passing in the ordinal map.
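
As a rough illustration of the "hash counting" idea in the description above, a counter could keep entries only for ordinals that belong to the requested field. This is a minimal sketch under stated assumptions, not the code in facetcounter.patch: the class name HashOrdinalCounter and the idea of passing the field's ordinal range [startOrd, endOrd) to the constructor are both hypothetical.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical class, not part of Lucene or of the attached patch.
final class HashOrdinalCounter {
    private final int startOrd; // first ordinal of the requested field (inclusive)
    private final int endOrd;   // last ordinal of the requested field (exclusive)
    private final Map<Integer, Integer> counts = new HashMap<>();

    HashOrdinalCounter(int startOrd, int endOrd) {
        this.startOrd = startOrd;
        this.endOrd = endOrd;
    }

    // Count an ordinal only if it falls inside the requested field's range.
    void increment(int ord) {
        if (ord >= startOrd && ord < endOrd) {
            counts.merge(ord, 1, Integer::sum);
        }
    }

    // Ordinals that were never counted report zero.
    int get(int ord) {
        return counts.getOrDefault(ord, 0);
    }

    // Memory use grows with this number, not with the total ordinal space.
    int distinctOrdinals() {
        return counts.size();
    }
}
```

The map holds one entry per distinct ordinal actually counted, so a request for one field with 2 values stores at most 2 entries regardless of how many values exist across all fields. A HashMap of boxed Integers is used here only for brevity; a primitive-keyed map would avoid the boxing cost.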



--
This message was sent by Atlassian JIRA
(v6.1.5#6160)

