[jira] [Commented] (LUCENE-4757) Cleanup FacetsAccumulator API path

Shai Erera (JIRA) Wed, 06 Feb 2013 22:23:16 -0800

    [ 
https://issues.apache.org/jira/browse/LUCENE-4757?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13573239#comment-13573239
 ]


Shai Erera commented on LUCENE-4757:
------------------------------------

There are few abstractions that were added (compared to CountingFC), but I 
don't see why they will slow things down:

* FacetsCollector.getFacetResult() delegates to FacetsAccumulator vs. 
CountingFC which does the work on its own. Still, that's one method call for 
computing all the results.

* FacetsAccumulator calls FacetsAggregator to aggregate (count) the facets. 
That's one API call, and CountingFC also calls a method to do that. One 
difference could be JIT - in CountingFC, the method that's called is private, 
while FacetsAggregator is an interface. Since we don't know when JIT kicks in 
and when not, we can't tell for sure that it's the cause. Still, I could make 
CountingFacetsAggregator and FastCountingFacetsAggregator final, if that may 
help. What do you think?

* FacetsAccumulator calls FacetResultsHandler to compute the top-K facets, 
where CountingFC computed on its own (but still used a PQ class). That too is 
one method call, but same like FacetsAggregator, may not be JIT-able. I made 
some improvements (patch to follow shortly) where Int/FloatFacetResultsHandler 
are made final and FRHandler now takes FacetArrays in its ctor, which means 
they can access final class members only. Perhaps that will help. It makes 
FacetResultsHandler stateful, but the code now anyway initializes a new 
instance per FacetRequest (which is a cheap instance).

The code now is more "structured" and clean. I can write a 
FastCountingFacetsAccumulator which will go back to doing aggregation + top-K 
computation on its own, privately. But that starts to get ugly ... not sure 
it's worth it for only 6%?
                
> Cleanup FacetsAccumulator API path
> ----------------------------------
>
>                 Key: LUCENE-4757
>                 URL: https://issues.apache.org/jira/browse/LUCENE-4757
>             Project: Lucene - Core
>          Issue Type: Improvement
>          Components: modules/facet
>            Reporter: Shai Erera
>            Assignee: Shai Erera
>         Attachments: LUCENE-4757.patch, LUCENE-4757.patch
>
>
> FacetsAccumulator and FacetRequest expose too many things to users, even when 
> they are not needed, e.g. complements and partitions. Also, Aggregator is 
> created per-FacetRequest, while in fact applied per category list. This is 
> confusing, because if you want to do two aggregations, e.g. count and 
> sum-score, you need to separate the two dimensions into two different 
> category lists at indexing time.
> It's not so easy to refactor everything in one go, since there's a lot of 
> code involved. So in this issue I will:
> * Remove complements from FacetRequest. It is only relevant to 
> CountFacetRequest anyway. In the future, it should be a special Accumulator.
> * Make FacetsAccumulator concrete class, and StandardFacetsAccumulator extend 
> it and handles all the stuff that's relevant to sampling, complements and 
> partitions. Gradually, these things will be migrated to the new API, and 
> hopefully StandardFacetsAccumulator will go away.
> * Aggregator is per-document. I could not break its API b/c some features 
> (e.g. complement) depend on it. So rather I created a new FacetsAggregator, 
> with a bulk, per-segment, API. So far migrated Counting and SumScore to that 
> API.
> ** In the new API, you need to override FacetsAccumulator to define an 
> Aggregator for use, the default is CountingFacetsAggregator.
> * Started to refactor FacetResultsHandler, which its API was guided by the 
> use of partitions. I added a simple {{compute(FacetArrays)}} to it, which by 
> default delegates to the nasty API, but overridden by specific classes. This 
> will get cleaned further along too.
> * FacetRequest has a .getValueOf() which resolves an ordinal to its value 
> (i.e. which of the two arrays to use). I added FacetRequest.FacetArraysSource 
> and specialize when they are INT or FLOAT, creating a special 
> FacetResultsHandler which does not go back to FR.getValueOf for every 
> ordinal. I think that we can migrate other FacetResultsHandlers to behave 
> like that ... at the expense of code duplication.
> ** I also added a TODO to get rid of getValueOf entirely .. will be done 
> separately.
> * Got rid of CountingFacetsCollector and StandardFacetsCollector in favor of 
> a single FacetsCollector which collects matching documents, and optionally 
> scores, per-segment. I wrote a migration class from these per-segment 
> MatchingDocs to ScoredDocIDs (which is global), so that the rest of the code 
> works, but the new code works w/ the optimized per-segment API. I hope 
> performance is still roughly the same w/ these changes too.
> There will be follow-on issues to migrate more features to the new API, and 
> more cleanups ...

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira

---------------------------------------------------------------------
To unsubscribe, e-mail: [email protected]
For additional commands, e-mail: [email protected]

[jira] [Commented] (LUCENE-4757) Cleanup FacetsAccumulator API path

Reply via email to