[ https://issues.apache.org/jira/browse/LUCENE-4769?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13576144#comment-13576144 ]

Shai Erera commented on LUCENE-4769:
------------------------------------

FacetsAggregator is an abstraction in the facets package that lets you compute 
different aggregation functions over the aggregated ordinals. E.g., counting is 
equivalent to #sum(1), while SumScoreFacetsAggregator does #sum(score), etc.

You're right that this could be implemented as a Codec, and then we wouldn't even 
need to alert the user that if he uses that caching method, he should use 
DiskValuesFormat. But it looks like an awkward decision to me. Usually, caching 
does not force you to index stuff in a specific way. Rather, you decide at 
runtime whether you want to cache the data or not. You can even choose to stop 
using the cache while the app is running. Also, it's odd that if the app already 
indexed documents with the default Codec, it won't be able to use this caching 
method unless it reindexes, or until those segments are merged (b/c their 
DVFormat will be different, and so the aggregator would need to revert to 
different counting code).

I dunno ... it's certainly doable, but it doesn't feel right to me.
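Just to illustrate the runtime-decision point, a hand-wavy sketch (made-up names, 
not the actual patch): per segment, the aggregator uses the cached ordinals if 
they happen to be loaded, and otherwise decodes the ordinals on the fly, so 
already-indexed segments keep working without a special DVFormat.

{code}
// Hand-wavy sketch, not the actual patch: per-segment choice between a
// cached int[][] of ordinals and decoding ordinals on the fly.
class PerSegmentCountingSketch {

  /** Hypothetical hook that decodes a doc's ordinals straight from the index. */
  interface OrdinalReader {
    int[] ordinals(int docID);
  }

  // key: something identifying a segment; value: its per-doc ordinals (absent if not cached)
  private final java.util.Map<Object, int[][]> ordinalCache =
      new java.util.HashMap<Object, int[][]>();

  void count(Object segmentKey, int[] matchingDocs, OrdinalReader reader, int[] counts) {
    int[][] cached = ordinalCache.get(segmentKey);  // purely a runtime decision
    for (int doc : matchingDocs) {
      int[] ords = (cached != null) ? cached[doc] : reader.ordinals(doc);
      for (int ord : ords) {
        counts[ord]++;
      }
    }
  }
}
{code}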
                
> Add a CountingFacetsAggregator which reads ordinals from a cache
> ----------------------------------------------------------------
>
>                 Key: LUCENE-4769
>                 URL: https://issues.apache.org/jira/browse/LUCENE-4769
>             Project: Lucene - Core
>          Issue Type: New Feature
>          Components: modules/facet
>            Reporter: Shai Erera
>            Assignee: Shai Erera
>         Attachments: LUCENE-4769.patch
>
>
> Mike wrote a prototype of a FacetsCollector which reads ordinals from a 
> CachedInts structure on LUCENE-4609. I ported it to the new facets API, as a 
> FacetsAggregator. I think we should offer users the means to use such a 
> cache, even if it consumes more RAM. Mike's tests show that this cache consumed 
> 2x more RAM than if the DocValues were loaded into memory in their raw form. 
> Also, a PackedInts version of such a cache took almost the same amount of RAM 
> as a straight int[], i.e. the gains were minor.
> I will post the patch shortly.
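(For reference, the in-memory shape of such an ordinals cache is roughly the 
following; this is only an illustration, not the actual CachedInts from the 
patch: all ordinals of a segment in one flat int[], plus a per-document offsets 
array.)

{code}
// Illustration of a cached-ordinals structure (not the actual CachedInts):
// all ordinals of a segment flattened into one int[], plus an offsets array
// with maxDoc+1 entries delimiting each doc's slice.
class CachedOrdinalsSketch {
  final int[] offsets;   // offsets[doc] .. offsets[doc + 1] delimit doc's ordinals
  final int[] ordinals;  // all ordinals of the segment, flattened

  CachedOrdinalsSketch(int[] offsets, int[] ordinals) {
    this.offsets = offsets;
    this.ordinals = ordinals;
  }

  /** Adds 1 to counts[ord] for each ordinal of the given doc. */
  void countDoc(int doc, int[] counts) {
    for (int i = offsets[doc]; i < offsets[doc + 1]; i++) {
      counts[ordinals[i]]++;
    }
  }
}
{code}

A PackedInts variant would store the same two arrays with fewer bits per value, 
which per the numbers above didn't save much RAM here.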
