[jira] [Commented] (LUCENE-4769) Add a CountingFacetsAggregator which reads ordinals from a cache

Shai Erera (JIRA) Mon, 11 Feb 2013 22:35:15 -0800

    [ 
https://issues.apache.org/jira/browse/LUCENE-4769?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13576418#comment-13576418
 ]


Shai Erera commented on LUCENE-4769:
------------------------------------

Ok .. I think I know where the confusion is, and it's mostly due to my lack of 
proper understanding of Codecs ..

We basically mean the same thing, only what you propose is more realistic w/ 
today's IndexReader API, which only exposes docValues. What I had in mind 
(looking again at notes I wrote a few months ago) is that facets could have 
a CompositeReader impl which adds a facets-specific API. Until then, we have no 
other choice but to piggy-back on the DV API, and that means extending DVFormat. 
Thanks for insisting, it made me understand how this should work ... (sorry, 
but I haven't written a Codec yet).
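To make "piggy-backing on the DV API" concrete, here is a minimal sketch of what storing facet ordinals in a binary DocValues field amounts to: each document's ordinals are serialized into one byte[] at indexing time and decoded again at search time. It assumes an encoding along the lines of sorted ordinals, d-gap, VInt; the class and helper names are hypothetical, and the code deliberately avoids Lucene APIs so it stays self-contained.

```java
import java.io.ByteArrayOutputStream;
import java.util.Arrays;

/**
 * Hypothetical sketch: a document's facet ordinals stored as one byte[]
 * (as a binary DocValues value would be). Assumed encoding: sorted ordinals,
 * delta-encoded (d-gap), each gap written as a VInt.
 */
public class DvOrdinalsSketch {

  /** Encodes sorted ordinals as VInt-encoded gaps. */
  static byte[] encode(int[] sortedOrds) {
    ByteArrayOutputStream out = new ByteArrayOutputStream();
    int prev = 0;
    for (int ord : sortedOrds) {
      int gap = ord - prev;
      prev = ord;
      // VInt: 7 payload bits per byte; the high bit means "more bytes follow"
      while ((gap & ~0x7F) != 0) {
        out.write((gap & 0x7F) | 0x80);
        gap >>>= 7;
      }
      out.write(gap);
    }
    return out.toByteArray();
  }

  /** Decodes the byte[] back into absolute ordinals, as a search-time reader would. */
  static int[] decode(byte[] bytes) {
    int[] ords = new int[bytes.length]; // every ordinal takes at least one byte
    int count = 0, pos = 0, prev = 0;
    while (pos < bytes.length) {
      int value = 0, shift = 0;
      byte b;
      do {
        b = bytes[pos++];
        value |= (b & 0x7F) << shift;
        shift += 7;
      } while ((b & 0x80) != 0);
      prev += value; // undo the d-gap
      ords[count++] = prev;
    }
    return Arrays.copyOf(ords, count);
  }

  public static void main(String[] args) {
    int[] ords = {3, 17, 130, 5000};
    byte[] encoded = encode(ords);
    System.out.println(encoded.length + " bytes -> " + Arrays.toString(decode(encoded)));
  }
}
```

Running `main` prints `5 bytes -> [3, 17, 130, 5000]`: the gaps 3, 14 and 113 fit in one byte each and 4870 needs two. This per-document decoding is the work a dedicated facets structure (or a cache) could avoid.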

Perhaps separately we can think about an IndexReader impl for facets, which 
would open the road to many different optimizations, e.g. maintaining a 
per-segment taxonomy and a top-level global-ordinal map (all in-memory), 
encoding facet ordinals in their own structure (rather than DV), and maybe even 
managing the global taxonomy as part of the search index (through sidecar files 
or something), w/o the sidecar taxonomy index. I think that sidecar index is 
a barrier today, both for apps and for integrating facets into Solr or ES. But 
that should be done separately, as it's a major refactoring of how facets work.
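A per-segment taxonomy with a top-level global-ordinal map could look roughly like the sketch below. It is an assumption-laden illustration, not a design from this thread: each segment numbers its own category labels locally, the top-level reader keeps an in-memory int[] per segment mapping local to global ordinals, and counts gathered per segment are folded into global counts. All names are hypothetical, and the taxonomy hierarchy (parent categories) is ignored for brevity.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

/**
 * Hypothetical sketch: per-segment category ordinals mapped to global ordinals
 * by an in-memory, top-level map. Hierarchy/parents are ignored.
 */
public class GlobalOrdinalMapSketch {
  final List<String> globalLabels = new ArrayList<>();       // global ordinal -> label
  final Map<String, Integer> globalOrdOf = new HashMap<>();  // label -> global ordinal
  final List<int[]> segmentToGlobal = new ArrayList<>();     // per segment: local -> global

  /** Registers a segment's local taxonomy (local ordinal = index in the list). */
  void addSegment(List<String> localLabels) {
    int[] map = new int[localLabels.size()];
    for (int local = 0; local < localLabels.size(); local++) {
      String label = localLabels.get(local);
      Integer global = globalOrdOf.get(label);
      if (global == null) { // first time any segment sees this label
        global = globalLabels.size();
        globalLabels.add(label);
        globalOrdOf.put(label, global);
      }
      map[local] = global;
    }
    segmentToGlobal.add(map);
  }

  /** Folds per-segment counts (indexed by local ordinal) into global counts. */
  int[] mergeCounts(List<int[]> perSegmentCounts) {
    int[] global = new int[globalLabels.size()];
    for (int seg = 0; seg < perSegmentCounts.size(); seg++) {
      int[] map = segmentToGlobal.get(seg);
      int[] counts = perSegmentCounts.get(seg);
      for (int local = 0; local < counts.length; local++) {
        global[map[local]] += counts[local];
      }
    }
    return global;
  }

  public static void main(String[] args) {
    GlobalOrdinalMapSketch m = new GlobalOrdinalMapSketch();
    m.addSegment(List.of("Author/Bob", "Author/Lisa"));
    m.addSegment(List.of("Author/Lisa", "Author/Susan"));
    int[] global = m.mergeCounts(List.of(new int[] {2, 1}, new int[] {4, 3}));
    for (int g = 0; g < global.length; g++) {
      System.out.println(m.globalLabels.get(g) + "=" + global[g]);
    }
  }
}
```

The example prints `Author/Bob=2`, `Author/Lisa=5` and `Author/Susan=3`. Counting stays in each segment's small local ordinal space, and only the final fold touches the global map, which is what would make a separate taxonomy index unnecessary in such a scheme.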

Even FacetsDV is sort of a refactoring (i.e. replacing CategoryListIterator 
with it .. if we want to do it right), so for now I'm still going to 
commit that cache as an aggregator, and we can get rid of it once we have 
FacetsDV.
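For readers new to the issue, the cache-as-aggregator idea can be sketched as follows. This is a hypothetical layout in the spirit of the CachedInts prototype mentioned in the issue description, not the code in LUCENE-4769.patch: all ordinals of a segment are decoded once into a single int[], with a second int[] of per-document start offsets, so counting needs no per-document decoding. A `getInts(docid, IntsRef)`-style accessor would simply expose the `offsets[doc]..offsets[doc+1]` slice.

```java
import java.util.Arrays;

/**
 * Hypothetical sketch of an in-memory ordinals cache used by a counting
 * aggregator. Layout (assumed): ords holds every doc's ordinals back to back;
 * offsets[doc]..offsets[doc + 1] delimits one document's slice.
 */
public class CachedOrdinalsSketch {
  final int[] offsets;
  final int[] ords;

  CachedOrdinalsSketch(int[][] ordsPerDoc) {
    offsets = new int[ordsPerDoc.length + 1];
    for (int doc = 0; doc < ordsPerDoc.length; doc++) {
      offsets[doc + 1] = offsets[doc] + ordsPerDoc[doc].length;
    }
    ords = new int[offsets[ordsPerDoc.length]];
    for (int doc = 0; doc < ordsPerDoc.length; doc++) {
      System.arraycopy(ordsPerDoc[doc], 0, ords, offsets[doc], ordsPerDoc[doc].length);
    }
  }

  /** Counting aggregation: increments counts[ord] for every ordinal of every matching doc. */
  void aggregate(int[] matchingDocs, int[] counts) {
    for (int doc : matchingDocs) {
      for (int i = offsets[doc]; i < offsets[doc + 1]; i++) {
        counts[ords[i]]++;
      }
    }
  }

  public static void main(String[] args) {
    CachedOrdinalsSketch cache = new CachedOrdinalsSketch(new int[][] {
        {1, 4}, {2}, {1, 2, 4}, {}
    });
    int[] counts = new int[5]; // one slot per ordinal 0..4
    cache.aggregate(new int[] {0, 2, 3}, counts);
    System.out.println(Arrays.toString(counts));
  }
}
```

With docs 0, 2 and 3 matching, this prints `[0, 2, 1, 0, 2]`. The trade-off is the one the issue describes: plain int[] arrays make the inner loop trivial but use more RAM than the encoded DocValues bytes.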

Oh .. and there was one thing that bothered me in that statement:

bq. You seem hell-bent on the idea that lucene should have a getInts(docid, 
IntsRef) api for facets

First, I'm not hell-bent on anything (don't even know what that means). Second, 
facets are now a *lucene* module, not private to me. From my perspective, 
*lucene* doesn't need to have anything for me, but *lucene* should have the 
best facets module. So far I've been busy refactoring facets so they work 
faster and have a cleaner API ... not for me, but for *lucene* users. I'm sure things 
can be simplified even further and improved even more. I think about it 
constantly. If you have a better idea of how facets should work (while 
maintaining current capabilities, as much as possible), I'm all open to 
suggestions, really.
                
> Add a CountingFacetsAggregator which reads ordinals from a cache
> ----------------------------------------------------------------
>
>                 Key: LUCENE-4769
>                 URL: https://issues.apache.org/jira/browse/LUCENE-4769
>             Project: Lucene - Core
>          Issue Type: New Feature
>          Components: modules/facet
>            Reporter: Shai Erera
>            Assignee: Shai Erera
>         Attachments: LUCENE-4769.patch
>
>
> Mike wrote a prototype of a FacetsCollector which reads ordinals from a 
> CachedInts structure on LUCENE-4609. I ported it to the new facets API, as a 
> FacetsAggregator. I think we should offer users the means to use such a 
> cache, even if it consumes more RAM. Mike's tests show that this cache consumed 
> twice as much RAM as the DocValues loaded into memory in their raw form. 
> Also, a PackedInts version of such a cache took almost the same amount of RAM 
> as a straight int[], but the gains were minor.
> I will post the patch shortly.
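Why an int[] cache costs more RAM than the raw DocValues bytes can be seen with a back-of-the-envelope count. The sketch below uses a made-up toy input (it does not reproduce the measurements reported on the issue) and assumes the raw form is d-gap VInt bytes: small gaps mostly fit in one byte, while every cached ordinal takes 4 bytes. Per-document offsets are needed by both layouts and are left out of the count.

```java
/**
 * Illustrative estimate on a toy input: bytes needed for the same ordinals as
 * d-gap VInt bytes (assumed raw DV payload) versus a plain int[] cache.
 * Per-document offsets/addresses are not counted for either layout.
 */
public class OrdinalRamEstimate {
  /** Number of bytes a non-negative value occupies when written as a VInt. */
  static int vIntSize(int v) {
    int bytes = 1;
    while ((v & ~0x7F) != 0) {
      bytes++;
      v >>>= 7;
    }
    return bytes;
  }

  public static void main(String[] args) {
    int[][] docs = {{3, 17, 130, 5000}, {2, 9}, {40, 41, 42}}; // sorted ordinals per doc
    long vintBytes = 0, totalOrds = 0;
    for (int[] ords : docs) {
      int prev = 0;
      for (int ord : ords) {
        vintBytes += vIntSize(ord - prev); // gaps are small, so most fit in one byte
        prev = ord;
      }
      totalOrds += ords.length;
    }
    long intArrayBytes = totalOrds * Integer.BYTES; // 4 bytes per cached ordinal
    System.out.println("ordinals=" + totalOrds + " vint=" + vintBytes + " int[]=" + intArrayBytes);
  }
}
```

This prints `ordinals=9 vint=10 int[]=36`. The exact ratio depends on how large the gaps between a document's ordinals are, so the toy ratio here should not be read as a prediction for real indexes.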
