[
https://issues.apache.org/jira/browse/LUCENE-4769?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13576418#comment-13576418
]
Shai Erera commented on LUCENE-4769:
------------------------------------
Ok .. I think I know where the confusion is, and it's mostly due to my lack of
proper understanding of Codecs ..
We basically mean the same thing, only what you propose is more realistic w/
today's IndexReader API, which only exposes docValues. While what I had in mind
(taking a look again at notes I wrote a few months ago) is that facets could have
a CompositeReader impl which adds a facets-specific API. Until then, we have no
other choice but to piggy-back on DV API, and that means extending DVFormat.
Thanks for insisting, it made me understand how this should work ... (sorry,
but I haven't written a Codec yet).
Perhaps separately we can think about an IndexReader impl for facets, which
will open the road to many different optimizations, e.g. maintaining a
per-segment taxonomy and top-level reader global-ordinal map (all in-memory),
encoding facet ordinals in their own structure (and not DV) and maybe even
managing the global taxonomy as part of the search index (through sidecar files
or something), w/o the sidecar index, which I think today is a barrier for apps
as well as integrating that into Solr or ES. But that should be done separately
as it's a major refactoring to how facets work.
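
To make the per-segment taxonomy plus global-ordinal map idea a bit more concrete, here is a minimal sketch of how counts collected against segment-local ordinals might be folded into top-level counts. Everything in it is an assumption made for illustration: the class name SegmentOrdinalMap, the int[][] layout, and the mergeCounts method are hypothetical and are not part of any Lucene API or of the patch on this issue.

```java
/**
 * Hypothetical sketch: an in-memory map from segment-local facet ordinals
 * to global (top-level) ordinals. Not a Lucene class.
 */
final class SegmentOrdinalMap {
    // localToGlobal[segment][localOrd] == globalOrd (assumed layout)
    private final int[][] localToGlobal;

    SegmentOrdinalMap(int[][] localToGlobal) {
        this.localToGlobal = localToGlobal;
    }

    /**
     * Adds the counts gathered for one segment (indexed by local ordinal)
     * into the global counts array (indexed by global ordinal).
     */
    void mergeCounts(int segment, int[] localCounts, int[] globalCounts) {
        int[] map = localToGlobal[segment];
        for (int local = 0; local < localCounts.length; local++) {
            globalCounts[map[local]] += localCounts[local];
        }
    }
}
```

With such a map, each segment could be counted in its own small ordinal space and translated to global ordinals only once per segment, rather than once per document.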
Even FacetsDV are sort of a refactoring (i.e. replacing CategoryListIterator
with that .. if we want to do it right), so I think that for now I'm going to
still commit that cache as an aggregator and we can get rid of it once we do
FacetsDV.
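
For reference, the cache-as-aggregator idea can be pictured roughly as below. This is only a sketch under stated assumptions, not the attached patch: the class name CachedOrdinals, the flattened offsets/ordinals layout, and the aggregate signature are hypothetical, and a real aggregator would work per segment against the facets API rather than on plain arrays.

```java
/**
 * Hypothetical sketch: facet ordinals of all documents decoded once into
 * plain int[] arrays, then counted without touching DocValues again.
 */
final class CachedOrdinals {
    // The ordinals of document d live in ordinals[offsets[d] .. offsets[d + 1])
    private final int[] offsets;
    private final int[] ordinals;

    /** Builds the cache from already-decoded per-document ordinals. */
    CachedOrdinals(int[][] perDocOrdinals) {
        offsets = new int[perDocOrdinals.length + 1];
        int total = 0;
        for (int doc = 0; doc < perDocOrdinals.length; doc++) {
            offsets[doc] = total;
            total += perDocOrdinals[doc].length;
        }
        offsets[perDocOrdinals.length] = total;
        ordinals = new int[total];
        for (int doc = 0; doc < perDocOrdinals.length; doc++) {
            System.arraycopy(perDocOrdinals[doc], 0, ordinals, offsets[doc],
                    perDocOrdinals[doc].length);
        }
    }

    /** Increments counts[ord] for every ordinal of every matching document. */
    void aggregate(int[] matchingDocs, int[] counts) {
        for (int doc : matchingDocs) {
            for (int i = offsets[doc]; i < offsets[doc + 1]; i++) {
                counts[ordinals[i]]++;
            }
        }
    }
}
```

The trade-off is the one described in the issue: the ordinals are held as straight ints in RAM, so counting is a tight array loop, at the cost of more memory than the raw DocValues.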
Oh .. and there was one thing that bothered me in that statement:
bq. You seem hell-bent on the idea that lucene should have a getInts(docid,
IntsRef) api for facets
First, I'm not hell-bent on anything (don't even know what that means). Second,
facets are now a *lucene* module, and not private to me. From my perspective,
*lucene* doesn't need to have anything for me, but *lucene* should have the
best facets module. So far I've been busy refactoring facets so they work
faster and have cleaner API ... not to me, to *lucene* users. I'm sure things
can be simplified even further and improved even more. I think about it
constantly. If you have a better idea of how facets should work (while
maintaining current capabilities, as much as possible), I'm all open to
suggestions, really.
> Add a CountingFacetsAggregator which reads ordinals from a cache
> ----------------------------------------------------------------
>
> Key: LUCENE-4769
> URL: https://issues.apache.org/jira/browse/LUCENE-4769
> Project: Lucene - Core
> Issue Type: New Feature
> Components: modules/facet
> Reporter: Shai Erera
> Assignee: Shai Erera
> Attachments: LUCENE-4769.patch
>
>
> Mike wrote a prototype of a FacetsCollector which reads ordinals from a
> CachedInts structure on LUCENE-4609. I ported it to the new facets API, as a
> FacetsAggregator. I think we should offer users the means to use such a
> cache, even if it consumes more RAM. Mike's tests show that this cache consumed
> 2x more RAM than if the DocValues were loaded into memory in their raw form.
> Also, a PackedInts version of such a cache took almost the same amount of RAM
> as straight int[], but the gains were minor.
> I will post the patch shortly.