[jira] [Commented] (LUCENE-4625) Make TotalFacetCounts per-segment

Shai Erera (JIRA) Wed, 12 Dec 2012 20:17:25 -0800

    [ 
https://issues.apache.org/jira/browse/LUCENE-4625?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13530661#comment-13530661
 ]


Shai Erera commented on LUCENE-4625:
------------------------------------

Yeah ok, I think you're right.

One thing that I forgot to write here yesterday (it only occurred to me before 
I fell asleep) is that TotalFacetCounts are *global* to the index. What I 
would like to do here is compute PerSegmentTotalCounts and then have TFC 
aggregate all of them into one TFC array.

The question is how to make it work smoothly, without the application needing 
to do a lot. Today the app needs to recompute TFC for any top-level reader. By 
"computing" I mean that it obtains the TFC for a reader, and that call 
computes things under the covers if the reader is new.

Moving to per-segment TFC would mean that TFC would now not load the entire 
data from disk again, but rather compute the TFCs for the new segments only and 
return a fresh new TFC to the app. So something to start with:

* TFC will hold a WeakHM for AtomicTFC (or just TFC, need to see how it goes)
* When app gives a top-level reader to TFC, it iterates on leaves(), pulls from 
the cache the AtomicTFCs and computes new ATFCs for new segments and puts them 
in WHM.
* It then computes the global TFC from the in-memory structure and returns that 
to the app. It must re-compute, because some segments may not exist anymore, 
and therefore it cannot just add the 'diff' from the new ATFCs.
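
The three steps above can be sketched roughly as follows. This is only an 
illustration and not the facet module's API: PerSegmentTotalCounts as a class, 
totalCounts, and the segmentCounter function are hypothetical names, and a 
plain Object stands in for whatever per-segment key the real code would use. 
It also assumes a segment's array is never longer than the current taxonomy.

```java
import java.util.List;
import java.util.Map;
import java.util.WeakHashMap;
import java.util.function.Function;

/** Hypothetical sketch of a per-segment TFC cache; not Lucene API. */
final class PerSegmentTotalCounts {
    // Weak keys: an entry can be dropped once its segment key is unreachable.
    private final Map<Object, int[]> cache = new WeakHashMap<>();
    // Assumed callback that counts facet ordinals for one segment.
    private final Function<Object, int[]> segmentCounter;

    PerSegmentTotalCounts(Function<Object, int[]> segmentCounter) {
        this.segmentCounter = segmentCounter;
    }

    /**
     * Returns a fresh global array for the given segments. Cached segments
     * are reused, new ones are counted once, and the total is always
     * rebuilt from scratch because old segments may have disappeared.
     */
    synchronized int[] totalCounts(List<?> segmentKeys, int taxonomySize) {
        int[] total = new int[taxonomySize];
        for (Object key : segmentKeys) {
            int[] counts = cache.get(key);
            if (counts == null) {
                counts = segmentCounter.apply(key); // new segment only
                cache.put(key, counts);
            }
            // Assumes counts.length <= taxonomySize (taxonomy only grows).
            for (int ord = 0; ord < counts.length; ord++) {
                total[ord] += counts[ord];
            }
        }
        return total;
    }
}
```

Summing everything on each call keeps the logic simple; the cost is one pass 
over every live segment's array per top-level reader.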

The downside of that is that each segment holds a counting array the size of 
the taxonomy. We can explore an alternative format to an array. E.g. small 
segments (NRT) will likely contain very few facet ordinals, so two small 
parallel int[] (one holds the ords, the other their counts) might be better? 
Let's think about that too.
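
To make the parallel-array idea concrete, here is a minimal sketch. The class 
name SparseSegmentCounts and its methods are made up for illustration. It 
builds from a dense array only to keep the example short; real code would 
presumably fill the two arrays directly while counting a small segment.

```java
import java.util.Arrays;

/** Hypothetical sparse per-segment counts: parallel ord/count arrays. */
final class SparseSegmentCounts {
    private final int[] ords;   // distinct ordinals, sorted ascending
    private final int[] counts; // counts[i] belongs to ords[i]

    private SparseSegmentCounts(int[] ords, int[] counts) {
        this.ords = ords;
        this.counts = counts;
    }

    /** Keeps only the non-zero entries of a dense counting array. */
    static SparseSegmentCounts fromDense(int[] dense) {
        int n = 0;
        for (int c : dense) {
            if (c != 0) n++;
        }
        int[] ords = new int[n];
        int[] counts = new int[n];
        int i = 0;
        for (int ord = 0; ord < dense.length; ord++) {
            if (dense[ord] != 0) {
                ords[i] = ord;
                counts[i] = dense[ord];
                i++;
            }
        }
        return new SparseSegmentCounts(ords, counts);
    }

    /** Count for one ordinal, or 0 if the segment never saw it. */
    int count(int ord) {
        int idx = Arrays.binarySearch(ords, ord);
        return idx >= 0 ? counts[idx] : 0;
    }

    /** Adds this segment's counts into a global (dense) total array. */
    void addTo(int[] total) {
        for (int i = 0; i < ords.length; i++) {
            total[ords[i]] += counts[i];
        }
    }

    /** Number of distinct ordinals stored. */
    int size() {
        return ords.length;
    }
}
```

Memory is then proportional to the number of distinct ordinals in the segment 
rather than to the taxonomy size, at the price of a binary search per lookup.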
                
> Make TotalFacetCounts per-segment
> ---------------------------------
>
>                 Key: LUCENE-4625
>                 URL: https://issues.apache.org/jira/browse/LUCENE-4625
>             Project: Lucene - Core
>          Issue Type: Improvement
>          Components: modules/facet
>            Reporter: Shai Erera
>
> TotalFacetCounts are used during complements computation today. They are not 
> per-segment and therefore are not NRT friendly. Even regardless of NRT, you 
> need to compute them entirely from scratch whenever you reopen IR.
> It would be good if we can develop them per-segment. If e.g. AtomicReader had 
> a notion of cachable objects, it could be such an object. That has been 
> discussed many times in the past though, without a consensus. So perhaps we 
> can have a FacetsAtomicReader which manages TFC. But that creates other 
> issues too, like who instantiates that AtomicReader (i.e. we'd need a 
> FacetsCompositeReader too, and potentially IW would need to init that type) 
> ...
> Let's explore these options, but in general it would be good to have TFC 
> per-segment.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira

