> Actually I think the faceting module is per-segment?

That would be very cool. I reviewed the user guide and it is ambiguous on this topic. E.g., why does the facet taxonomy need to be committed for every IW commit? Mapping that to [N]RT will be tricky.

Page 17:

"In faceted search, we complicate things somewhat by adding a second index – the taxonomy index. The taxonomy API also follows point-in-time semantics, but this is not quite enough. Some attention must be paid by the user to keep those two indexes consistently in sync:"

"The main index refers to category numbers defined in the taxonomy index. Therefore, it is important that we open the TaxonomyReader after opening the IndexReader. Moreover, every time an IndexReader is reopen()ed, the TaxonomyReader needs to be refresh()ed as well."

"But there is one extra caution: whenever the application deems it has written enough information worthy a commit, it must first call commit() for the TaxonomyWriter and only after that call commit() for the IndexWriter. Closing the indices should also be done in this order – first close the taxonomy, and only after that close the index."

On Sat, Jul 9, 2011 at 4:13 AM, Michael McCandless <[email protected]> wrote:
> Actually I think the faceting module is per-segment?
>
> The facets are encoded into payloads, and then it visits the payload
> of each hit right per segment, and aggregates the counts.
>
> Like, on reopen (NRT or not) of a reader, there are no global data
> structures that must be recomputed. EG, this facets impl doesn't use
> FieldCache on the global reader (leading to insanity....).
>
> Mike McCandless
>
> http://blog.mikemccandless.com
>
> On Sat, Jul 9, 2011 at 12:40 AM, Shai Erera <[email protected]> wrote:
>> Well, the approach is entirely different, and the new module
>> introduces features not available in the other impls (and I imagine
>> vice versa).
>>
>> The taxonomy is managed on the side, hence why it is global to the
>> 'content' index. It plays very well with NRT, and we in fact have
>> several apps that use the module in an NRT environment.
>>
>> The taxonomy index supports NRT by itself, by using the IR.open(IW)
>> API and then it's up to the application to manage its content index
>> search as NRT.
>>
>> I think you should read the high-level description I put on
>> LUCENE-3079 and the userguide I put on LUCENE-3261. As I said, the
>> approach is quite different than the bitset and FieldCache ones.
>>
>> Shai
>>
>> On Saturday, July 9, 2011, Jason Rutherglen <[email protected]>
>> wrote:
>>>> The taxonomy is global to the index, but I think it will be
>>>> interesting to explore per-segment taxonomy, and how it can be used to
>>>> improve indexing or search perf (hopefully both)
>>>
>>> Right so with NRT this'll be an issue. Is there a write up on this?
>>> It sounds fairly radical in design. Eg, I'm curious as to how it
>>> compares with the bit set and un-inverted field cache based faceting
>>> systems.
>>>
>>> On Fri, Jul 8, 2011 at 8:44 PM, Shai Erera <[email protected]> wrote:
>>>> Currently it doesn't facet per segment, because the approach it uses
>>>> is irrelevant to per segment.
>>>>
>>>> It maintains a count array in the size of the taxonomy and every
>>>> matching document contributes to the weight of the categories it is
>>>> associated with, regardless of the segment it is found in.
>>>>
>>>> The taxonomy is global to the index, but I think it will be
>>>> interesting to explore per-segment taxonomy, and how it can be used to
>>>> improve indexing or search perf (hopefully both).
>>>>
>>>> Shai
>>>>
>>>> On Saturday, July 9, 2011, Jason Rutherglen <[email protected]>
>>>> wrote:
>>>>> Is it faceting per-segment?
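
For anyone following the thread, the global count array Shai describes can be sketched in a few lines of plain Java. This is only an illustration under assumptions: the Hit class, the countCategories method and the ordinal values are invented here and are not the facet module's API.

```java
import java.util.Arrays;
import java.util.List;

public class FacetCountSketch {

    /** Hypothetical matching document: its segment and the taxonomy ordinals attached to it. */
    static final class Hit {
        final int segment;
        final int[] categoryOrdinals;

        Hit(int segment, int... categoryOrdinals) {
            this.segment = segment;
            this.categoryOrdinals = categoryOrdinals;
        }
    }

    /**
     * Keeps one counter per taxonomy ordinal. Every hit increments the counters
     * of its categories; hit.segment is never read, because the ordinals are
     * assumed to be global to the whole index.
     */
    static int[] countCategories(int taxonomySize, List<Hit> hits) {
        int[] counts = new int[taxonomySize];
        for (Hit hit : hits) {
            for (int ordinal : hit.categoryOrdinals) {
                counts[ordinal]++;
            }
        }
        return counts;
    }

    public static void main(String[] args) {
        List<Hit> hits = Arrays.asList(
            new Hit(0, 1, 3),   // segment 0, categories 1 and 3
            new Hit(1, 1),      // segment 1, category 1
            new Hit(1, 2, 3));  // segment 1, categories 2 and 3
        System.out.println(Arrays.toString(countCategories(4, hits))); // prints [0, 2, 1, 2]
    }
}
```

Since the ordinals come from a single taxonomy shared by every segment, the aggregation loop has no per-segment state to rebuild when a reader is reopened.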
---------------------------------------------------------------------
To unsubscribe, e-mail: [email protected]
For additional commands, e-mail: [email protected]
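
The ordering rule from page 17 of the user guide (commit and close the taxonomy first, then the index) could be enforced by a small wrapper, sketched below. Everything in it is an assumption made for illustration: the Writer interface, PairedWriters and recording are hypothetical stand-ins, not TaxonomyWriter or IndexWriter themselves. The rationale in the comments is a guess based on the guide's remark that the main index refers to category numbers defined in the taxonomy index.

```java
import java.util.ArrayList;
import java.util.List;

public class CommitOrderSketch {

    /** Hypothetical stand-in for a writer; not the Lucene API. */
    interface Writer {
        void commit();
        void close();
    }

    /** Pairs the two writers so callers cannot commit or close them in the wrong order. */
    static final class PairedWriters {
        private final Writer taxonomy;
        private final Writer index;

        PairedWriters(Writer taxonomy, Writer index) {
            this.taxonomy = taxonomy;
            this.index = index;
        }

        void commit() {
            // Taxonomy first: presumably so that a committed index never refers
            // to a category number missing from the committed taxonomy.
            taxonomy.commit();
            index.commit();
        }

        void close() {
            // Same order on close, as the guide asks.
            taxonomy.close();
            index.close();
        }
    }

    /** Test double that records each call so the ordering can be observed. */
    static Writer recording(final String name, final List<String> log) {
        return new Writer() {
            @Override public void commit() { log.add(name + ".commit"); }
            @Override public void close() { log.add(name + ".close"); }
        };
    }

    public static void main(String[] args) {
        List<String> log = new ArrayList<>();
        PairedWriters writers =
            new PairedWriters(recording("taxonomy", log), recording("index", log));
        writers.commit();
        writers.close();
        System.out.println(log);
    }
}
```

In an NRT setup the same wrapper idea would apply to the readers in the opposite direction, since the guide asks for the TaxonomyReader to be opened after the IndexReader and refreshed on every reopen.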
