On Fri, Mar 26, 2010 at 1:17 AM, Daniel Noll <dan...@nuix.com> wrote: > On Thu, Mar 25, 2010 at 21:41, Michael McCandless > <luc...@mikemccandless.com> wrote: >> >> This depends on the particulars of filter... but in general you >> shouldn't have to consume more RAM, I think? Ie you should be able to >> do your computation against the top-level reader, and then store the >> results of your computation per-sub-reader. > > I am having issues figuring out how to get a reference to the > top-level reader. The API passes them in one by one and I can't see a > way to find the top-level reader for one which was passed in. I can't > easily cheat and pass the top-level one into the Filter constructor, > because filters are serialisable and that kind of thing won't survive > serialisation.
Probably, for now at least, you'll have to park a reference to the current IndexReader in some known, static place, where your filter checks? > To throw an additional spanner in the works, the behaviour I need is > that only the *last* document should be returned. So even if a > certain document matches the filter after N readers have been passed > in, it might not match the filter after N+1 readers have been passed > in. Essentially I need a method like... > > DocIdSet[] getDocIdSets(IndexReader[] readers); > > And where the readers are guaranteed to be in order of docBase. This is a doozie of a filter :) > By the way, I notice that the order the readers are passed to the > method is essentially undocumented. The test code appears to be > assuming they will be passed in the natural order of the documents > (which is logical) but couldn't a future change parallelise segment > searches for performance reasons, thus reordering the calls? Yeah the order is intentionally not defined, to allow for possible future optimizations like this. It is in-docBase-order today, but won't necessarily always be that way... in fact during 2.9 development there was a brief time when it was in a different order. > It would > be nice if the API would explicitly pass the docBase for the > IndexReader - this would reduce the need to perform maths to determine > the docBase ourselves, and also make it possible to parallelise those > calls later. Maybe we should do that... or maybe make a new class, containing the toplevel reader, sub reader, and docBase? Something like that? I think there may be an issue already open for this... I agree this API is problematic for context-sensitive filters (filters that need to see all segments in the index in order to decide how to compute the current segment's DISI). Most of Lucene's filters are context-free and so this API was created with that use-case in mind. Yours is not the only context-sensitive filter out there... Mike --------------------------------------------------------------------- To unsubscribe, e-mail: java-user-unsubscr...@lucene.apache.org For additional commands, e-mail: java-user-h...@lucene.apache.org