On Thursday 26 January 2006 20:08, Chris Hostetter wrote:
>
> The subject of revamping the Filter API to support more compact filter
> representations has come up in the past ... At least one patch comes to
> mind that helps with the issue...
>
>    https://issues.apache.org/jira/browse/LUCENE-328
>
> ...i'm not intimately familiar with that code, but if i recall correctly
> from the last time i read it, it doesn't propose any actual API changes,
> just some utilities to reduce memory usage.
>
> Reading your post has me thinking about this whole issue again,
> particularly the subject of Filters that are straightforward enough that
> they could be implemented as simple iterators with very little state, and
> what API changes could be made to support the interface you describe and
> still be backwards compatible.
>
> One thing that comes to mind (that i don't remember suggesting before, but
> perhaps someone else has suggested it before) is that since Filter is an
> abstract class which people are currently required to subclass, we could
> follow a migration path something like this...
>
>   1) add a SearchFilter interface like the one you describe to the core
>      code base
>   2) add the following method declaration to the Filter class...
>        public SearchFilter getSearchFilter(IndexReader) throws IOException
>      ...implement this method by calling bits, and returning an instance
>      of a thin inner class that wraps the BitSet

This is done in the FilteredQuery referred to in the reference above.
The wrapper might take a small performance hit.

>   3) indicate that Filter.bits() is deprecated.
>   4) change all existing calls to Filter.bits() in the core lucene code
>      base to call Filter.getSearchFilter and do whatever iterating is
>      necessary.
>   5) gradually reimplement all of the concrete instances of Filter in
>      the core lucene code base so they override the getSearchFilter method
>      with something that returns a more "iterator" style SearchFilter,
>      and implement their bits() method to use the SearchFilter to build up
>      the bit set if clients call it directly.
>   6) wait a suitable amount of time.
>   7) remove Filter.bits() and all of the concrete implementations from the
>      lucene core.

Sounds feasible to me, provided the performance hit is small enough.

> ...i think that would be a fairly straightforward and practical way to
> execute such a change. The big question in my mind is what the
> "SearchFilter" interface should look like. what you propose is along the
> usage lines of "iterate over your ScoreDocs, and for each one test it
> against the filter" ... but i'm not convinced that it wouldn't make more
> sense to say "ask the filter what the next viable doc is, now score it",
> a la...
>
>   public interface SearchFilter {
>     /** returns doc ids that pass the filter, in increasing order.
>      * returns 0 once there are no more docs.
>      */
>     int getNextFilteredDoc();
>   }
>
>
> thoughts?

For search speed one needs to know the next filtered document,
much like BitSet.nextSetBit().
See DocNrSkipper in the issue referred to above.

Regards,
Paul Elschot
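
A rough sketch of the thin BitSet wrapper from step 2 and the bits() fallback from step 5, assuming nothing beyond java.util.BitSet. The SearchFilter interface follows the shape proposed in the quoted text; the class names BitSetSearchFilter and SearchFilterSketch are made up for illustration, and this is not the code from LUCENE-328. One deliberate deviation: the end marker here is -1, which is what BitSet.nextSetBit() returns, rather than the proposed 0, because 0 is itself a valid doc id.

```java
import java.util.BitSet;

/** Hypothetical iterator-style filter, shaped like the proposal in this thread. */
interface SearchFilter {
    /**
     * Returns doc ids that pass the filter, in increasing order.
     * Assumption: returns -1 (not 0) once there are no more docs.
     */
    int getNextFilteredDoc();
}

/** Thin wrapper over an existing BitSet (step 2 of the migration path). */
final class BitSetSearchFilter implements SearchFilter {
    private final BitSet bits;
    private int from = 0; // where the next search starts; -1 once exhausted

    BitSetSearchFilter(BitSet bits) {
        this.bits = bits;
    }

    @Override
    public int getNextFilteredDoc() {
        if (from < 0) {
            return -1;
        }
        int doc = bits.nextSetBit(from);
        // Stop on exhaustion, and avoid overflowing past Integer.MAX_VALUE.
        from = (doc < 0 || doc == Integer.MAX_VALUE) ? -1 : doc + 1;
        return doc;
    }
}

final class SearchFilterSketch {
    /** Rebuilds a BitSet by draining a SearchFilter (the bits() fallback in step 5). */
    static BitSet toBitSet(SearchFilter filter) {
        BitSet result = new BitSet();
        for (int doc = filter.getNextFilteredDoc(); doc >= 0; doc = filter.getNextFilteredDoc()) {
            result.set(doc);
        }
        return result;
    }

    public static void main(String[] args) {
        BitSet bits = new BitSet();
        bits.set(2);
        bits.set(5);
        bits.set(9);
        // Round trip: BitSet -> SearchFilter -> BitSet.
        System.out.println(toBitSet(new BitSetSearchFilter(bits)));
    }
}
```

The wrapper costs one nextSetBit() call per matching doc, which is where the small performance hit mentioned above would come from. A filter that can compute its matches directly could implement SearchFilter without ever allocating a BitSet.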