On Friday 10 August 2007 19:21, mark harwood wrote:
> >>I'll change the CachingWrapperFilter to use a BitSetFilter,
> >>which data structure would you like to have cached for filtering
> >>in the xml query parser?
>
> I think Filter/BitSetFilter might be the wrong choice of object to cache
> in there. It is the *data that Filters create* which is most important
> to cache (currently limited to a BitSet). This data object is what I
> chose to call a DocIdSet and implementations could be a BitSet,
> OpenBitSet or SortedVint structure in your new scheme.

I'll start with a BitSet initially, unless somebody beats me to it.

> When using/reusing a DocIdSet (either as part of filtering or
> perhaps counting how search results have fallen into various
> categories) I proposed that client code should call
> DocIdSet.getIterator() to get hold of an iterator for their
> own one-time-only use in iterating across that set.

Right, the only thing left is then how to get a Matcher from this iterator.

> The Filters (a service I chose to call more generically
> "DocIdSetFactory") also need to be cached but only to be used
> as keys for equivalence checks (i.e. on a cached Filter only the
> methods hashCode/equals are called). The rationale for this is
> that the cached Filter is the object best placed to answer the question
> "is this new incoming Filter request the same as one I have already
> processed?". If the cached Filter matches the incoming request (i.e.
> the criteria are the same) then you can look for a cached DocIdSet held
> in a WeakHashMap keyed on IndexReader. If you have a cache hit then
> you call cachedDocIdSet.getIterator(), otherwise take the cost of calling
> newFilter.getDocIdSet(reader) and cache that result.
>
> This is effectively how the remote FilterManager and the XMLQueryParser
> filter caching stuff work with Filters/Bitsets today.
>
> Hope this makes sense.

It does make sense. I suppose the Filter criterion is a Lucene Query?
Anyway, I'll let it sink in over my holidays :)

Regards,
Paul Elschot

> ----- Original Message ----
> From: Paul Elschot <[EMAIL PROTECTED]>
> To: java-dev@lucene.apache.org
> Sent: Friday, 10 August, 2007 5:31:02 PM
> Subject: Re: Fwd: Decouple Filter from BitSet: API change and xml query parser
>
> On Friday 10 August 2007 13:12, mark harwood wrote:
> > >>Could someone give me a clue as to why the test case
> > >>TestRemoteCachingWrapperFilter fails with the patch applied?
> >
> > Regardless of the reasons for this particular test failure, this code is
> > not safe in other ways which the test cases don't test for.
> >
> > To restate the issue: Matcher is not designed to be threadsafe
>
> A Matcher is almost a Scorer, the only difference is that it does not
> have a score() method. Scorers are not threadsafe, they are used
> once during a query search. The intention is to use Matchers
> in the same way: once during a query search in case no score value
> is needed.
>
> > and CachingWrapperFilter (or any other example of existing
> > caching strategies) cannot therefore simply be changed to
> > cache Matchers in place of the existing scheme of caching bitsets
> > (which are currently used in a thread-safe manner by all Lucene code).
> > Bitsets don't offer the notion of a cursor (required for "next"
> > methods) while Matcher does which spoils its potential for
> > reuse/shared use.
>
> The idea is not to cache the Matchers, but the underlying data structure.
>
> > The remoting test code you refer to uses your modified
> > CachingWrapperFilter which has swapped Matchers for BitSets
> > and so I would anticipate thread safety issues if the tests actually
> > tried to share/reuse the same Matcher.
>
> Thanks for taking a look at the code.
> I'll change the CachingWrapperFilter to use a BitSetFilter,
> and then hopefully more test cases will pass.
>
> > >>Finally, are DocIdSet and DocIdSetIterator currently part of Lucene?
> > >>I don't know how to go about these.
> >
> > These are two of the names I gave to a notional set of 3 services that I
> > outlined here:
> >
> > https://issues.apache.org/jira/browse/LUCENE-584#action_12518642
> >
> > I introduced this terminology to the discussion because:
> > 1) It describes 2 services that are currently combined in Matcher
> > that I feel need to be separated
>
> The idea of Matcher is that it is a Scorer without a score() method,
> and no more.
>
> > 2) It uses a more generic description of the services offered that can be
> > useful when considering other applications of the services (e.g. category
> > count and filtering logic both can use cached sets of doc IDs. DocIdSet
> > seemed to describe the service more generically than "Matcher")
> >
> > I'm happy to drop use of these terms from this discussion if you
> > feel they are not useful.
>
> I think that DocIdSet has the role of the underlying data structure that
> would be cached, and that DocIdSetIterator is something very close
> to Matcher or even the same thing.
>
> Which brings me to another question: which data structure would
> you like to have cached for filtering in the xml query parser?
> I think initially BitSet would do nicely, but one could also take
> the opportunity to use more compact data structures when possible.
>
> Finally one of the example classes I gave is incomplete, see below.
> I wrote:
>
> > ...
> > As for the API change, how to move from the current:
> >
> > public abstract class Filter {
> >   public abstract BitSet bits(IndexReader reader);
> > }
> >
> > to:
> >
> > public abstract class Filter {
> >   public abstract Matcher getMatcher(IndexReader reader);
> > }
> >
> > The patch proposes to do this by moving all current use of Filter to
> > BitSetFilter:
> >
> > public abstract class BitSetFilter extends Filter {
> >   public abstract BitSet bits(IndexReader reader);
>
>   // BitSetFilter also has:
>
>   public Matcher getMatcher(IndexReader reader) {
>     return DefaultMatcher.defaultMatcher(bits(reader));
>   }
>
> > }
>
> Regards,
> Paul Elschot
>
> > Would it be good to have an intermediate version of Filter like this
> > one:
> >
> > public abstract class Filter {
> >   /** @deprecated use class BitSetFilter instead */
> >   public BitSet bits(IndexReader reader) { return null; }
> >   public abstract Matcher getMatcher(IndexReader reader);
> > }
> >
> > ...
> >
> > Regards,
> > Paul Elschot
> >
> > ---------- Forwarded Message ----------
> >
> > Subject: [jira] Commented: (LUCENE-584) Decouple Filter from BitSet
> > Date: Friday 10 August 2007 01:15
> > From: "Mark Harwood (JIRA)" <[EMAIL PROTECTED]>
> > To: java-dev@lucene.apache.org
> >
> > [ https://issues.apache.org/jira/browse/LUCENE-584?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#action_12518868 ]
> >
> > Mark Harwood commented on LUCENE-584:
> > -------------------------------------
> >
> > OK, I appreciate caching may not be a top priority in this proposal but I
> > have live systems in production using XMLQueryParser and which use the
> > existing core facilities for caching. As it stands this proposal breaks
> > this functionality (see "FIXME" in contrib's CachedFilterBuilder and my
> > concerns over use of unthreadsafe Matcher in the core class
> > CachingWrapperFilter)
> >
> > I am obviously concerned by this and keen to help shape a solution which
> > preserves the existing capabilities while adding your new functionality.
> > I'm not sure I share your view that support for caching can be treated as
> > a separate issue to be dealt with at a later date. There are a larger
> > number of changes proposed in this patch and if the design does not at
> > least consider future caching issues now, I suspect much will have to be
> > reworked later. The change I can envisage most clearly is expressed in my
> > concern that the DocIdSet and DocIdSetIterator services I outlined are
> > being combined in Matcher as it stands now and these functions will have
> > to be separated.
> >
> > Cheers
> > Mark
> >
> > ...
>
> ---------------------------------------------------------------------
> To unsubscribe, e-mail: [EMAIL PROTECTED]
> For additional commands, e-mail: [EMAIL PROTECTED]
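
The caching scheme Mark describes in this thread (cached Filters used only
as equals/hashCode keys, with the DocIdSet they produce held in a
WeakHashMap keyed on the reader) could be sketched in plain Java roughly as
below. This is only an illustration under stated assumptions, not the code
in the LUCENE-584 patch: FakeIndexReader, ModuloFilter, BitSetDocIdSet and
DocIdSetCache are hypothetical names invented for the example, the three
interfaces are guesses at the shape of the services named in the
discussion, and returning -1 from next() at the end is likewise an
assumption.

```java
import java.util.BitSet;
import java.util.HashMap;
import java.util.Map;
import java.util.WeakHashMap;

/** Hypothetical stand-in for IndexReader; only used as a cache key here. */
final class FakeIndexReader {
    final int maxDoc;
    FakeIndexReader(int maxDoc) { this.maxDoc = maxDoc; }
}

/** One-time-only cursor over doc ids; holds state, so not shareable. */
interface DocIdSetIterator {
    /** Returns the next doc id, or -1 when exhausted (assumed convention). */
    int next();
}

/** The cacheable data: a set of doc ids without any cursor state. */
interface DocIdSet {
    DocIdSetIterator getIterator();
}

/** The Filter role: builds a DocIdSet, and is compared via equals/hashCode. */
interface DocIdSetFactory {
    DocIdSet getDocIdSet(FakeIndexReader reader);
}

/** DocIdSet backed by java.util.BitSet; the bits are copied and never mutated. */
final class BitSetDocIdSet implements DocIdSet {
    private final BitSet bits;
    BitSetDocIdSet(BitSet bits) { this.bits = (BitSet) bits.clone(); }

    @Override public DocIdSetIterator getIterator() {
        return new DocIdSetIterator() {
            private int upcoming = bits.nextSetBit(0); // cursor lives in the iterator
            @Override public int next() {
                int current = upcoming;
                if (current >= 0) upcoming = bits.nextSetBit(current + 1);
                return current;
            }
        };
    }
}

/** Example filter: matches every doc id divisible by the given divisor. */
final class ModuloFilter implements DocIdSetFactory {
    static int computations = 0; // counts how often a set is actually built
    private final int divisor;
    ModuloFilter(int divisor) {
        if (divisor <= 0) throw new IllegalArgumentException("divisor must be > 0");
        this.divisor = divisor;
    }
    @Override public DocIdSet getDocIdSet(FakeIndexReader reader) {
        computations++;
        BitSet bits = new BitSet(reader.maxDoc);
        for (int doc = 0; doc < reader.maxDoc; doc += divisor) bits.set(doc);
        return new BitSetDocIdSet(bits);
    }
    @Override public boolean equals(Object o) {
        return o instanceof ModuloFilter && ((ModuloFilter) o).divisor == divisor;
    }
    @Override public int hashCode() { return Integer.hashCode(divisor); }
}

/** Cache keyed first on the filter criteria, then weakly on the reader. */
final class DocIdSetCache {
    private final Map<DocIdSetFactory, Map<FakeIndexReader, DocIdSet>> cache = new HashMap<>();

    synchronized DocIdSet get(DocIdSetFactory filter, FakeIndexReader reader) {
        Map<FakeIndexReader, DocIdSet> perReader =
            cache.computeIfAbsent(filter, f -> new WeakHashMap<>());
        // Cache miss: pay for getDocIdSet(reader) once and remember the result.
        return perReader.computeIfAbsent(reader, filter::getDocIdSet);
    }
}

class DocIdSetCacheDemo {
    public static void main(String[] args) {
        FakeIndexReader reader = new FakeIndexReader(10);
        DocIdSetCache cache = new DocIdSetCache();
        cache.get(new ModuloFilter(3), reader);                    // computes the set
        DocIdSet cached = cache.get(new ModuloFilter(3), reader);  // equal criteria: cache hit
        DocIdSetIterator it = cached.getIterator();
        for (int doc = it.next(); doc >= 0; doc = it.next()) {
            System.out.println(doc);
        }
    }
}
```

In this sketch every call to getIterator() hands out a fresh cursor, so the
cached DocIdSet can be reused across searches while each search keeps its
own iteration state, which is the separation of DocIdSet from
DocIdSetIterator argued for above.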