Do you mean tracking the "atomic queries" that caused a given hit to match (where an "atomic query" is a query that actually uses TermDocs/Positions to check matching, vs. other queries like BooleanQuery that "glom together" sub-query matches)?
E.g., for a boolean query with N clauses, which of those N clauses matched?

This has been discussed/requested several times on java-user, and I think it makes a lot of sense.

A natural place to do this is the Scorer API, i.e. extend it with a "getMatchingAtomicQueries" or some such. Probably, for efficiency, each Query should be pre-assigned an int position, and then the matching is represented as a bit array, reused across matches. Your collector could then ask the scorer for these bits if it wanted. There should be no performance cost for collectors that don't use this functionality.

We've also discussed (under LUCENE-1522) similar extensions to the Scorer API to get the exact positions contributing to a match, and possibly using such an API to merge Span{Term,And,Or}Query into their "normal" counterparts.

But we should do this separately from LUCENE-1575 -- the java ghosts there are already challenging enough!

Mike

On Mon, Apr 6, 2009 at 11:57 PM, Shai Erera <ser...@gmail.com> wrote:
> Hi Karl,
>
> LUCENE-1575 refactors HitCollector by separating the score from document
> collection. So if we were to introduce this type of method (that you
> suggest), it would be through a setQueries(Collection<Query>) method.
>
> Maybe you could try to verify that your use case makes sense, is efficient
> etc., before we do this change. Adding a setQueries to Collector (the new
> name of HC) shouldn't be a problem since we can always add an empty-impl
> method, not affecting back-compat. However, I wonder where it would be
> called from, and whether it makes sense to create that Collection object
> and pass it around while knowing that most collectors will not use it?
>
> Is it something that you perhaps can implement by extending Collector (and
> some other classes), and in your extending code call setQueries? Today,
> as far as I remember, only Scorer calls collect() and I'm not sure if Scorer
> has the information about the matching queries. Even if it does, extending it
> and calling setQueries from the extension seems more reasonable than adding
> such a call to every query execution, which also means instantiating a new
> Collection<Query> for every search (unless we provide an API on
> IndexSearcher which allows you to pass such an object).
>
> What do you think?
>
> On Tue, Apr 7, 2009 at 3:21 AM, Karl Wettin <karl.wet...@gmail.com> wrote:
>>
>> How crazy would it be to refactor HitCollector so it also accepts the
>> matching queries?
>>
>> Let's ignore my use case (not sure it makes sense yet, it's related to
>> finding a threshold between probably interesting and definitely not
>> interesting results of huge OR-statements, but I really have to try it out
>> before I can say if it's any good) and just focus on the speed impact. If I
>> cleared and reused the Collection passed down to the HitCollector then it
>> shouldn't really slow things down, right? And if I reused the collections in
>> my TopDocsCollector as low scoring results were pushed down then it shouldn't
>> have to be expensive there either. Or?
>>
>>
>> karl
>>
>> ---------------------------------------------------------------------
>> To unsubscribe, e-mail: java-dev-unsubscr...@lucene.apache.org
>> For additional commands, e-mail: java-dev-h...@lucene.apache.org
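
To make the bit-array idea from the top of this message a little more concrete, here is a rough, self-contained Java sketch. None of this is Lucene code: MatchBitsSketch, OrMatcher and matchingClauses() are hypothetical names, and IntPredicate merely stands in for the scorer of an atomic query. It only illustrates giving each clause an int position and filling a reused BitSet on request, so that a collector which never asks for the bits pays nothing extra.

```java
import java.util.BitSet;
import java.util.function.IntPredicate;

// Hypothetical sketch only: none of these types are Lucene classes.
final class MatchBitsSketch {

    // Stand-in for a disjunction (OR) scorer over N "atomic" clauses.
    // Each clause's array index plays the role of its pre-assigned int position.
    static final class OrMatcher {
        private final IntPredicate[] clauses;
        private final BitSet matched;   // one bit per clause, reused across hits
        private int currentDoc = -1;

        OrMatcher(IntPredicate... clauses) {
            this.clauses = clauses;
            this.matched = new BitSet(clauses.length);
        }

        // Cheap path: stops at the first matching clause and records no bits.
        boolean matches(int doc) {
            currentDoc = doc;
            for (IntPredicate clause : clauses) {
                if (clause.test(doc)) {
                    return true;
                }
            }
            return false;
        }

        // Hypothetical analogue of "getMatchingAtomicQueries": the bits are
        // computed only when asked for, and the returned BitSet is
        // overwritten by the next call.
        BitSet matchingClauses() {
            matched.clear();
            for (int i = 0; i < clauses.length; i++) {
                if (clauses[i].test(currentDoc)) {
                    matched.set(i);
                }
            }
            return matched;
        }
    }

    public static void main(String[] args) {
        OrMatcher or = new OrMatcher(
                doc -> doc % 2 == 0,   // clause 0
                doc -> doc % 3 == 0,   // clause 1
                doc -> doc == 5);      // clause 2
        for (int doc = 1; doc <= 6; doc++) {
            if (or.matches(doc)) {
                // A collector that cares asks for the bits; others skip this call.
                System.out.println("doc " + doc + " matched clauses " + or.matchingClauses());
            }
        }
    }
}
```

Because the BitSet is reused, a collector that wants to keep the bits for a hit (say, in a priority queue) would have to copy them. The lazy re-check in matchingClauses() is just one way to keep the common path free; a real scorer would more likely read the state its sub-scorers already hold.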