Itamar, Have a look here: http://lucene.apache.org/java/2_3_1/scoring.html
Regards, Paul Elschot Op Tuesday 08 April 2008 00:34:48 schreef Itamar Syn-Hershko: > Paul and John, > > Thanks for your quick reply. > > The problem with query rewriting is the beforementioned > MaxClauseException. Instead of inflating the query and passing a > deterministic list of terms to the actual search routine, Lucene > could have accessed the vectors in the index using some sort of > filter. So, for example, if it knows to access "Foobar" by its name > in the index, why can't it take "Foo*" and just get all the vectors > until "Fop" is met (for example). Why does it have to get > deterministic list of terms? > > I will take a look at the Scorer - can you describe in short what > exactly it does and where and when it is being called? > > I don't get John's comment though - Query::rewrite is being called > prior to the actual searching (through QueryParser), how come it can > use "information gathered from IndexReader at search time"? > > Itamar. > > -----Original Message----- > From: Paul Elschot [mailto:[EMAIL PROTECTED] > Sent: Tuesday, April 08, 2008 12:57 AM > To: java-user@lucene.apache.org > Subject: Re: Why Lucene has to rewrite queries prior to actual > searching? > > Itamar, > > Query rewrite replaces wildcards with terms available from the index. > Usually that involves replacing a wildcard with a BooleanQuery that > is an effective OR over the available terms while using a flat > coordination factor, i.e. it does not matter how many of the > available terms actually match a document, as long as at least one > matches. > > For the required query parts (AND like), Scorer.skipTo() is used, and > that could well be the filter mechanism you are referring to; have a > look at the javadocs of Scorer, and, if necessary, at the actual code > of ConjunctionScorer. > > Regards, > Paul Elschot > > Op Monday 07 April 2008 23:13:09 schreef Itamar Syn-Hershko: > > Hi all, > > > > Can someone from the experts here explain why Lucene has to get a > > "rewritten" query for the Searcher - so Phrase or Wildcards queries > > have to rewrite themselves into a "primitive" query, that is then > > passed to Lucene to look for? I'm probably not familiar too much > > with the internals of Lucene, but I'd imagine that if you can > > inflate a query using wildcards via xxxxQuery sub classing, you > > could as easily (?) have some sort of Filter mechanism during the > > search, so that Lucene retrieves the Position vectors for all the > > terms that pass that filter, instead of retrieving only the > > position data for deterministic terms (with no wildcards etc.). If > > that was possible to do somehow, it could greatly increase the > > searchability of Lucene indices by using RegEx (without re-writing > > and getting the dreaded MaxClauseCount error) and similar. > > > > Would love to hear some insights on this one. > > > > Itamar. > > --------------------------------------------------------------------- > To unsubscribe, e-mail: [EMAIL PROTECTED] > For additional commands, e-mail: [EMAIL PROTECTED] > > > > > > --------------------------------------------------------------------- > To unsubscribe, e-mail: [EMAIL PROTECTED] > For additional commands, e-mail: [EMAIL PROTECTED] --------------------------------------------------------------------- To unsubscribe, e-mail: [EMAIL PROTECTED] For additional commands, e-mail: [EMAIL PROTECTED]