Hi Paul,

QueryParser and MTQ's rewrite method have nothing to do with each other. The 
rewrite method is (explained as simple as possible) a class that is responsible 
to "rewrite" a MultiTermQuery to another query type (generally a query that 
allows to add "Term" instances, e.g. BooleanQuery of TermQuery or 
DisjunctionMaxQuery of Terms). The rewrite method takes the "filtered" terms 
enum provided by the query and creates a combined query out of it. Lucene ships 
with some already implemented rewrite methods based on abstract classes that 
handle the most common cases:

- ScoringRewrite handles the case where you want to collect the terms from the 
termsenum and place them as "clauses" in a top level query (e.g. a scoring 
BooleanQuery). You have to implement 2 abstract methods that produce the 
top-level query and create the clauses, that can be added to the top-level 
query. This class is generic to the top-level query, as the clauses can only be 
added to the correct top-level query. To make this work without casting, all 
methods are redefined to take the generics classes. So addClause() takes the 
generic top level query and a term. The rewrite method by itself returns the 
top level query
- TopTermsRewrite is similar, but has a major difference: It has almost same 
API, but the internal implementation of this class is different: It never hits 
the Boolean Max Clause Count, because the collected terms are ordered in a 
priority queue and only the top-ranking terms are added to the resulting 
top-level query. This class is also generified against the top-level query. 
Rewrite returns an instance of the top-level query.
- The very base class MultiTermQuery.RewriteMethod is most flexible but has no 
concrete implementation. It is used to rewrite a MTQ to a query that is not a 
composite top-level one with a number of terms, e.g. a filter that’s handled in 
a totally different stage of rewriting.

You can use the same MTQ rewrite for different MTQ types, e.g. you can rewrite 
a FuzzyQuery to a simple ConstantScore Query or a DisjunctionMaxQuery - but 
only the second one makes sense. On the other hand it makes no sense to rewrite 
Prefix and Wildcard using TopTermsRewrite, as those queries have terms enums 
withouth term boosts (only Fuzzy assigns a boost to every term depending on 
levensthein distance).

Things to note:
A rewrite method in MTQ would never rewrite to another MTQ like PrefixQuery - 
it could do this, but only in the lowest base class (see above)! -> If you rely 
on that, your code has a major problem. In that case the correct behavior would 
be to create a completely "own"oal.search.Query (that not extends MTQ) and 
implement a standard rewrite logic. This query could of course rewrite to MTQ's 
like Fuzzy or Prefix. IndexSearcher rewrites the query until it is completely 
rewritten, so your custom query would create a PrefixQuery which itself 
rewrites to something else.

QueryParser is just a factory for queries, its not related to MTQ. It only has 
an option to set a "default" method for common queries. But as you have a 
custom QueryParser, you can return the queries, configured like you want, to 
the caller.

Uwe

-----
Uwe Schindler
H.-H.-Meier-Allee 63, D-28213 Bremen
http://www.thetaphi.de
eMail: u...@thetaphi.de


> -----Original Message-----
> From: Paul Taylor [mailto:paul_t...@fastmail.fm]
> Sent: Wednesday, February 27, 2013 10:53 AM
> To: java-user@lucene.apache.org
> Cc: Uwe Schindler
> Subject: Re: Uable to extends TopTermsRewrite in Lucene 4.1
> 
> On 26/02/2013 18:01, Paul Taylor wrote:
> > On 26/02/2013 17:22, Uwe Schindler wrote:
> >>> Hi,
> >>>
> >>> You cannot override rewrite() because you could easily break the
> >>> logic behind TopTermsRewrite. If you want another behavior, subclass
> >>> another base class and wrap the TopTermsRewrite instead of
> >>> subclassing it (the generics also enforce that the rewrite needs to
> >>> rewrite() to a class that’s specified in the generics parameter).
> >>>
> >>> addClause() is not final, its abstract. There is one "final" helper
> >>> method used by the rewrite itself, but the methods you need to
> >>> override are abstract.
> >>>
> >>> Also your generics seem to be wrong, leading to the above question...
> >> In addition, you cast the call to super.rewrite() to DisjMaxQuery, so
> >> it is definitely a DisjMaxQuery (because getTopLevelQuery() always
> >> returns one, see generics). You then pass this DisjMaxQuery to this
> >> "getQueryBoostMethod", which checks for instanceof PrefixQuery. This
> >> can never return true, so the boost is always 1. You can therefore
> >> nuke the whole rewrite method (as it changes nothing) and only
> >> implement getToplevelQuery() and addClause().
> >>
> >> Uwe
> Not making much sense of this, Im trying to use the same rewritemethod for
> 
> QueryParser
> 
> and
> 
> FuzzyQuery
> PrefixQuery
> 
> I'm confused as to whether I should be applying at both stages, and what the
> generic parameter should be as the javadoc for QueryParser.
> setMultiTermRewriteMethod() implies you need to change this to use
> different rewrite for fuzzy and prefix queries but you seem to be saying I
> should be using FuzzyQuery as the generic type whihc would prevent this
> wouldn't it ?
> 
> Is there a fuller explanation of rewrite methods anywhere ?
> 
> Full class below if it makes things clearer
> 
> Paul
> 
> package org.musicbrainz.search.servlet;
> 
> import org.apache.lucene.index.IndexReader;
> import org.apache.lucene.index.Term;
> import org.apache.lucene.queryparser.classic.ParseException;
> import org.apache.lucene.queryparser.classic.QueryParser;
> import org.apache.lucene.search.*;
> import org.apache.lucene.search.similarities.DefaultSimilarity;
> import org.apache.lucene.search.similarities.Similarity;
> import org.apache.lucene.search.similarities.TFIDFSimilarity;
> import org.musicbrainz.search.LuceneVersion;
> 
> import java.io.IOException;
> import java.util.HashMap;
> import java.util.Iterator;
> import java.util.Map;
> 
> public class DismaxQueryParser {
> 
>      public static String IMPOSSIBLE_FIELD_NAME = "\uFFFC\uFFFC\uFFFC";
>      protected DisjunctionQueryParser dqp;
> 
>      protected DismaxQueryParser() {
>      }
> 
>      public DismaxQueryParser(org.apache.lucene.analysis.Analyzer
> analyzer) {
>          dqp = new DisjunctionQueryParser(IMPOSSIBLE_FIELD_NAME,
> analyzer);
>          //TODO FIXME
>          //dqp.setMultiTermRewriteMethod(new
> MultiTermUseIdfOfSearchTerm(100));
>      }
> 
>      /**
>       * Create query consists of disjunction queries for each term fields 
> combo,
> and then
>       * a phrase search for each field as long as the original query is more 
> than
> one term
>       *
>       * @param query
>       * @return
>       *
>       */
>      public Query parse(String query) throws
> org.apache.lucene.queryparser.classic.ParseException {
> 
>          Query term = dqp.parse(DismaxQueryParser.IMPOSSIBLE_FIELD_NAME
> + ":(" + query + ")");
>          Query phrase =
> dqp.parse(DismaxQueryParser.IMPOSSIBLE_FIELD_NAME + ":\"" + query +
> "\"");
>          return buildTopQuery(term, phrase);
>      }
> 
>      /**
>       * If a phrase query was built then we create a boolean query that 
> requires
> something to match in
>       * the term query, under normal circumstances if nothing matches the
> term query nothing will match the phrase
>       * query
>       *
>       * @param term
>       * @param phrase
>       * @return
>       */
>      protected Query buildTopQuery(Query term, Query phrase) {
>          if (phrase instanceof DisjunctionMaxQuery) {
>              BooleanQuery bq = new BooleanQuery(true);
>              bq.add(term, BooleanClause.Occur.MUST);
>              bq.add(phrase, BooleanClause.Occur.SHOULD);
>              return bq;
>          } else {
>              return term;
>          }
>      }
> 
> 
>      public void addAlias(String field, DismaxAlias dismaxAlias) {
>          dqp.addAlias(field, dismaxAlias);
>      }
> 
>      static class DisjunctionQueryParser extends QueryParser {
> 
>          //Only make search terms that are this length fuzzy searchable and 
> only
> match to terms that are also this length
>          protected static final int MIN_FIELD_LENGTH_TO_MAKE_FUZZY = 4;
>          protected static final float FUZZY_SIMILARITY = 0.5f;
> 
>          //Reduce boost of wildcard/fuzzy matches compared to exact matches
>          protected static final float WILDCARD_BOOST_REDUCER = 0.8f;
> 
>          //Reduce phrase query scores otherwise there is too much difference
> between a document that matches on
>          //phrase and one that doesn't quite.
>          protected static final float PHRASE_BOOST_REDUCER = 0.2f;
> 
> 
>          public DisjunctionQueryParser(String defaultField,
> org.apache.lucene.analysis.Analyzer analyzer) {
>              super(LuceneVersion.LUCENE_VERSION, defaultField, analyzer);
>          }
> 
>          protected Map<String, DismaxAlias> aliases = new
> HashMap<String, DismaxAlias>(3);
> 
>          //Field to DismaxAlias
>          public void addAlias(String field, DismaxAlias dismaxAlias) {
>              aliases.put(field, dismaxAlias);
>          }
> 
>          // TODO FIXME _ Unable to create rewrite using original idf
>          // Rewrite Method used by Prefix Search and Fuzzy Search, use
> idf of the original term
>          //MultiTermQuery.RewriteMethod
> fuzzyAndPrefixQueryRewriteMethod
>          //        = new MultiTermUseIdfOfSearchTerm(200);
> 
>          protected boolean checkQuery(DisjunctionMaxQuery q, Query
> querySub, boolean quoted, DismaxAlias a, String f) {
>              if (querySub != null) {
>                  //if query was quoted but doesn't generate a phrase
> query we reject it
>                  if ((!quoted) || (querySub instanceof PhraseQuery)) {
>                      //Reduce phrase because will have matched both
> parts giving far too much score differential
>                      if (quoted) {
>                          querySub.setBoost(PHRASE_BOOST_REDUCER);
>                      } else {
> querySub.setBoost(a.getFields().get(f).getBoost());
>                      }
>                      q.add(querySub);
>                      return true;
>                  }
>              }
>              return false;
>          }
> 
>          @Override
>          //TODO FIXME was using a FLOAT similarity value of 0.5 but now
> chnaged to integral
>          protected Query getFuzzyQuery(String field, String termStr,
> float minSimilarity) {
>              Term t = new Term(field, termStr);
>              FuzzyQuery fq = new FuzzyQuery(t,  2,
> MIN_FIELD_LENGTH_TO_MAKE_FUZZY);
>              //TODO FIXME
>              //fq.setRewriteMethod(fuzzyAndPrefixQueryRewriteMethod);
>              return fq;
>          }
> 
> 
>          protected Query getFieldQuery(String field, String queryText,
> boolean quoted)
>                  throws ParseException
>          {
>              //If field is an alias
>              if (aliases.containsKey(field)) {
> 
>                  DismaxAlias a = aliases.get(field);
>                  DisjunctionMaxQuery q = new
> DisjunctionMaxQuery(a.getTie());
>                  boolean ok = false;
> 
>                  for (String f : a.getFields().keySet()) {
> 
>                      //if query can be created for this field and text
>                      Query querySub;
>                      Query queryWildcard = null;
>                      Query queryFuzzy = null;
> 
>                      DismaxAlias.AliasField af = a.getFields().get(f);
>                      if (!quoted && queryText.length() >=
> MIN_FIELD_LENGTH_TO_MAKE_FUZZY) {
>                          querySub = getFieldQuery(f, queryText, quoted);
>                          if (querySub instanceof TermQuery) {
> 
>                              if (af.isFuzzy()) {
>                                  Term t = ((TermQuery) querySub).getTerm();
>                                  queryWildcard = newPrefixQuery(new
> Term(t.field(), t.text()));
>                                  queryFuzzy = getFuzzyQuery(t.field(),
> t.text(), FUZZY_SIMILARITY);
>                                  queryFuzzy.setBoost(af.getBoost() *
> WILDCARD_BOOST_REDUCER);
>                                  q.add(queryFuzzy);
>                                  queryWildcard.setBoost(af.getBoost() *
> WILDCARD_BOOST_REDUCER);
>                                  q.add(queryWildcard);
>                              }
>                          }
>                      } else {
>                          querySub = getFieldQuery(f, queryText, quoted);
>                      }
> 
>                      if (checkQuery(q, querySub, quoted, a, f) && ok ==
> false) {
>                          ok = true;
>                      }
>                  }
>                  //Something has been added to disjunction query
>                  return ok ? q : null;
> 
>              } else {
>                  //usual Field
>                  try {
>                      return super.getFieldQuery(field, queryText, quoted);
>                  } catch (Exception e) {
>                      return null;
>                  }
>              }
>          }
> 
>          /**
>           * Builds a new PrefixQuery instance
>           * @param prefix Prefix term
>           * @return new PrefixQuery instance
>           */
>          protected Query newPrefixQuery(Term prefix){
>              PrefixQuery query = new PrefixQuery(prefix);
>              //TODO FIXME
> //query.setRewriteMethod(fuzzyAndPrefixQueryRewriteMethod);
>              return query;
>          }
>      }
> 
>      /*
>      TODO FIXME WAS Overriding methods that are now final
>      public static class MultiTermUseIdfOfSearchTerm<Q extends
> DisjunctionMaxQuery> extends TopTermsRewrite<Query> {
> 
>      //public static final class MultiTermUseIdfOfSearchTerm extends
> TopTermsRewrite<BooleanQuery> {
>          private final TFIDFSimilarity similarity;
> 
>          public MultiTermUseIdfOfSearchTerm(int size) {
>              super(size);
>              this.similarity = new DefaultSimilarity();
> 
>          }
> 
>          @Override
>          protected int getMaxSize() {
>              return BooleanQuery.getMaxClauseCount();
>          }
> 
>          @Override
>          protected DisjunctionMaxQuery getTopLevelQuery() {
>              return new DisjunctionMaxQuery(0.1f);
>          }
> 
>          @Override
>          protected void addClause(Query topLevel, Term term, float boost) {
>              final Query tq = new ConstantScoreQuery(new TermQuery(term));
>              tq.setBoost(boost);
>              ((DisjunctionMaxQuery)topLevel).add(tq);
>          }
> 
>          protected float getQueryBoost(final IndexReader reader, final
> MultiTermQuery query)
>                  throws IOException {
>              float idf = 1f;
>              float df;
>              if (query instanceof PrefixQuery)
>              {
>                  PrefixQuery fq = (PrefixQuery) query;
>                  df = reader.docFreq(fq.getPrefix());
>                  if(df>=1)
>                  {
>                      //Same as idf value for search term, 0.5 acts as
> length norm
>                      idf = (float)Math.pow(similarity.idf((int) df,
> reader.numDocs()),2) * 0.5f;
>                  }
>              }
>              return idf;
>          }
> 
>          @Override
>          public Query rewrite(final IndexReader reader, final
> MultiTermQuery query) throws IOException {
>              DisjunctionMaxQuery  bq =
> (DisjunctionMaxQuery)super.rewrite(reader, query);
> 
>              float idfBoost = getQueryBoost(reader, query);
>              Iterator<Query> iterator = bq.iterator();
>              while(iterator.hasNext())
>              {
>                  Query next = iterator.next();
>                  next.setBoost(next.getBoost() * idfBoost);
>              }
>              return bq;
>          }
> 
>      }
>      */
> }


---------------------------------------------------------------------
To unsubscribe, e-mail: java-user-unsubscr...@lucene.apache.org
For additional commands, e-mail: java-user-h...@lucene.apache.org

Reply via email to