On 04/04/2013 10:59, Paul Taylor wrote:
On 27/02/2013 10:28, Uwe Schindler wrote:
Hi Paul,

QueryParser and MTQ's rewrite method have nothing to do with each other. The rewrite method is (explained as simply as possible) a class that is responsible for rewriting a MultiTermQuery to another query type (generally a query that allows adding "Term" instances, e.g. BooleanQuery of TermQuery or DisjunctionMaxQuery of Terms). The rewrite method takes the "filtered" terms enum provided by the query and creates a combined query out of it. Lucene ships with some already implemented rewrite methods based on abstract classes that handle the most common cases:

- ScoringRewrite handles the case where you want to collect the terms from the termsenum and place them as "clauses" in a top level query (e.g. a scoring BooleanQuery). You have to implement 2 abstract methods that produce the top-level query and create the clauses, that can be added to the top-level query. This class is generic to the top-level query, as the clauses can only be added to the correct top-level query. To make this work without casting, all methods are redefined to take the generics classes. So addClause() takes the generic top level query and a term. The rewrite method by itself returns the top level query.
- TopTermsRewrite is similar, but has a major difference: It has almost same API, but the internal implementation of this class is different: It never hits the Boolean Max Clause Count, because the collected terms are ordered in a priority queue and only the top-ranking terms are added to the resulting top-level query. This class is also generified against the top-level query. Rewrite returns an instance of the top-level query.
- The very base class MultiTermQuery.RewriteMethod is most flexible but has no concrete implementation. It is used to rewrite a MTQ to a query that is not a composite top-level one with a number of terms, e.g. a filter that’s handled in a totally different stage of rewriting.
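To illustrate the priority-queue idea behind TopTermsRewrite, here is a minimal sketch in plain Java. It does not use Lucene at all: TopTermsSketch, ScoredTerm and topTerms are hypothetical names invented for this example, and the real class works on a TermsEnum rather than a list. It only shows how a bounded queue keeps the top-ranking terms so the number of clauses never exceeds a chosen limit.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;
import java.util.PriorityQueue;

/** Conceptual stand-in only; none of these are Lucene classes. */
final class TopTermsSketch {

    /** Hypothetical pair: a term's text and the boost its terms enum assigned to it. */
    static final class ScoredTerm {
        final String text;
        final float boost;

        ScoredTerm(String text, float boost) {
            this.text = text;
            this.boost = boost;
        }
    }

    /** Returns the texts of at most maxSize terms, highest boost first. */
    static List<String> topTerms(List<ScoredTerm> candidates, int maxSize) {
        // Min-heap on boost: the head is always the weakest term kept so far.
        PriorityQueue<ScoredTerm> queue =
                new PriorityQueue<>(Comparator.comparingDouble((ScoredTerm t) -> t.boost));
        for (ScoredTerm t : candidates) {
            queue.offer(t);
            if (queue.size() > maxSize) {
                queue.poll(); // evict the current lowest-boost term
            }
        }
        // Order the survivors by descending boost before turning them into "clauses".
        List<ScoredTerm> kept = new ArrayList<>(queue);
        kept.sort(Comparator.comparingDouble((ScoredTerm t) -> t.boost).reversed());
        List<String> result = new ArrayList<>();
        for (ScoredTerm t : kept) {
            result.add(t.text);
        }
        return result;
    }
}
```

In this sketch the queue size plays the role of the size passed to the TopTermsRewrite constructor: however many terms the enum produces, only that many survive.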

You can use the same MTQ rewrite for different MTQ types, e.g. you can rewrite a FuzzyQuery to a simple ConstantScoreQuery or a DisjunctionMaxQuery - but only the second one makes sense. On the other hand it makes no sense to rewrite Prefix and Wildcard using TopTermsRewrite, as those queries have terms enums without term boosts (only Fuzzy assigns a boost to every term depending on Levenshtein distance).

Things to note:
A rewrite method in MTQ would never rewrite to another MTQ like PrefixQuery - it could do this, but only in the lowest base class (see above)! -> If you rely on that, your code has a major problem. In that case the correct behavior would be to create a completely "own" oal.search.Query (one that does not extend MTQ) and implement a standard rewrite logic. This query could of course rewrite to MTQs like Fuzzy or Prefix. IndexSearcher rewrites the query until it is completely rewritten, so your custom query would create a PrefixQuery which itself rewrites to something else.
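The "rewrites until it is completely rewritten" behaviour can be sketched as a loop that stops once a query returns itself. This is a self-contained illustration, not Lucene code: SketchQuery, CustomQuery, PrefixLike and TermLeaf are hypothetical stand-ins, and the real Query.rewrite takes an IndexReader.

```java
/** Conceptual stand-in only; SketchQuery is hypothetical, not Lucene's Query. */
final class RewriteLoopSketch {

    interface SketchQuery {
        /** Returns a simpler query, or this when nothing is left to rewrite. */
        SketchQuery rewrite();
    }

    /** Fully primitive query: rewriting it returns the same instance. */
    static final class TermLeaf implements SketchQuery {
        final String term;

        TermLeaf(String term) {
            this.term = term;
        }

        @Override
        public SketchQuery rewrite() {
            return this;
        }
    }

    /** Plays the role of a multi-term query that expands to primitive queries. */
    static final class PrefixLike implements SketchQuery {
        final String prefix;

        PrefixLike(String prefix) {
            this.prefix = prefix;
        }

        @Override
        public SketchQuery rewrite() {
            return new TermLeaf(prefix);
        }
    }

    /** Plays the role of a custom query that does not extend the multi-term base class. */
    static final class CustomQuery implements SketchQuery {
        final String text;

        CustomQuery(String text) {
            this.text = text;
        }

        @Override
        public SketchQuery rewrite() {
            return new PrefixLike(text);
        }
    }

    /** Keeps rewriting until a query returns itself, i.e. a fixed point is reached. */
    static SketchQuery rewriteFully(SketchQuery original) {
        SketchQuery query = original;
        for (SketchQuery rewritten = query.rewrite(); rewritten != query; rewritten = query.rewrite()) {
            query = rewritten;
        }
        return query;
    }
}
```

Here rewriteFully(new CustomQuery("luc")) passes through a PrefixLike on its way to a TermLeaf, which mirrors the chain described above: custom query, then PrefixQuery, then whatever the prefix rewrite produces.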

QueryParser is just a factory for queries, it's not related to MTQ. It only has an option to set a "default" method for common queries. But as you have a custom QueryParser, you can return the queries, configured like you want, to the caller.

Uwe

Hi Uwe

Okay, think I have it now. Now have a working rewrite method for Fuzzy Queries

public static class FuzzyTermRewrite<Q extends DisjunctionMaxQuery> extends TopTermsRewrite<Query> {

        public FuzzyTermRewrite(int size) {
            super(size);
        }

        @Override
        protected int getMaxSize() {
            return BooleanQuery.getMaxClauseCount();
        }

        @Override
        protected DisjunctionMaxQuery getTopLevelQuery() {
            return new DisjunctionMaxQuery(0.1f);
        }

        @Override
        protected void addClause(Query topLevel, Term term, int docCount, float boost, TermContext states) {
            final Query tq = new ConstantScoreQuery(new TermQuery(term, states));
            tq.setBoost(boost);
            ((DisjunctionMaxQuery)topLevel).add(tq);
        }
    }

and now writing a separate class for Prefix Queries so it does actually modify the idf

Paul


and this is my prefix rewrite method:

/**
     *
     * Prefix matches are rewritten to a DisjunctionMaxQuery instead of the more usual BooleanQuery so that
     * if search term matches multiple fields we just take the best field rather than summing all matches like a boolean
     * query. The 0.1 for tiebreaker is to favour documents that contain all words rather than the same word in multiple
     * fields.
     *
     * We set the idf the same as an exact match so that a wildcard match to a term which happens to be rarer than
     * the exact term we were searching for does not get an unfairly high idf.
     *
     */
public static class PrefixTermRewrite extends MultiTermQuery.RewriteMethod {

        private TFIDFSimilarity     similarity;
        private FuzzyTermRewrite    rewrite;

        public PrefixTermRewrite(int size) {
            this.rewrite    = new FuzzyTermRewrite(size);
            this.similarity = new DefaultSimilarity();
        }

        protected float getQueryBoost(final IndexReader reader, final MultiTermQuery query)
                throws IOException {
            float idf = 1f;
            float df;
            PrefixQuery fq = (PrefixQuery) query;
            df = reader.docFreq(fq.getPrefix());
            if(df>=1)
            {
                //Same as idf value for search term, 0.5 acts as length norm
                idf = (float)Math.pow(similarity.idf((int) df, reader.numDocs()),2) * 0.5f;
            }
            return idf;
        }


        @Override
        public Query rewrite(final IndexReader reader, final MultiTermQuery query) throws IOException {
            DisjunctionMaxQuery dmq = (DisjunctionMaxQuery)rewrite.rewrite(reader, query);
            float idfBoost = getQueryBoost(reader, query);
            Iterator<Query> iterator = dmq.iterator();
            while(iterator.hasNext())
            {
                Query next = iterator.next();
                next.setBoost(next.getBoost() * idfBoost);
            }
            return dmq;
        }
    }
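For readers who want to see what the boost in getQueryBoost works out to, here is a small standalone sketch of the arithmetic. It assumes that DefaultSimilarity.idf in Lucene 4.x computes 1 + ln(numDocs / (docFreq + 1)); PrefixBoostSketch and its method names are hypothetical and exist only for this illustration.

```java
/** Standalone arithmetic sketch; not Lucene code. */
final class PrefixBoostSketch {

    /** Assumed to match DefaultSimilarity.idf: 1 + ln(numDocs / (docFreq + 1)). */
    static float idf(long docFreq, long numDocs) {
        return (float) (Math.log(numDocs / (double) (docFreq + 1)) + 1.0);
    }

    /**
     * Mirrors getQueryBoost above: idf squared times 0.5 when the prefix itself
     * occurs as a term in the index, otherwise the neutral boost 1.
     */
    static float queryBoost(long docFreq, long numDocs) {
        if (docFreq < 1) {
            return 1f;
        }
        float idf = idf(docFreq, numDocs);
        return idf * idf * 0.5f;
    }
}
```

Because the boost is derived from the document frequency of the prefix as typed, every expanded term gets the same factor, so a rare expansion is not scored higher than an exact match would be.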

