Re: Uable to extends TopTermsRewrite in Lucene 4.1

Paul Taylor Thu, 04 Apr 2013 03:47:25 -0700

On 04/04/2013 10:59, Paul Taylor wrote:

On 27/02/2013 10:28, Uwe Schindler wrote:
Hi Paul,
QueryParser and MTQ's rewrite method have nothing to do with each other. The rewrite method is (explained as simply as possible) a class that is responsible for "rewriting" a MultiTermQuery to another query type (generally a query that allows you to add "Term" instances, e.g. a BooleanQuery of TermQuery clauses or a DisjunctionMaxQuery of terms). The rewrite method takes the "filtered" terms enum provided by the query and creates a combined query out of it. Lucene ships with some already implemented rewrite methods, based on abstract classes that handle the most common cases:
- ScoringRewrite handles the case where you want to collect the terms from the terms enum and place them as "clauses" in a top-level query (e.g. a scoring BooleanQuery). You have to implement two abstract methods that produce the top-level query and create the clauses that can be added to it. This class is generic in the top-level query type, as the clauses can only be added to the correct top-level query. To make this work without casting, all methods are redefined to take the generic types, so addClause() takes the generic top-level query and a term. The rewrite method itself returns the top-level query.
- TopTermsRewrite is similar and has almost the same API, but with one major difference in its internal implementation: it never hits the Boolean max clause count, because the collected terms are ordered in a priority queue and only the top-ranking terms are added to the resulting top-level query. This class is also generified against the top-level query. Rewrite returns an instance of the top-level query.
- The very base class, MultiTermQuery.RewriteMethod, is the most flexible but has no concrete implementation. It is used to rewrite an MTQ to a query that is not a composite top-level one with a number of terms, e.g. a filter that's handled in a totally different stage of rewriting.
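To illustrate the priority-queue idea behind TopTermsRewrite, here is a minimal plain-Java sketch. It deliberately does not use Lucene: ScoredTerm and topTerms are hypothetical names made up for this example, and Lucene's own implementation differs in detail. It only shows why the number of clauses stays bounded however many terms the terms enum produces.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Comparator;
import java.util.List;
import java.util.PriorityQueue;

public class TopTermsSketch {

    /** Hypothetical stand-in for a term plus the boost its terms enum assigned. */
    public static final class ScoredTerm {
        public final String text;
        public final float boost;

        public ScoredTerm(String text, float boost) {
            this.text = text;
            this.boost = boost;
        }
    }

    /** Keeps only the maxSize highest-boosted terms and returns them best first. */
    public static List<ScoredTerm> topTerms(List<ScoredTerm> candidates, int maxSize) {
        Comparator<ScoredTerm> byBoost =
                Comparator.comparingDouble((ScoredTerm t) -> t.boost);
        // Min-heap on boost: the weakest kept term sits at the head and is evicted first.
        PriorityQueue<ScoredTerm> queue = new PriorityQueue<>(byBoost);
        for (ScoredTerm t : candidates) {
            queue.offer(t);
            if (queue.size() > maxSize) {
                queue.poll(); // drop the lowest-boosted term, so the queue never exceeds maxSize
            }
        }
        List<ScoredTerm> result = new ArrayList<>(queue);
        result.sort(byBoost.reversed());
        return result;
    }

    public static void main(String[] args) {
        List<ScoredTerm> candidates = Arrays.asList(
                new ScoredTerm("lucine", 0.5f),
                new ScoredTerm("lucene", 1.0f),
                new ScoredTerm("lucerne", 0.6f),
                new ScoredTerm("lucent", 0.8f));
        for (ScoredTerm t : topTerms(candidates, 2)) {
            System.out.println(t.text + " boost=" + t.boost);
        }
    }
}
```

Only the surviving terms would then be turned into clauses of the top-level query, which is why the max clause count is never exceeded.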
You can use the same MTQ rewrite for different MTQ types, e.g. you can rewrite a FuzzyQuery to a simple ConstantScore query or a DisjunctionMaxQuery - but only the second one makes sense. On the other hand it makes no sense to rewrite Prefix and Wildcard using TopTermsRewrite, as those queries have terms enums without term boosts (only Fuzzy assigns a boost to every term, depending on Levenshtein distance).
Things to note:
A rewrite method in MTQ would never rewrite to another MTQ like PrefixQuery - it could do this, but only in the lowest base class (see above)! -> If you rely on that, your code has a major problem. In that case the correct behavior would be to create a completely "own" oal.search.Query (one that does not extend MTQ) and implement a standard rewrite logic. This query could of course rewrite to MTQs like Fuzzy or Prefix. IndexSearcher rewrites the query until it is completely rewritten, so your custom query would create a PrefixQuery which itself rewrites to something else.
QueryParser is just a factory for queries; it's not related to MTQ. It only has an option to set a "default" method for common queries. But as you have a custom QueryParser, you can return the queries, configured like you want, to the caller.
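The "rewrites the query until it is completely rewritten" behaviour from the first note can be sketched in plain Java as below. This is only an illustration under stated assumptions: SketchQuery, CustomQuery, PrefixLike and TermLeaf are hypothetical stand-ins, not Lucene classes, and the real rewrite takes an IndexReader that is left out here.

```java
public class RewriteLoopSketch {

    /** Hypothetical stand-in for a query; not Lucene's Query class. */
    public interface SketchQuery {
        /** Returns a simpler query, or this when nothing is left to rewrite. */
        SketchQuery rewrite();
    }

    /** Primitive query: already fully rewritten, so it returns itself. */
    public static final class TermLeaf implements SketchQuery {
        public final String term;

        public TermLeaf(String term) {
            this.term = term;
        }

        @Override
        public SketchQuery rewrite() {
            return this;
        }
    }

    /** Stand-in for a multi-term query: one rewrite step yields a primitive query. */
    public static final class PrefixLike implements SketchQuery {
        public final String prefix;

        public PrefixLike(String prefix) {
            this.prefix = prefix;
        }

        @Override
        public SketchQuery rewrite() {
            return new TermLeaf(prefix);
        }
    }

    /** Custom query that is not a multi-term query itself but rewrites to one. */
    public static final class CustomQuery implements SketchQuery {
        public final String text;

        public CustomQuery(String text) {
            this.text = text;
        }

        @Override
        public SketchQuery rewrite() {
            return new PrefixLike(text);
        }
    }

    /** Calls rewrite() repeatedly until a query returns itself. */
    public static SketchQuery rewriteFully(SketchQuery original) {
        SketchQuery query = original;
        for (SketchQuery rewritten = query.rewrite();
             rewritten != query;
             rewritten = query.rewrite()) {
            query = rewritten;
        }
        return query;
    }

    public static void main(String[] args) {
        SketchQuery result = rewriteFully(new CustomQuery("luc"));
        System.out.println(result.getClass().getSimpleName());
    }
}
```

Here the custom query takes two steps to settle (CustomQuery, then PrefixLike, then TermLeaf), which mirrors a custom query that produces a PrefixQuery which in turn rewrites to something else.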
Uwe
Hi Uwe
Okay, think I have it now. Now have a working rewrite method for FuzzyQueries
    public static class FuzzyTermRewrite<Q extends DisjunctionMaxQuery> extends TopTermsRewrite<Query> {
        public FuzzyTermRewrite(int size) {
            super(size);
        }

        @Override
        protected int getMaxSize() {
            return BooleanQuery.getMaxClauseCount();
        }

        @Override
        protected DisjunctionMaxQuery getTopLevelQuery() {
            return new DisjunctionMaxQuery(0.1f);
        }

        @Override
        protected void addClause(Query topLevel, Term term, int docCount, float boost, TermContext states) {
            final Query tq = new ConstantScoreQuery(new TermQuery(term, states));
            tq.setBoost(boost);
            ((DisjunctionMaxQuery)topLevel).add(tq);
        }
    }
and now writing a separate class for Prefix Queries so it does actually modify the idf
Paul


and this is my prefix rewrite method:

    /**
     * Prefix matches are rewritten to a DisjunctionMaxQuery instead of the more
     * usual BooleanQuery so that if a search term matches multiple fields we just
     * take the best field rather than summing all matches like a boolean query.
     * The 0.1 for tiebreaker is to favour documents that contain all words rather
     * than the same word in multiple fields.
     *
     * We set the idf the same as an exact match so that a wildcard match to a
     * term which happens to be rarer than the exact term we were searching for
     * does not get an unfairly high idf.
     */

    public static class PrefixTermRewrite extends MultiTermQuery.RewriteMethod {


        private TFIDFSimilarity     similarity;
        private FuzzyTermRewrite    rewrite;

        public PrefixTermRewrite(int size) {
            this.rewrite    = new FuzzyTermRewrite(size);
            this.similarity = new DefaultSimilarity();
        }

        protected float getQueryBoost(final IndexReader reader, final MultiTermQuery query)
                throws IOException {
            float idf = 1f;
            float df;
            PrefixQuery fq = (PrefixQuery) query;
            df = reader.docFreq(fq.getPrefix());
            if(df>=1)
            {

                // Same as idf value for search term, 0.5 acts as length norm
                idf = (float) Math.pow(similarity.idf((int) df, reader.numDocs()), 2) * 0.5f;

            }
            return idf;
        }


        @Override

        public Query rewrite(final IndexReader reader, final MultiTermQuery query) throws IOException {
            DisjunctionMaxQuery dmq = (DisjunctionMaxQuery) rewrite.rewrite(reader, query);

            float idfBoost = getQueryBoost(reader, query);
            Iterator<Query> iterator = dmq.iterator();
            while(iterator.hasNext())
            {
                Query next = iterator.next();
                next.setBoost(next.getBoost() * idfBoost);
            }
            return dmq;
        }
    }
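For anyone wanting to check the numbers, the arithmetic behind the two choices above (the idf-based boost and the 0.1 tiebreaker) can be reproduced without Lucene. The sketch below is plain Java with hypothetical names (PrefixBoostSketch, queryBoost, disMaxScore). It assumes the classic idf formula 1 + ln(numDocs / (docFreq + 1)), which is what DefaultSimilarity uses in Lucene 4.x as far as I know, and a dismax score of the best clause plus tiebreaker times the remaining clauses.

```java
public class PrefixBoostSketch {

    /** Assumed classic idf: 1 + ln(numDocs / (docFreq + 1)). */
    public static float idf(long docFreq, long numDocs) {
        return (float) (Math.log(numDocs / (double) (docFreq + 1)) + 1.0);
    }

    /**
     * Boost applied to every expanded clause: idf squared times 0.5,
     * computed from the docFreq of the prefix as if it were an exact term.
     * Falls back to 1 when the prefix itself is not an indexed term.
     */
    public static float queryBoost(long docFreq, long numDocs) {
        if (docFreq < 1) {
            return 1f;
        }
        float idf = idf(docFreq, numDocs);
        return idf * idf * 0.5f;
    }

    /** Dismax-style score: best clause plus tieBreaker times the other clauses. */
    public static float disMaxScore(float[] clauseScores, float tieBreaker) {
        float max = 0f;
        float sum = 0f;
        for (float score : clauseScores) {
            sum += score;
            max = Math.max(max, score);
        }
        return max + tieBreaker * (sum - max);
    }

    public static void main(String[] args) {
        // A prefix that occurs as a term in 9 of 10 documents.
        System.out.println("boost = " + queryBoost(9, 10));
        // Two matching fields scored 2.0 and 1.0, tiebreaker 0.1 as in the code above.
        System.out.println("dismax = " + disMaxScore(new float[] {2f, 1f}, 0.1f));
    }
}
```

With a tiebreaker of 0 the dismax score is just the best field, and with 1 it becomes the plain sum that a BooleanQuery would give, so 0.1 sits close to "best field only".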


