Re: Diffs for enabling query rewriting

Clemens Marschner Sun, 10 Nov 2002 15:46:23 -0800

I see what I can do.

--C.


----- Original Message -----
From: "Otis Gospodnetic" <[EMAIL PROTECTED]>
To: "Lucene Developers List" <[EMAIL PROTECTED]>
Cc: "Aaron Galea" <[EMAIL PROTECTED]>
Sent: Sunday, November 10, 2002 7:33 PM
Subject: Re: Diffs for enabling query rewriting


> Hm, developers are not responding to this 3 week old email. :(
> Clemens, could you also provide some unit tests with this?
>
> Thanks,
> Otis
>
>
> --- Clemens Marschner <[EMAIL PROTECTED]> wrote:
> > Enclosed you find the diffs I promised for enabling query rewriting.
> >
> > This also enables tools such as the HTML term highlighter
> > (http://www.iq-computing.de/lucene/highlight.jsp). There's one
> > difference to
> > the white paper there: I didn't want to make arrays public, so
> > getClauses()
> > in BooleanClause only returns an iterator. The same with getTerms()
> > in
> > PhraseQuery. I have included my version of LuceneTools.java as
> > presented on
> > the website I mentioned.
> >
> > I've also got an example for query rewriting, but since it uses an
> > external
> > library, I've left it out here.
> >
> > Regards,
> >
> > Clemens
> >
> >
> >
> >
> > --------------------------------------
> > http://www.cmarschner.net
> > > /*
> >
> > Lucene-Highlighting - Lucene utilities to highlight terms in texts
> > Copyright (C) 2001 Maik Schreiber
> >
> > This library is free software; you can redistribute it and/or modify
> > it
> > under the terms of the GNU Lesser General Public License as published
> > by
> > the Free Software Foundation; either version 2.1 of the License, or
> > (at your option) any later version.
> >
> > This library is distributed in the hope that it will be useful, but
> > WITHOUT ANY WARRANTY; without even the implied warranty of
> > MERCHANTABILITY
> > or FITNESS FOR A PARTICULAR PURPOSE.  See the GNU Lesser General
> > Public
> > License for more details.
> >
> > You should have received a copy of the GNU Lesser General Public
> > License along with this library; if not, write to the Free Software
> > Foundation, Inc., 59 Temple Place, Suite 330, Boston, MA 02111-1307
> > USA
> >
> > */
> >
> > package de.iqcomputing.lucene;
> >
> > import java.io.*;
> > import java.util.*;
> > import org.apache.lucene.analysis.*;
> > import org.apache.lucene.index.*;
> > import org.apache.lucene.search.*;
> >
> >
> > /**
> >  * Contains miscellaneous utility methods for use with Lucene.
> >  *
> >  * @version $Id: LuceneTools.java,v 1.5 2001/10/16 07:25:55 mickey
> > Exp $
> >  * @author Maik Schreiber (mailto: [EMAIL PROTECTED])
> >  */
> > public final class LuceneTools
> > {
> >   /** LuceneTools must not be instantiated directly. */
> >   private LuceneTools() {}
> >
> >
> >   /**
> >    * Highlights a text in accordance to a given query.
> >    *
> >    * @param text        text to highlight terms in
> >    * @param highlighter TermHighlighter to use to highlight terms in
> > the text
> >    * @param query       Query which contains the terms to be
> > highlighted in the text
> >    * @param analyzer    Analyzer used to construct the Query
> >    *
> >    * @return highlighted text
> >    */
> >   public static final String highlightTerms(String text,
> > TermHighlighter highlighter, Query query,
> >     Analyzer analyzer) throws IOException
> >   {
> >     StringBuffer newText = new StringBuffer();
> >     TokenStream stream = null;
> >
> >     try
> >     {
> >       HashSet terms = new HashSet();
> >       org.apache.lucene.analysis.Token token;
> >       String tokenText;
> >       int startOffset;
> >       int endOffset;
> >       int lastEndOffset = 0;
> >
> >       // get terms in query
> >       getTerms(query, terms, false);
> >
> >       stream = analyzer.tokenStream(new StringReader(text));
> >       while ((token = stream.next()) != null)
> >       {
> >         startOffset = token.startOffset();
> >         endOffset = token.endOffset();
> >         tokenText = text.substring(startOffset, endOffset);
> >
> >         // append text between end of last token (or beginning of
> > text) and start of current token
> >         if (startOffset > lastEndOffset)
> >           newText.append(text.substring(lastEndOffset, startOffset));
> >
> >         // does query contain current token?
> >         if (terms.contains(token.termText()))
> >           newText.append(highlighter.highlightTerm(tokenText));
> >         else
> >           newText.append(tokenText);
> >
> >         lastEndOffset = endOffset;
> >       }
> >
> >       // append text after end of last token
> >       if (lastEndOffset < text.length())
> >         newText.append(text.substring(lastEndOffset));
> >
> >       return newText.toString();
> >     }
> >     finally
> >     {
> >       if (stream != null)
> >       {
> >         try
> >         {
> >           stream.close();
> >         }
> >         catch (Exception e) {}
> >       }
> >     }
> >   }
> >
> >   /**
> >    * Extracts all term texts of a given Query. Term texts will be
> > returned in lower-case.
> >    *
> >    * @param query      Query to extract term texts from
> >    * @param terms      HashSet where extracted term texts should be
> > put into (Elements: String)
> >    * @param prohibited <code>true</code> to extract "prohibited"
> > terms, too
> >    */
> >   public static final void getTerms(Query query, HashSet terms,
> > boolean prohibited)
> >     throws IOException
> >   {
> >     if (query instanceof BooleanQuery)
> >       getTermsFromBooleanQuery((BooleanQuery) query, terms,
> > prohibited);
> >     else if (query instanceof PhraseQuery)
> >       getTermsFromPhraseQuery((PhraseQuery) query, terms);
> >     else if (query instanceof TermQuery)
> >       getTermsFromTermQuery((TermQuery) query, terms);
> >     else if (query instanceof PrefixQuery)
> >       getTermsFromPrefixQuery((PrefixQuery) query, terms,
> > prohibited);
> >     else if (query instanceof RangeQuery)
> >       getTermsFromRangeQuery((RangeQuery) query, terms, prohibited);
> >     else if (query instanceof MultiTermQuery)
> >       getTermsFromMultiTermQuery((MultiTermQuery) query, terms,
> > prohibited);
> >   }
> >
> >   /**
> >    * Extracts all term texts of a given BooleanQuery. Term texts will
> > be returned in lower-case.
> >    *
> >    * @param query      BooleanQuery to extract term texts from
> >    * @param terms      HashSet where extracted term texts should be
> > put into (Elements: String)
> >    * @param prohibited <code>true</code> to extract "prohibited"
> > terms, too
> >    */
> >   private static final void getTermsFromBooleanQuery(BooleanQuery
> > query, HashSet terms,
> >     boolean prohibited) throws IOException
> >   {
> >     Iterator queryClauses = query.getClauses();
> >     while(queryClauses.hasNext())
> >     {
> >         BooleanClause cl = (BooleanClause)queryClauses.next();
> >       if (prohibited || cl.prohibited)
> >         getTerms(cl.query, terms, prohibited);
> >     }
> >   }
> >
> >   /**
> >    * Extracts all term texts of a given PhraseQuery. Term texts will
> > be returned in lower-case.
> >    *
> >    * @param query PhraseQuery to extract term texts from
> >    * @param terms HashSet where extracted term texts should be put
> > into (Elements: String)
> >    */
> >   private static final void getTermsFromPhraseQuery(PhraseQuery
> > query, HashSet terms)
> >   {
> >     Iterator queryTerms = query.getTerms();
> >     int i;
> >
> >     while(queryTerms.hasNext())
> >       terms.add(getTermsFromTerm((Term)queryTerms.next()));
> >   }
> >
> >   /**
> >    * Extracts all term texts of a given TermQuery. Term texts will be
> > returned in lower-case.
> >    *
> >    * @param query TermQuery to extract term texts from
> >    * @param terms HashSet where extracted term texts should be put
> > into (Elements: String)
> >    */
> >   private static final void getTermsFromTermQuery(TermQuery query,
> > HashSet terms)
> >   {
> >     terms.add(getTermsFromTerm(query.getTerm()));
> >   }
> >
> >   /**
> >    * Extracts all term texts of a given MultiTermQuery. Term texts
> > will be returned in lower-case.
> >    *
> >    * @param query      MultiTermQuery to extract term texts from
> >    * @param terms      HashSet where extracted term texts should be
> > put into (Elements: String)
> >    * @param prohibited <code>true</code> to extract "prohibited"
> > terms, too
> >    */
> >   private static final void getTermsFromMultiTermQuery(MultiTermQuery
> > query, HashSet terms,
> >     boolean prohibited) throws IOException
> >   {
> >     getTerms(query.getQuery(), terms, prohibited);
> >   }
> >
> >   /**
> >    * Extracts all term texts of a given PrefixQuery. Term texts will
> > be returned in lower-case.
> >    *
> >    * @param query      PrefixQuery to extract term texts from
> >    * @param terms      HashSet where extracted term texts should be
> > put into (Elements: String)
> >    * @param prohibited <code>true</code> to extract "prohibited"
> > terms, too
> >    */
> >   private static final void getTermsFromPrefixQuery(PrefixQuery
> > query, HashSet terms,
> >     boolean prohibited) throws IOException
> >   {
> >     getTerms(query.getQuery(), terms, prohibited);
> >   }
> >
> >   /**
> >    * Extracts all term texts of a given RangeQuery. Term texts will
> > be returned in lower-case.
> >    *
> >    * @param query      RangeQuery to extract term texts from
> >    * @param terms      HashSet where extracted term texts should be
> > put into (Elements: String)
> >    * @param prohibited <code>true</code> to extract "prohibited"
> > terms, too
> >    */
> >   private static final void getTermsFromRangeQuery(RangeQuery query,
> > HashSet terms,
> >     boolean prohibited) throws IOException
> >   {
> >     getTerms(query.getQuery(), terms, prohibited);
> >   }
> >
> >   /**
> >    * Extracts the term of a given Term. The term will be returned in
> > lower-case.
> >    *
> >    * @param term Term to extract term from
> >    *
> >    * @return the Term's term text
> >    */
> >   private static final String getTermsFromTerm(Term term)
> >   {
> >     return term.text().toLowerCase();
> >   }
> > }
> >
>
> > ATTACHMENT part 3 application/octet-stream name=BooleanClause.diff
>
>
> > ATTACHMENT part 4 application/octet-stream name=BooleanQuery.diff
>
>
> > ATTACHMENT part 5 application/octet-stream name=FuzzyQuery.diff
>
>
> > ATTACHMENT part 6 application/octet-stream
> name=PhrasePrefixQuery.diff
>
>
> > ATTACHMENT part 7 application/octet-stream name=PhraseQuery.diff
>
>
> > ATTACHMENT part 8 application/octet-stream name=PrefixQuery.diff
>
>
> > ATTACHMENT part 9 application/octet-stream name=Query.diff
>
>
> > ATTACHMENT part 10 application/octet-stream name=RangeQuery.diff
>
>
> > ATTACHMENT part 11 application/octet-stream name=TermQuery.diff
>
>
> > ATTACHMENT part 12 application/octet-stream name=WildcardQuery.diff
>
>
> > ATTACHMENT part 13 application/octet-stream name=Term.diff
> > --
> > To unsubscribe, e-mail:
> > <mailto:lucene-dev-unsubscribe@;jakarta.apache.org>
> > For additional commands, e-mail:
> <mailto:lucene-dev-help@;jakarta.apache.org>
>
>
> __________________________________________________
> Do you Yahoo!?
> U2 on LAUNCH - Exclusive greatest hits videos
> http://launch.yahoo.com/u2
>
> --
> To unsubscribe, e-mail:
<mailto:lucene-dev-unsubscribe@;jakarta.apache.org>
> For additional commands, e-mail:
<mailto:lucene-dev-help@;jakarta.apache.org>
>


--
To unsubscribe, e-mail:   <mailto:lucene-dev-unsubscribe@;jakarta.apache.org>
For additional commands, e-mail: <mailto:lucene-dev-help@;jakarta.apache.org>

Re: Diffs for enabling query rewriting

Reply via email to