Have any Lucene developers looked at this yet? Any +1s or -1s? Clemens, I may have to ask you for a new set of diffs, sorry: I had just touched a bunch of classes before I realized that this email contains so many diffs.
Otis

--- Clemens Marschner <[EMAIL PROTECTED]> wrote:
> Enclosed you find the diffs I promised for enabling query rewriting.
>
> This also enables tools such as the HTML term highlighter
> (http://www.iq-computing.de/lucene/highlight.jsp). There's one difference to
> the white paper there: I didn't want to make arrays public, so getClauses()
> in BooleanClause only returns an iterator. The same with getTerms() in
> PhraseQuery. I have included my version of LuceneTools.java as presented on
> the website I mentioned.
>
> I've also got an example for query rewriting, but since it uses an external
> library, I've left it out here.
>
> Regards,
>
> Clemens
>
> --------------------------------------
> http://www.cmarschner.net
>
> /*
>
> Lucene-Highlighting - Lucene utilities to highlight terms in texts
> Copyright (C) 2001 Maik Schreiber
>
> This library is free software; you can redistribute it and/or modify it
> under the terms of the GNU Lesser General Public License as published by
> the Free Software Foundation; either version 2.1 of the License, or
> (at your option) any later version.
>
> This library is distributed in the hope that it will be useful, but
> WITHOUT ANY WARRANTY; without even the implied warranty of MERCHANTABILITY
> or FITNESS FOR A PARTICULAR PURPOSE. See the GNU Lesser General Public
> License for more details.
>
> You should have received a copy of the GNU Lesser General Public
> License along with this library; if not, write to the Free Software
> Foundation, Inc., 59 Temple Place, Suite 330, Boston, MA 02111-1307 USA
>
> */
>
> package de.iqcomputing.lucene;
>
> import java.io.*;
> import java.util.*;
> import org.apache.lucene.analysis.*;
> import org.apache.lucene.index.*;
> import org.apache.lucene.search.*;
>
>
> /**
>  * Contains miscellaneous utility methods for use with Lucene.
>  *
>  * @version $Id: LuceneTools.java,v 1.5 2001/10/16 07:25:55 mickey Exp $
>  * @author Maik Schreiber (mailto: [EMAIL PROTECTED])
>  */
> public final class LuceneTools
> {
>   /** LuceneTools must not be instantiated directly. */
>   private LuceneTools() {}
>
>
>   /**
>    * Highlights a text in accordance to a given query.
>    *
>    * @param text        text to highlight terms in
>    * @param highlighter TermHighlighter to use to highlight terms in the text
>    * @param query       Query which contains the terms to be highlighted in the text
>    * @param analyzer    Analyzer used to construct the Query
>    *
>    * @return highlighted text
>    */
>   public static final String highlightTerms(String text, TermHighlighter highlighter, Query query,
>     Analyzer analyzer) throws IOException
>   {
>     StringBuffer newText = new StringBuffer();
>     TokenStream stream = null;
>
>     try
>     {
>       HashSet terms = new HashSet();
>       org.apache.lucene.analysis.Token token;
>       String tokenText;
>       int startOffset;
>       int endOffset;
>       int lastEndOffset = 0;
>
>       // get terms in query
>       getTerms(query, terms, false);
>
>       stream = analyzer.tokenStream(new StringReader(text));
>       while ((token = stream.next()) != null)
>       {
>         startOffset = token.startOffset();
>         endOffset = token.endOffset();
>         tokenText = text.substring(startOffset, endOffset);
>
>         // append text between end of last token (or beginning of text) and start of current token
>         if (startOffset > lastEndOffset)
>           newText.append(text.substring(lastEndOffset, startOffset));
>
>         // does query contain current token?
>         if (terms.contains(token.termText()))
>           newText.append(highlighter.highlightTerm(tokenText));
>         else
>           newText.append(tokenText);
>
>         lastEndOffset = endOffset;
>       }
>
>       // append text after end of last token
>       if (lastEndOffset < text.length())
>         newText.append(text.substring(lastEndOffset));
>
>       return newText.toString();
>     }
>     finally
>     {
>       if (stream != null)
>       {
>         try
>         {
>           stream.close();
>         }
>         catch (Exception e) {}
>       }
>     }
>   }
>
>   /**
>    * Extracts all term texts of a given Query. Term texts will be returned in lower-case.
>    *
>    * @param query      Query to extract term texts from
>    * @param terms      HashSet where extracted term texts should be put into (Elements: String)
>    * @param prohibited <code>true</code> to extract "prohibited" terms, too
>    */
>   public static final void getTerms(Query query, HashSet terms, boolean prohibited)
>     throws IOException
>   {
>     if (query instanceof BooleanQuery)
>       getTermsFromBooleanQuery((BooleanQuery) query, terms, prohibited);
>     else if (query instanceof PhraseQuery)
>       getTermsFromPhraseQuery((PhraseQuery) query, terms);
>     else if (query instanceof TermQuery)
>       getTermsFromTermQuery((TermQuery) query, terms);
>     else if (query instanceof PrefixQuery)
>       getTermsFromPrefixQuery((PrefixQuery) query, terms, prohibited);
>     else if (query instanceof RangeQuery)
>       getTermsFromRangeQuery((RangeQuery) query, terms, prohibited);
>     else if (query instanceof MultiTermQuery)
>       getTermsFromMultiTermQuery((MultiTermQuery) query, terms, prohibited);
>   }
>
>   /**
>    * Extracts all term texts of a given BooleanQuery. Term texts will be returned in lower-case.
>    *
>    * @param query      BooleanQuery to extract term texts from
>    * @param terms      HashSet where extracted term texts should be put into (Elements: String)
>    * @param prohibited <code>true</code> to extract "prohibited" terms, too
>    */
>   private static final void getTermsFromBooleanQuery(BooleanQuery query, HashSet terms,
>     boolean prohibited) throws IOException
>   {
>     Iterator queryClauses = query.getClauses();
>     while (queryClauses.hasNext())
>     {
>       BooleanClause cl = (BooleanClause) queryClauses.next();
>       // skip prohibited clauses unless the caller asked for them
>       if (prohibited || !cl.prohibited)
>         getTerms(cl.query, terms, prohibited);
>     }
>   }
>
>   /**
>    * Extracts all term texts of a given PhraseQuery. Term texts will be returned in lower-case.
>    *
>    * @param query PhraseQuery to extract term texts from
>    * @param terms HashSet where extracted term texts should be put into (Elements: String)
>    */
>   private static final void getTermsFromPhraseQuery(PhraseQuery query, HashSet terms)
>   {
>     Iterator queryTerms = query.getTerms();
>
>     while (queryTerms.hasNext())
>       terms.add(getTermsFromTerm((Term) queryTerms.next()));
>   }
>
>   /**
>    * Extracts all term texts of a given TermQuery. Term texts will be returned in lower-case.
>    *
>    * @param query TermQuery to extract term texts from
>    * @param terms HashSet where extracted term texts should be put into (Elements: String)
>    */
>   private static final void getTermsFromTermQuery(TermQuery query, HashSet terms)
>   {
>     terms.add(getTermsFromTerm(query.getTerm()));
>   }
>
>   /**
>    * Extracts all term texts of a given MultiTermQuery. Term texts will be returned in lower-case.
>    *
>    * @param query      MultiTermQuery to extract term texts from
>    * @param terms      HashSet where extracted term texts should be put into (Elements: String)
>    * @param prohibited <code>true</code> to extract "prohibited" terms, too
>    */
>   private static final void getTermsFromMultiTermQuery(MultiTermQuery query, HashSet terms,
>     boolean prohibited) throws IOException
>   {
>     getTerms(query.getQuery(), terms, prohibited);
>   }
>
>   /**
>    * Extracts all term texts of a given PrefixQuery. Term texts will be returned in lower-case.
>    *
>    * @param query      PrefixQuery to extract term texts from
>    * @param terms      HashSet where extracted term texts should be put into (Elements: String)
>    * @param prohibited <code>true</code> to extract "prohibited" terms, too
>    */
>   private static final void getTermsFromPrefixQuery(PrefixQuery query, HashSet terms,
>     boolean prohibited) throws IOException
>   {
>     getTerms(query.getQuery(), terms, prohibited);
>   }
>
>   /**
>    * Extracts all term texts of a given RangeQuery. Term texts will be returned in lower-case.
>    *
>    * @param query      RangeQuery to extract term texts from
>    * @param terms      HashSet where extracted term texts should be put into (Elements: String)
>    * @param prohibited <code>true</code> to extract "prohibited" terms, too
>    */
>   private static final void getTermsFromRangeQuery(RangeQuery query, HashSet terms,
>     boolean prohibited) throws IOException
>   {
>     getTerms(query.getQuery(), terms, prohibited);
>   }
>
>   /**
>    * Extracts the term of a given Term. The term will be returned in lower-case.
>    *
>    * @param term Term to extract term from
>    *
>    * @return the Term's term text
>    */
>   private static final String getTermsFromTerm(Term term)
>   {
>     return term.text().toLowerCase();
>   }
> }
>
> ATTACHMENT part 3 application/octet-stream name=BooleanClause.diff
> ATTACHMENT part 4 application/octet-stream name=BooleanQuery.diff
> ATTACHMENT part 5 application/octet-stream name=FuzzyQuery.diff
> ATTACHMENT part 6 application/octet-stream name=PhrasePrefixQuery.diff
> ATTACHMENT part 7 application/octet-stream name=PhraseQuery.diff
> ATTACHMENT part 8 application/octet-stream name=PrefixQuery.diff
> ATTACHMENT part 9 application/octet-stream name=Query.diff
> ATTACHMENT part 10 application/octet-stream name=RangeQuery.diff
> ATTACHMENT part 11 application/octet-stream name=TermQuery.diff
> ATTACHMENT part 12 application/octet-stream name=WildcardQuery.diff
> ATTACHMENT part 13 application/octet-stream name=Term.diff
> --
> To unsubscribe, e-mail: <mailto:lucene-dev-unsubscribe@jakarta.apache.org>
> For additional commands, e-mail: <mailto:lucene-dev-help@jakarta.apache.org>
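
For readers who want to try the offset-based highlighting loop from highlightTerms() without a Lucene checkout, here is a minimal sketch of the same idea. It is not the code from the attached diffs: the class HighlightSketch, the regex tokenizer, and the open/close marker parameters are all assumptions made for illustration, standing in for the Analyzer and TermHighlighter used in the quoted source.

```java
import java.util.Arrays;
import java.util.HashSet;
import java.util.Locale;
import java.util.Set;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical stand-alone sketch; none of these names come from Lucene.
final class HighlightSketch {
    // Assumed stand-in for an Analyzer: a token is any run of letters or digits.
    private static final Pattern TOKEN = Pattern.compile("[\\p{L}\\p{N}]+");

    private HighlightSketch() {}

    /** Wraps every token whose lower-cased text is in terms with open/close. */
    static String highlight(String text, Set<String> terms, String open, String close) {
        StringBuilder out = new StringBuilder();
        Matcher m = TOKEN.matcher(text);
        int lastEnd = 0;
        while (m.find()) {
            // copy the untouched text between the previous token and this one
            out.append(text, lastEnd, m.start());
            String tokenText = m.group();
            if (terms.contains(tokenText.toLowerCase(Locale.ROOT))) {
                out.append(open).append(tokenText).append(close);
            } else {
                out.append(tokenText);
            }
            lastEnd = m.end();
        }
        // copy whatever follows the last token
        out.append(text, lastEnd, text.length());
        return out.toString();
    }

    public static void main(String[] args) {
        Set<String> terms = new HashSet<>(Arrays.asList("lucene", "query"));
        System.out.println(highlight("Lucene rewrites the Query.", terms, "<b>", "</b>"));
    }
}
```

As in the quoted code, the original casing and the text between tokens are preserved because the output is built from offsets into the input rather than from the normalized token text.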

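The design choice Clemens mentions, returning an iterator from getClauses() and getTerms() instead of making the arrays public, can be sketched roughly as follows. This is only an illustration under assumptions: ClauseHolderSketch and its String clauses are hypothetical, and the actual diffs may implement the accessor differently.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.Iterator;
import java.util.List;

// Hypothetical holder class; it only illustrates the accessor style.
final class ClauseHolderSketch {
    // internal storage stays private, so callers cannot replace or reorder it
    private final List<String> clauses = new ArrayList<>();

    void add(String clause) {
        clauses.add(clause);
    }

    /** Read-only view: callers can walk the clauses but not modify them. */
    Iterator<String> getClauses() {
        return Collections.unmodifiableList(clauses).iterator();
    }

    public static void main(String[] args) {
        ClauseHolderSketch holder = new ClauseHolderSketch();
        holder.add("+lucene");
        holder.add("-spam");
        for (Iterator<String> it = holder.getClauses(); it.hasNext();) {
            System.out.println(it.next());
        }
    }
}
```

Wrapping the list in an unmodifiable view means that calling remove() on the returned iterator throws UnsupportedOperationException, which is one way to keep the query's internal state from being changed through the accessor.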