Bill, the simple answer to your original question is that in general you
should apply the same or similar analysis for your query terms as you do
with your indexed data. In your specific case the Query.toString is
generating your unanalyzed terms and then the query parser is performing the
needed analysis. The real point is that you should be doing the term analysis
before invoking "new Term". Alas, term analysis has changed dramatically
over the past couple of years, so the solution to doing analysis before
generating a Term/TermQuery will vary from Lucene release to release.
We really do need a wiki page for Lucene term analysis.
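To make the point concrete without tying it to any one Lucene release, here is a toy sketch in plain Java. Nothing in it is Lucene API: the class ToyAnalysisDemo and its analyze, index and lookup methods are hypothetical stand-ins, and the "strip a trailing s" rule is only a crude placeholder for real stemming such as Snowball. It shows why a raw term misses an analyzed index, and why analyzing the text before building the term fixes it.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.HashSet;
import java.util.List;
import java.util.Locale;
import java.util.Map;
import java.util.Set;

// Toy illustration only: none of this is Lucene API.
final class ToyAnalysisDemo {

    // Hypothetical stand-in for an analyzer: lowercase, split on non-letters,
    // and strip one trailing "s" as a very crude substitute for stemming.
    static List<String> analyze(String text) {
        List<String> terms = new ArrayList<String>();
        for (String raw : text.toLowerCase(Locale.ROOT).split("[^a-z]+")) {
            if (raw.isEmpty()) {
                continue; // split() yields an empty first element for leading separators
            }
            boolean plural = raw.length() > 1 && raw.endsWith("s");
            terms.add(plural ? raw.substring(0, raw.length() - 1) : raw);
        }
        return terms;
    }

    // Index side: every stored term goes through analyze().
    static Map<String, Set<Integer>> index(List<String> docs) {
        Map<String, Set<Integer>> postings = new HashMap<String, Set<Integer>>();
        for (int docId = 0; docId < docs.size(); docId++) {
            for (String term : analyze(docs.get(docId))) {
                Set<Integer> ids = postings.get(term);
                if (ids == null) {
                    ids = new HashSet<Integer>();
                    postings.put(term, ids);
                }
                ids.add(docId);
            }
        }
        return postings;
    }

    // Query side: an exact term lookup that performs no analysis of its own,
    // which is the situation when a term object is built from raw text.
    static Set<Integer> lookup(Map<String, Set<Integer>> postings, String term) {
        Set<Integer> ids = postings.get(term);
        return ids == null ? new HashSet<Integer>() : ids;
    }

    public static void main(String[] args) {
        List<String> docs = new ArrayList<String>();
        docs.add("Lucene Books");
        docs.add("Search engines");
        Map<String, Set<Integer>> postings = index(docs);

        // Raw user text does not match what was indexed.
        System.out.println("raw:      " + lookup(postings, "Books"));
        // Analyzing first yields the term that is actually in the index.
        System.out.println("analyzed: " + lookup(postings, analyze("Books").get(0)));
    }
}
```

The same shape applies with a real analyzer: run the user's text through it, then hand each resulting term to "new Term".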
-- Jack Krupansky
-----Original Message-----
From: Bill Chesky
Sent: Friday, August 03, 2012 9:19 AM
To: simon.willna...@gmail.com ; java-user@lucene.apache.org
Subject: RE: Analyzer on query question
Thanks Simon,
Unfortunately, I'm using Lucene 3.0.1 and CharTermAttribute doesn't seem to
have been introduced until 3.1.0. Similarly my version of Lucene does not
have a BooleanQuery.addClause(BooleanClause) method. Maybe you meant
BooleanQuery.add(BooleanClause).
In any case, most of what you're doing there, I'm just not familiar with.
Seems very low level. I've never had to use TokenStreams to build a query
before and I'm not really sure what is going on there. Also, I don't know
what PositionIncrementAttribute is or how it would be used to create a
PhraseQuery. The way I'm currently creating PhraseQuerys is very
straightforward and intuitive. E.g. to search for the term "foo bar" I'd
build the query like this:
PhraseQuery phraseQuery = new PhraseQuery();
phraseQuery.add(new Term("title", "foo"));
phraseQuery.add(new Term("title", "bar"));
Is there really no easier way to associate the correct analyzer with these
types of queries?
Bill
-----Original Message-----
From: Simon Willnauer [mailto:simon.willna...@gmail.com]
Sent: Friday, August 03, 2012 3:43 AM
To: java-user@lucene.apache.org; Bill Chesky
Subject: Re: Analyzer on query question
On Thu, Aug 2, 2012 at 11:09 PM, Bill Chesky
<bill.che...@learninga-z.com> wrote:
Hi,
I understand that generally speaking you should use the same analyzer on
querying as was used on indexing. In my code I am using the
SnowballAnalyzer on index creation. However, on the query side I am
building up a complex BooleanQuery from other BooleanQuerys and/or
PhraseQuerys on several fields. None of these require specifying an
analyzer anywhere. This is causing some odd results, I think, because a
different analyzer (or no analyzer?) is being used for the query.
Question: how do I build my boolean and phrase queries using the
SnowballAnalyzer?
One thing I did that seemed to kind of work was to build my complex query
normally then build a snowball-analyzed query using a QueryParser
instantiated with a SnowballAnalyzer. To do this, I simply pass the
string value of the complex query to the QueryParser.parse() method to get
the new query. Something like this:
// build a complex query from other BooleanQuerys and PhraseQuerys
BooleanQuery fullQuery = buildComplexQuery();
QueryParser parser = new QueryParser(Version.LUCENE_30, "title", new
SnowballAnalyzer(Version.LUCENE_30, "English"));
Query snowballAnalyzedQuery = parser.parse(fullQuery.toString());
TopScoreDocCollector collector = TopScoreDocCollector.create(10000,
true);
indexSearcher.search(snowballAnalyzedQuery, collector);
you can just use the analyzer directly like this:
Analyzer analyzer = new SnowballAnalyzer(Version.LUCENE_30, "English");
TokenStream stream = analyzer.tokenStream("title", new
StringReader(fullQuery.toString()));
CharTermAttribute termAttr = stream.addAttribute(CharTermAttribute.class);
stream.reset();
BooleanQuery q = new BooleanQuery();
while(stream.incrementToken()) {
q.addClause(new BooleanClause(new TermQuery(new Term("title",
termAttr.toString())), Occur.MUST));
}
you also have access to the token positions if you want to create
phrase queries etc. just add a PositionIncrementAttribute like this:
PositionIncrementAttribute posAttr =
stream.addAttribute(PositionIncrementAttribute.class);
pls. doublecheck the code it's straight from the top of my head.
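For anyone unsure what the position increment buys you, here is a toy sketch in plain Java. It uses no Lucene classes: ToyPositionDemo, Token and the two-word stop list are hypothetical. Each kept token records its gap from the previous kept token, and a running sum of those gaps gives the absolute position of each term. That is the number you would pass along with each term when building a phrase query that has to skip over removed words; Lucene's PhraseQuery has an add(Term, int position) overload for this, but check the javadoc for your release.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashSet;
import java.util.List;
import java.util.Locale;
import java.util.Set;

// Toy illustration only: none of this is Lucene API.
final class ToyPositionDemo {

    // A term plus its gap from the previously emitted term (1 = adjacent).
    static final class Token {
        final String term;
        final int positionIncrement;

        Token(String term, int positionIncrement) {
            this.term = term;
            this.positionIncrement = positionIncrement;
        }
    }

    // Hypothetical stop word list, chosen only for this example.
    static final Set<String> STOP_WORDS =
            new HashSet<String>(Arrays.asList("the", "of"));

    // Lowercase, split on whitespace, drop stop words, and record each gap.
    static List<Token> analyze(String text) {
        List<Token> tokens = new ArrayList<Token>();
        int increment = 1;
        for (String word : text.trim().toLowerCase(Locale.ROOT).split("\\s+")) {
            if (word.isEmpty()) {
                continue;
            }
            if (STOP_WORDS.contains(word)) {
                increment++; // the next kept token skips over this slot
                continue;
            }
            tokens.add(new Token(word, increment));
            increment = 1;
        }
        return tokens;
    }

    // A running sum of the increments gives each term's absolute position,
    // counting the first slot in the text as position 0.
    static List<Integer> positions(List<Token> tokens) {
        List<Integer> result = new ArrayList<Integer>();
        int position = -1;
        for (Token token : tokens) {
            position += token.positionIncrement;
            result.add(position);
        }
        return result;
    }

    public static void main(String[] args) {
        List<Token> tokens = analyze("The Lord of the Rings");
        List<Integer> positions = positions(tokens);
        for (int i = 0; i < tokens.size(); i++) {
            System.out.println(tokens.get(i).term + " @ " + positions.get(i));
        }
    }
}
```

Here "lord" lands at position 1 and "rings" at position 4, so a phrase built from those two terms has to keep that gap of three to match the indexed text.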
simon
Like I said, this seems to kind of work but it doesn't feel right. Does
this make sense? Is there a better way?
thanks in advance,
Bill
---------------------------------------------------------------------
To unsubscribe, e-mail: java-user-unsubscr...@lucene.apache.org
For additional commands, e-mail: java-user-h...@lucene.apache.org