RE: Analyzer on query question

Bill Chesky Fri, 03 Aug 2012 09:54:00 -0700

Jack,

Thanks.  Yeah, I don't know what you mean by term analysis.  I googled it but 
didn't come up with much.  So if that is the preferred way of doing this, a 
wiki document would be greatly appreciated.

I notice you did say I should be doing the term analysis first.  But is it 
wrong to do it the way I described in my original email?  Will it give me 
incorrect results?

Bill

-----Original Message-----
From: Jack Krupansky [mailto:[email protected]] 
Sent: Friday, August 03, 2012 9:33 AM
To: [email protected]
Subject: Re: Analyzer on query question

Bill, the simple answer to your original question is that in general you 
should apply the same or similar analysis for your query terms as you do 
with your indexed data. In your specific case the Query.toString is 
generating your unanalyzed terms and then the query parser is performing the 
needed analysis. The real point is that you should be doing the term analysis 
before invoking "new Term". Alas, term analysis has changed dramatically 
over the past couple of years, so the solution to doing analysis before 
generating a Term/TermQuery will vary from Lucene release to release.
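
To illustrate why the analysis has to happen before the term is built, here 
is a minimal sketch in plain Java. It deliberately does not use Lucene: 
analyze() and stem() are hypothetical toy stand-ins for a real analyzer such 
as SnowballAnalyzer, and a plain Set of strings stands in for the index. It 
only shows that a raw, unanalyzed term can miss text that was analyzed at 
index time, while the same term run through the same analysis matches.

```java
import java.util.ArrayList;
import java.util.HashSet;
import java.util.List;
import java.util.Locale;
import java.util.Set;

// Toy illustration only: none of these names are Lucene API.
class TermAnalysisSketch {

    // Hypothetical stand-in for an analyzer: lowercase, split on
    // non-alphanumeric characters, then apply a crude plural stemmer.
    static List<String> analyze(String text) {
        List<String> tokens = new ArrayList<>();
        for (String raw : text.toLowerCase(Locale.ROOT).split("[^a-z0-9]+")) {
            if (raw.isEmpty()) {
                continue; // split() can yield a leading empty string
            }
            tokens.add(stem(raw));
        }
        return tokens;
    }

    // Crude stemmer (assumption, far simpler than Snowball):
    // drop a single trailing "s" from words longer than three letters.
    static String stem(String token) {
        if (token.length() > 3 && token.endsWith("s") && !token.endsWith("ss")) {
            return token.substring(0, token.length() - 1);
        }
        return token;
    }

    public static void main(String[] args) {
        // Index time: the field text goes through the analyzer.
        Set<String> indexedTerms = new HashSet<>(analyze("Foo Bars"));

        // Query time, unanalyzed: comparable to new Term("title", "Bars").
        boolean rawHit = indexedTerms.contains("Bars");

        // Query time, analyzed first: the term text matches what was indexed.
        boolean analyzedHit = indexedTerms.containsAll(analyze("Bars"));

        System.out.println("raw term hit: " + rawHit);           // false
        System.out.println("analyzed term hit: " + analyzedHit); // true
    }
}
```

In real code the analyzed token text, rather than the user's raw text, would 
be what gets passed to "new Term"; how to obtain those tokens depends on the 
Lucene release, as noted above.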

We really do need a wiki page for Lucene term analysis.

-- Jack Krupansky

-----Original Message----- 
From: Bill Chesky
Sent: Friday, August 03, 2012 9:19 AM
To: [email protected] ; [email protected]
Subject: RE: Analyzer on query question

Thanks Simon,

Unfortunately, I'm using Lucene 3.0.1 and CharTermAttribute doesn't seem to 
have been introduced until 3.1.0.  Similarly my version of Lucene does not 
have a BooleanQuery.addClause(BooleanClause) method.  Maybe you meant 
BooleanQuery.add(BooleanClause).

In any case, most of what you're doing there, I'm just not familiar with. 
Seems very low level.  I've never had to use TokenStreams to build a query 
before and I'm not really sure what is going on there.  Also, I don't know 
what PositionIncrementAttribute is or how it would be used to create a 
PhraseQuery.   The way I'm currently creating PhraseQuerys is very 
straightforward and intuitive.  E.g. to search for the term "foo bar" I'd 
build the query like this:

PhraseQuery phraseQuery = new PhraseQuery();
phraseQuery.add(new Term("title", "foo"));
phraseQuery.add(new Term("title", "bar"));

Is there really no easier way to associate the correct analyzer with these 
types of queries?

Bill

-----Original Message-----
From: Simon Willnauer [mailto:[email protected]]
Sent: Friday, August 03, 2012 3:43 AM
To: [email protected]; Bill Chesky
Subject: Re: Analyzer on query question

On Thu, Aug 2, 2012 at 11:09 PM, Bill Chesky
<[email protected]> wrote:
> Hi,
>
> I understand that generally speaking you should use the same analyzer on 
> querying as was used on indexing.  In my code I am using the 
> SnowballAnalyzer on index creation.  However, on the query side I am 
> building up a complex BooleanQuery from other BooleanQuerys and/or 
> PhraseQuerys on several fields.  None of these require specifying an 
> analyzer anywhere.  This is causing some odd results, I think, because a 
> different analyzer (or no analyzer?) is being used for the query.
>
> Question: how do I build my boolean and phrase queries using the 
> SnowballAnalyzer?
>
> One thing I did that seemed to kind of work was to build my complex query 
> normally then build a snowball-analyzed query using a QueryParser 
> instantiated with a SnowballAnalyzer.  To do this, I simply pass the 
> string value of the complex query to the QueryParser.parse() method to get 
> the new query.  Something like this:
>
>     // build a complex query from other BooleanQuerys and PhraseQuerys
>     BooleanQuery fullQuery = buildComplexQuery();
>     QueryParser parser = new QueryParser(Version.LUCENE_30, "title", new 
> SnowballAnalyzer(Version.LUCENE_30, "English"));
>     Query snowballAnalyzedQuery = parser.parse(fullQuery.toString());
>
>     TopScoreDocCollector collector = TopScoreDocCollector.create(10000, 
> true);
>     indexSearcher.search(snowballAnalyzedQuery, collector);

you can just use the analyzer directly like this:
Analyzer analyzer = new SnowballAnalyzer(Version.LUCENE_30, "English");

TokenStream stream = analyzer.tokenStream("title", new
StringReader(fullQuery.toString()));
CharTermAttribute termAttr = stream.addAttribute(CharTermAttribute.class);
stream.reset();
BooleanQuery q = new BooleanQuery();
while(stream.incrementToken()) {
  q.addClause(new BooleanClause(new TermQuery(new Term("title",
termAttr.toString())), Occur.MUST));
}

you also have access to the token positions if you want to create
phrase queries etc. just add a PositionIncrementAttribute like this:
PositionIncrementAttribute posAttr =
stream.addAttribute(PositionIncrementAttribute.class);

pls. doublecheck the code it's straight from the top of my head.

simon

>
> Like I said, this seems to kind of work but it doesn't feel right.  Does 
> this make sense?  Is there a better way?
>
> thanks in advance,
>
> Bill

---------------------------------------------------------------------
To unsubscribe, e-mail: [email protected]
For additional commands, e-mail: [email protected]
