You must call reset() before consuming any TokenStream.

On Fri, Aug 3, 2012 at 4:03 PM, Jack Krupansky <j...@basetechnology.com> wrote:
> Simon gave sample code for analyzing a multi-term string.
>
> Here's some pseudo-code (hasn't been compiled to check it) to analyze a
> single term with Lucene 3.6:
>
> public Term analyzeTerm(Analyzer analyzer, String termString){
>   TokenStream stream = analyzer.tokenStream(field, new StringReader(termString));
>   if (stream.incrementToken())
>     return new Term(stream.getAttribute(CharTermAttribute.class).toString());
>   else
>     return null;
>   // TODO: Close the StringReader
>   // TODO: Handle terms that analyze into multiple terms (e.g., embedded punctuation)
> }
>
> And here's the corresponding code for Lucene 4.0:
>
> public Term analyzeTerm(Analyzer analyzer, String termString){
>   TokenStream stream = analyzer.tokenStream(field, new StringReader(termString));
>   if (stream.incrementToken()){
>     TermToBytesRefAttribute termAtt = stream.getAttribute(TermToBytesRefAttribute.class);
>     BytesRef bytes = termAtt.getBytesRef();
>     return new Term(BytesRef.deepCopyOf(bytes));
>   } else
>     return null;
>   // TODO: Close the StringReader
>   // TODO: Handle terms that analyze into multiple terms (e.g., embedded punctuation)
> }
>
> -- Jack Krupansky
>
> -----Original Message----- From: Bill Chesky
> Sent: Friday, August 03, 2012 2:55 PM
> To: java-user@lucene.apache.org
> Subject: RE: Analyzer on query question
>
> Ian/Jack,
>
> Ok, thanks for the help. I certainly don't want to take a cheap way out,
> hence my original question about whether this is the right way to do this.
> Jack, you say the right way is to do Term analysis before creating the Term.
> If anybody has any information on how to accomplish this I'd greatly
> appreciate it.
>
> regards,
>
> Bill
>
> -----Original Message-----
> From: Jack Krupansky [mailto:j...@basetechnology.com]
> Sent: Friday, August 03, 2012 1:22 PM
> To: java-user@lucene.apache.org
> Subject: Re: Analyzer on query question
>
> Bill, the re-parse of Query.toString will work provided that your query
> terms are either un-analyzed or their analyzer is "idempotent" (can be
> applied repeatedly without changing the output terms.) In your case, you are
> doing the former.
>
> The bottom line: 1) if it works for you, great, 2) for other readers, please
> do not depend on this approach if your input data is filtered in any way -
> if your index analyzer "filters" terms (e.g, stemming, case changes,
> term-splitting), your Term/TermQuery should be analyzed/filtered comparably,
> in which case the extra parse (to cause term analysis such as stemming)
> becomes unnecessary and risky if you are not very careful or very lucky.
>
> -- Jack Krupansky
>
> -----Original Message----- From: Ian Lea
> Sent: Friday, August 03, 2012 1:12 PM
> To: java-user@lucene.apache.org
> Subject: Re: Analyzer on query question
>
> Bill
>
> You're getting the snowball stemming either way which I guess is good,
> and if you get same results either way maybe it doesn't matter which
> technique you use. I'd be a bit worried about parsing the result of
> query.toString() because you aren't guaranteed to get back, in text,
> what you put in.
>
> My way seems better to me, but then it would. If you prefer your way
> I won't argue with you.
>
> --
> Ian.
>
> On Fri, Aug 3, 2012 at 5:57 PM, Bill Chesky <bill.che...@learninga-z.com>
> wrote:
>>
>> Ian,
>>
>> I gave this method a try, at least the way I understood your suggestion.
>> E.g.
>> to search for the phrase "cells combine" I built up a string like:
>>
>> title:"cells combine" description:"cells combine" text:"cells combine"
>>
>> then I passed that to the queryParser.parse() method (where queryParser is
>> an instance of QueryParser constructed using SnowballAnalyzer) and added
>> the result as a MUST clause in my final BooleanQuery.
>>
>> When I print the resulting query out as a string I get:
>>
>> +(title:"cell combin" description:"cell combin" keywords:"cell combin")
>>
>> So it looks like the SnowballAnalyzer is doing some stemming for me. But
>> this is the exact same result I'd get doing it the way I described in my
>> original email. I just built the unanalyzed string on my own rather than
>> using the various query classes like PhraseQuery, etc.
>>
>> So I don't see the advantage to doing it this way over the original
>> method. I just don't know if the original way I described is wrong or
>> will give me bad results.
>>
>> thanks for the help,
>>
>> Bill
>>
>> -----Original Message-----
>> From: Ian Lea [mailto:ian....@gmail.com]
>> Sent: Friday, August 03, 2012 9:32 AM
>> To: java-user@lucene.apache.org
>> Subject: Re: Analyzer on query question
>>
>> You can add parsed queries to a BooleanQuery. Would that help in this
>> case?
>>
>> SnowballAnalyzer sba = whatever();
>> QueryParser qp = new QueryParser(..., sba);
>> Query q1 = qp.parse("some snowball string");
>> Query q2 = qp.parse("some other snowball string");
>>
>> BooleanQuery bq = new BooleanQuery();
>> bq.add(q1, ...);
>> bq.add(q2, ...);
>> bq.add(loads of other stuff);
>>
>> --
>> ian.
>>
>> On Fri, Aug 3, 2012 at 2:19 PM, Bill Chesky <bill.che...@learninga-z.com>
>> wrote:
>>>
>>> Thanks Simon,
>>>
>>> Unfortunately, I'm using Lucene 3.0.1 and CharTermAttribute doesn't seem
>>> to have been introduced until 3.1.0. Similarly my version of Lucene does
>>> not have a BooleanQuery.addClause(BooleanClause) method. Maybe you meant
>>> BooleanQuery.add(BooleanClause).
>>>
>>> In any case, most of what you're doing there, I'm just not familiar with.
>>> Seems very low level. I've never had to use TokenStreams to build a
>>> query before and I'm not really sure what is going on there. Also, I
>>> don't know what PositionIncrementAttribute is or how it would be used to
>>> create a PhraseQuery. The way I'm currently creating PhraseQuerys is
>>> very straightforward and intuitive. E.g. to search for the term "foo
>>> bar" I'd build the query like this:
>>>
>>> PhraseQuery phraseQuery = new PhraseQuery();
>>> phraseQuery.add(new Term("title", "foo"));
>>> phraseQuery.add(new Term("title", "bar"));
>>>
>>> Is there really no easier way to associate the correct analyzer with
>>> these types of queries?
>>>
>>> Bill
>>>
>>> -----Original Message-----
>>> From: Simon Willnauer [mailto:simon.willna...@gmail.com]
>>> Sent: Friday, August 03, 2012 3:43 AM
>>> To: java-user@lucene.apache.org; Bill Chesky
>>> Subject: Re: Analyzer on query question
>>>
>>> On Thu, Aug 2, 2012 at 11:09 PM, Bill Chesky
>>> <bill.che...@learninga-z.com> wrote:
>>>>
>>>> Hi,
>>>>
>>>> I understand that generally speaking you should use the same analyzer on
>>>> querying as was used on indexing. In my code I am using the
>>>> SnowballAnalyzer on index creation. However, on the query side I am
>>>> building up a complex BooleanQuery from other BooleanQuerys and/or
>>>> PhraseQuerys on several fields. None of these require specifying an
>>>> analyzer anywhere. This is causing some odd results, I think, because a
>>>> different analyzer (or no analyzer?) is being used for the query.
>>>>
>>>> Question: how do I build my boolean and phrase queries using the
>>>> SnowballAnalyzer?
>>>>
>>>> One thing I did that seemed to kind of work was to build my complex
>>>> query normally then build a snowball-analyzed query using a QueryParser
>>>> instantiated with a SnowballAnalyzer.
>>>> To do this, I simply pass the
>>>> string value of the complex query to the QueryParser.parse() method to
>>>> get the new query. Something like this:
>>>>
>>>> // build a complex query from other BooleanQuerys and PhraseQuerys
>>>> BooleanQuery fullQuery = buildComplexQuery();
>>>> QueryParser parser = new QueryParser(Version.LUCENE_30, "title",
>>>>     new SnowballAnalyzer(Version.LUCENE_30, "English"));
>>>> Query snowballAnalyzedQuery = parser.parse(fullQuery.toString());
>>>>
>>>> TopScoreDocCollector collector = TopScoreDocCollector.create(10000, true);
>>>> indexSearcher.search(snowballAnalyzedQuery, collector);
>>>
>>> you can just use the analyzer directly like this:
>>> Analyzer analyzer = new SnowballAnalyzer(Version.LUCENE_30, "English");
>>>
>>> TokenStream stream = analyzer.tokenStream("title", new StringReader(fullQuery.toString()));
>>> CharTermAttribute termAttr = stream.addAttribute(CharTermAttribute.class);
>>> stream.reset();
>>> BooleanQuery q = new BooleanQuery();
>>> while(stream.incrementToken()) {
>>>   q.addClause(new BooleanClause(Occur.MUST, new Term("title", termAttr.toString())));
>>> }
>>>
>>> you also have access to the token positions if you want to create
>>> phrase queries etc. just add a PositionIncrementAttribute like this:
>>> PositionIncrementAttribute posAttr =
>>>     stream.addAttribute(PositionIncrementAttribute.class);
>>>
>>> pls. doublecheck the code it's straight from the top of my head.
>>>
>>> simon
>>>
>>>>
>>>> Like I said, this seems to kind of work but it doesn't feel right. Does
>>>> this make sense? Is there a better way?
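
Inline note on the toString() round trip quoted above: whether re-parsing is safe comes down to whether the analyzer is idempotent, i.e. whether analyzing already-analyzed terms leaves them unchanged. The sketch below is only a toy illustration of that check. ToyAnalyzerCheck, stem, analyze and isStable are hypothetical names, and the one-step suffix stripper is an assumption made for the example; it is not the Snowball algorithm and uses no Lucene API.

```java
import java.util.Locale;

// Toy model only: a hypothetical one-step suffix stripper, NOT Snowball and NOT Lucene.
final class ToyAnalyzerCheck {

    // Lowercases the term and strips a single trailing "s" or "e" from terms longer than 3 chars.
    static String stem(String term) {
        String t = term.toLowerCase(Locale.ROOT);
        if (t.length() > 3 && (t.endsWith("s") || t.endsWith("e"))) {
            return t.substring(0, t.length() - 1);
        }
        return t;
    }

    // Splits on whitespace, stems each term, and joins the results with single spaces.
    static String analyze(String text) {
        StringBuilder out = new StringBuilder();
        for (String term : text.trim().split("\\s+")) {
            if (term.isEmpty()) {
                continue; // happens only for blank input
            }
            if (out.length() > 0) {
                out.append(' ');
            }
            out.append(stem(term));
        }
        return out.toString();
    }

    // True when a second analysis pass leaves the first pass's output unchanged.
    static boolean isStable(String text) {
        String once = analyze(text);
        return once.equals(analyze(once));
    }
}
```

With this toy stemmer, "cells combine" analyzes to "cell combin" and stays there on a second pass, while "glasses" becomes "glasse" and then "glass". An input of the second kind is where analyzing twice (once when building the query, once when re-parsing its string form) would silently change the terms.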
>>>>
>>>> thanks in advance,
>>>>
>>>> Bill
>>
>> ---------------------------------------------------------------------
>> To unsubscribe, e-mail: java-user-unsubscr...@lucene.apache.org
>> For additional commands, e-mail: java-user-h...@lucene.apache.org

--
lucidimagination.com
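
To illustrate the reset() point at the top of this message, here is a minimal, self-contained sketch of the consume loop. ToyTokenStream and its methods are hypothetical stand-ins written for this example, not Lucene classes. The toy fails fast when reset() is skipped, which is an assumption of the model; with a real analyzer the symptom of a missing reset() may depend on the tokenizer and the Lucene version.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Iterator;
import java.util.List;

// Hypothetical stand-in for a token stream; NOT Lucene's TokenStream class.
final class ToyTokenStream {
    private final List<String> tokens;
    private Iterator<String> cursor; // stays null until reset() is called
    private String current;

    ToyTokenStream(String text) {
        // Whitespace splitting only; a real analyzer would also stem, lowercase, etc.
        String trimmed = text.trim();
        this.tokens = trimmed.isEmpty()
                ? new ArrayList<String>()
                : Arrays.asList(trimmed.split("\\s+"));
    }

    // Must be called before the first incrementToken().
    void reset() {
        cursor = tokens.iterator();
        current = null;
    }

    // Advances to the next token; returns false when the stream is exhausted.
    boolean incrementToken() {
        if (cursor == null) {
            throw new IllegalStateException("call reset() before incrementToken()");
        }
        if (!cursor.hasNext()) {
            return false;
        }
        current = cursor.next();
        return true;
    }

    // The token most recently produced by incrementToken().
    String term() {
        return current;
    }

    // The consume pattern: reset first, then loop until incrementToken() returns false.
    static List<String> consume(ToyTokenStream stream) {
        List<String> out = new ArrayList<String>();
        stream.reset();
        while (stream.incrementToken()) {
            out.add(stream.term());
        }
        return out;
    }
}
```

The two analyzeTerm snippets quoted above go straight from tokenStream(...) to incrementToken(); in this model that is the path that throws, and the reset() call is what they are missing.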