hariram,

Until Lucene 6.2, there was no way to make the classic query parser *not* split on whitespace before sending text to the analyzer. As a result, filters like ShingleFilter that operate across multiple tokens only ever see one token at a time. In your example, the analyzer is first given “cup” as the full text to analyze and then, separately, “board”; under those conditions ShingleFilter cannot form any multi-token shingles.
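A toy model in plain Java (no Lucene dependency) can make the effect concrete. Everything in it is hypothetical: `ShingleDemo`, `analyze` and `splitThenAnalyze` are illustration-only names, not Lucene classes. `analyze` only imitates the token order that ShingleFilter(src, 2, 3) produced for "cup board" in hariram's output (cup, cup board, board) and ignores ClassicFilter and LowerCaseFilter; `splitThenAnalyze` models what the pre-6.2 classic QueryParser does, namely break the query on whitespace first and then analyze each chunk on its own.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

/**
 * Toy model (NOT Lucene code) of why shingles never reach the query when the
 * query text is split on whitespace before analysis.
 */
public class ShingleDemo {

    /**
     * Hypothetical stand-in for a whitespace tokenizer + shingle filter:
     * emits each unigram followed by the shingles of minSize..maxSize tokens
     * that start at the same position.
     */
    static List<String> analyze(String text, int minSize, int maxSize) {
        String[] tokens = text.trim().split("\\s+");
        List<String> out = new ArrayList<>();
        for (int i = 0; i < tokens.length; i++) {
            out.add(tokens[i]); // unigram at position i
            for (int n = minSize; n <= maxSize && i + n <= tokens.length; n++) {
                // shingle of n consecutive tokens starting at position i
                out.add(String.join(" ", Arrays.asList(tokens).subList(i, i + n)));
            }
        }
        return out;
    }

    /**
     * Models the pre-6.2 classic QueryParser: split on whitespace first, then
     * hand each chunk to the analyzer separately, so it only ever sees one word.
     */
    static List<String> splitThenAnalyze(String query, int minSize, int maxSize) {
        List<String> out = new ArrayList<>();
        for (String chunk : query.trim().split("\\s+")) {
            out.addAll(analyze(chunk, minSize, maxSize));
        }
        return out;
    }

    public static void main(String[] args) {
        String q = "cup board";
        System.out.println("whole string: " + analyze(q, 2, 3));
        System.out.println("split first:  " + splitThenAnalyze(q, 2, 3));
    }
}
```

Running `main` prints `[cup, cup board, board]` for the whole-string path and `[cup, board]` for the split-first path, which matches the `n:cup n:board` query hariram got. For reference, Lucene 6.2 added a `setSplitOnWhitespace(boolean)` option to the classic QueryParser under LUCENE-2605; that option is not available in 4.10.4.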
For more details see <https://issues.apache.org/jira/browse/LUCENE-2605>.

--
Steve
www.lucidworks.com

> On Jul 24, 2017, at 2:00 PM, hariram ravichandran <hariramravichan...@gmail.com> wrote:
>
> Hi Steve,
> I'm sorry. That's also CustomAnalyzer.
>
> public class CustomAnalyzer extends Analyzer {
>     @Override
>     protected Analyzer.TokenStreamComponents createComponents(final String fieldName, final Reader reader) {
>         final WhitespaceTokenizer src = new WhitespaceTokenizer(getVersion(), reader);
>         TokenStream tok = new ShingleFilter(src, 2, 3);
>         tok = new ClassicFilter(tok);
>         tok = new LowerCaseFilter(tok);
>         // tok = new SynonymFilter(tok, SynonymDictionary.getSynonymMap(), true);
>         return new Analyzer.TokenStreamComponents(src, tok);
>     }
> }
>
> public class Test {
>     public static void main(String[] args) throws Exception {
>         CustomAnalyzer analyzer = new CustomAnalyzer();
>         String queryStr = "cup board";
>         TokenStream ts = new CustomAnalyzer().tokenStream("n", new StringReader(queryStr));
>         ts.reset();
>         System.out.println("Tokens are :");
>         while (ts.incrementToken()) {
>             System.out.print(ts.getAttribute(CharTermAttribute.class) + ", ");
>         }
>         QueryParser parser = new QueryParser("n", analyzer);
>         Query query = null;
>         query = parser.parse(queryStr);
>         System.out.println("\nQuery is");
>         System.out.print(query.toString());
>     }
> }
>
> Output:
>   Tokens are :
>   cup, cup board, board
>   Query is n
>   n:cup n:board
>
> On Mon, Jul 24, 2017 at 11:08 PM, Steve Rowe <sar...@gmail.com> wrote:
>
>> Hi hariram,
>>
>> There may be other problems, but at a minimum you have two different
>> analysis classes here. You’re printing the output stream from one
>> (CustomSynonymAnalyzer, the source of which is not shown in your email),
>> but constructing a query from a different one (CustomAnalyzer).
>>
>> --
>> Steve
>> www.lucidworks.com
>>
>>> On Jul 24, 2017, at 10:53 AM, hariram ravichandran <hariramravichan...@gmail.com> wrote:
>>>
>>> I'm using Lucene 4.10.4 and trying to construct combinations of tokens (shingles).
>>>
>>> Code:
>>>
>>> public class CustomAnalyzer extends Analyzer {
>>>     @Override
>>>     protected Analyzer.TokenStreamComponents createComponents(final String fieldName, final Reader reader) {
>>>         final WhitespaceTokenizer src = new WhitespaceTokenizer(getVersion(), reader);
>>>         TokenStream tok = new ShingleFilter(src, 2, 3);
>>>         tok = new ClassicFilter(tok);
>>>         tok = new LowerCaseFilter(tok);
>>>         // tok = new SynonymFilter(tok, SynonymDictionary.getSynonymMap(), true);
>>>         return new Analyzer.TokenStreamComponents(src, tok);
>>>     }
>>> }
>>>
>>> public class Test {
>>>     public static void main(String[] args) throws Exception {
>>>         CustomSynonymAnalyzer analyzer = new CustomSynonymAnalyzer();
>>>         String queryStr = "cup board";
>>>         TokenStream ts = new CustomAnalyzer().tokenStream("n", new StringReader(queryStr));
>>>         ts.reset();
>>>         System.out.println("Tokens are :");
>>>         while (ts.incrementToken()) {
>>>             System.out.print(ts.getAttribute(CharTermAttribute.class) + ", ");
>>>         }
>>>         QueryParser parser = new QueryParser("n", analyzer);
>>>         Query query = null;
>>>         query = parser.parse(queryStr);
>>>         System.out.println("\nQuery is");
>>>         System.out.print(query.toString());
>>>     }
>>> }
>>>
>>> Output:
>>>   Tokens are :
>>>   cup, cup board, board
>>>   Query is n
>>>   n:cup n:board
>>>
>>> The tokens are printed as expected, and I expected the resulting query to be
>>> *n:cup n:board n:cup board*. But the tokens formed by the shingle filter are not
>>> added to the query; I get only *n:cup n:board*. Where is my mistake?
>>>
>>> Thanks.
>>
>> ---------------------------------------------------------------------
>> To unsubscribe, e-mail: java-user-unsubscr...@lucene.apache.org
>> For additional commands, e-mail: java-user-h...@lucene.apache.org