Hi, I have the following ShingleFilter setup (Lucene 3.2):

    public TokenStream tokenStream(String fieldName, Reader reader) {
        StandardTokenizer first = new StandardTokenizer(Version.LUCENE_32, reader);
        StandardFilter second = new StandardFilter(Version.LUCENE_32, first);
        LowerCaseFilter third = new LowerCaseFilter(Version.LUCENE_32, second);
        StopFilter fourth = new StopFilter(Version.LUCENE_32, third, Stopwords);
        PositionFilter fifth = new PositionFilter(fourth);
        ShingleFilter filter = new ShingleFilter(fifth, shingleSize);
        return filter;
    }

It produces the following token stream for the sentence "please parse this sentence into a shingle of size 2. I'll pay $2 for it":

    1: [_ parse:7->12:shingle]
    2: [parse:7->12:<ALPHANUM>] [parse sentence:7->26:shingle]
    3: [sentence:18->26:<ALPHANUM>] [sentence shingle:18->41:shingle]
    4: [shingle:34->41:<ALPHANUM>] [shingle size:34->49:shingle]
    5: [size:45->49:<ALPHANUM>] [size 2:45->51:shingle]
    6: [2:50->51:<NUM>] [2 pay:50->61:shingle]
    7: [pay:58->61:<ALPHANUM>] [pay 2:58->64:shingle]
    8: [2:63->64:<NUM>]

The query analyzer produces the following analyzed query on the field "titleShingled" for the sentence above:

    ......
    analyzed query:titleShingled:parse titleShingled:sentence titleShingled:shingle titleShingled:size titleShingled:2 titleShingled:pay titleShingled:2

As you can see, there are no bigram shingles in the query. I tried removing the unigrams from the token stream (using filter.setOutputUnigrams(false) in the shingle filter above), but even though the shingles seem to be fine, the analyzed query is empty:

    1: [_ parse:7->12:shingle]
    2: [parse sentence:7->26:shingle]
    3: [sentence shingle:18->41:shingle]
    4: [shingle size:34->49:shingle]
    5: [size 2:45->51:shingle]
    6: [2 pay:50->61:shingle]
    7: [pay 2:58->64:shingle]
    ......
    analyzed query:

My goal is to index both unigrams and bigrams, but to search on the bigrams first. I think the QueryParser is parsing the shingles in a way that I do not understand properly. The parser is constructed as follows:

    QueryParser parser = new QueryParser(Version.LUCENE_32, "titleShingled", new ShinglesAnalyzer(2, Stopwords));

Any help would be very much appreciated.

Peyman