This is truly a bug. The internal handling of outputUnigrams only works if you request bi-grams. If outputUnigrams is set to false, the filter increments the shingle position by one and therefore skips every even shingle. The position should only be incremented if shingleBufferPosition % maxShingle == 0.
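To make "no shingle should be skipped" concrete, here is a minimal sketch in plain Java that lists the shingles one would expect for a short token stream. It does not use Lucene and does not reproduce ShingleFilter's internal shingleBufferPosition bookkeeping. The class ShingleSketch and the helper expectedShingles are hypothetical, and the per-position emission order (optional unigram, then bi-gram, then tri-gram, ...) is an assumption made for illustration.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

public class ShingleSketch {

    // Hypothetical helper, not part of Lucene: for every start position it
    // emits the optional unigram, then the bi-gram, tri-gram, ... up to
    // maxShingleSize tokens, stopping when the stream runs out of tokens.
    static List<String> expectedShingles(List<String> tokens, int maxShingleSize,
                                         boolean outputUnigrams) {
        List<String> out = new ArrayList<>();
        // Smallest shingle size to emit: 1 if unigrams are wanted, else 2.
        int minSize = outputUnigrams ? 1 : 2;
        for (int start = 0; start < tokens.size(); start++) {
            for (int size = minSize; size <= maxShingleSize; size++) {
                int end = start + size;
                if (end > tokens.size()) {
                    break; // not enough tokens left for a shingle this long
                }
                out.add(String.join(" ", tokens.subList(start, end)));
            }
        }
        return out;
    }

    public static void main(String[] args) {
        List<String> tokens = Arrays.asList("please", "divide", "this", "sentence");
        // maxShingleSize=3, outputUnigrams=false: every bi-gram and tri-gram
        // should appear, none skipped.
        for (String shingle : expectedShingles(tokens, 3, false)) {
            System.out.println(shingle);
        }
    }
}
```

A failing JUnit test against the real filter could compare its output to a list like the one this helper produces.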
I have a test and the fix, and will open an issue soon.

simon

On Fri, Jan 8, 2010 at 7:48 PM, Chris Hostetter <hossman_luc...@fucit.org> wrote:
>
> : I am using lucene 2.9.1 and I was trying to understand the ShingleFilter
> and wrote the code below.
> ...
> : I was expecting the output as follows with maxShingleSize=3 and
> outputUnigrams=false :
> ...
> : Am I missing something or this is the expected behavior?
>
> I'm not very familiar with ShingleFilter, and i'm not 100% sure i
> understand the example you describe, but it *seems* like there may be a
> bug here ... the easiest way to verify that is if you could tweak your
> example code into the form of a (failing) JUnit test and open a new Jira
> issue -- then other devs (who know more about ShingleFilter) could look at
> it and either verify that there is a bug, or point out what's invalid
> about the test.
>
>
>
> -Hoss
>
>
> ---------------------------------------------------------------------
> To unsubscribe, e-mail: java-user-unsubscr...@lucene.apache.org
> For additional commands, e-mail: java-user-h...@lucene.apache.org