another idea is to .setEnablePositionIncrements(false) on your stopfilter. On Wed, May 11, 2011 at 8:27 AM, Steven A Rowe <sar...@syr.edu> wrote: > Hi Bill, > > I can think of two possible interpretations of "removing filler tokens": > > 1. Don't create shingles across stopwords, e.g. for text "one two three four > five" and stopword "three", bigrams only, you'd get ("one two", "four five"), > instead of the current ("one two", "two _", "_ four", "four five"). > > 2. Create shingles as if the stopwords were never there, e.g. for the same > text and stopword, bigrams only, you'd get ("one two", "two four", "four > five"). > > Which one did you have in mind? #2 can be achieved by adding PositionFilter > after StopFilter and before ShingleFilter. I think #1 requires ShingleFilter > modifications. > > Steve > >> -----Original Message----- >> From: William Koscho [mailto:wkos...@gmail.com] >> Sent: Wednesday, May 11, 2011 12:05 AM >> To: java-user@lucene.apache.org >> Subject: Can I omit ShingleFilter's filler tokens >> >> Hi, >> >> Can I remove the filler token _ from the n-gram-tokens that are generated >> by >> a ShingleFilter? >> >> I'm using a chain of filters: ClassicFilter, StopFilter, LowerCaseFilter, >> and ShingleFilter to create phrase n-grams. The ShingleFilter inserts >> FILLER_TOKENs in place of the stopwords, but I don't want them. >> >> How can I omit the filler tokens? >> >> thanks >> Bill >
--------------------------------------------------------------------- To unsubscribe, e-mail: java-user-unsubscr...@lucene.apache.org For additional commands, e-mail: java-user-h...@lucene.apache.org