Yes, calling StopFilter.setEnablePositionIncrements(false) will almost certainly give you higher throughput than inserting a PositionFilter. Like PositionFilter, it buys you #2 (create shingles as if the stopwords were never there), but not #1 (don't create shingles across stopwords).
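
To make the difference between the outcomes concrete, here is a rough plain-Java sketch of the bigrams each one would produce for the example text "one two three four five" with stopword "three". It does not call Lucene at all. The class and method names (ShingleSketch, withFillers, withoutCrossing, asIfNeverThere) are made up for illustration, and the filler handling is a simplification that is only meant to match the single-stopword example, not ShingleFilter's behavior in general.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Set;

/** Plain-Java illustration only; hypothetical names, no Lucene calls. */
final class ShingleSketch {
    static final String FILLER = "_";

    /** Current behavior in the example: a filler stands in for each removed stopword. */
    static List<String> withFillers(List<String> tokens, Set<String> stopwords) {
        List<String> out = new ArrayList<>();
        for (int i = 0; i + 1 < tokens.size(); i++) {
            String a = stopwords.contains(tokens.get(i)) ? FILLER : tokens.get(i);
            String b = stopwords.contains(tokens.get(i + 1)) ? FILLER : tokens.get(i + 1);
            if (a.equals(FILLER) && b.equals(FILLER)) {
                continue; // simplification: skip pairs made only of fillers
            }
            out.add(a + " " + b);
        }
        return out;
    }

    /** Interpretation #1: never build a bigram that spans a removed stopword. */
    static List<String> withoutCrossing(List<String> tokens, Set<String> stopwords) {
        List<String> out = new ArrayList<>();
        for (int i = 0; i + 1 < tokens.size(); i++) {
            String a = tokens.get(i);
            String b = tokens.get(i + 1);
            if (!stopwords.contains(a) && !stopwords.contains(b)) {
                out.add(a + " " + b);
            }
        }
        return out;
    }

    /** Interpretation #2: drop the stopwords first, then pair the remaining neighbors. */
    static List<String> asIfNeverThere(List<String> tokens, Set<String> stopwords) {
        List<String> kept = new ArrayList<>();
        for (String t : tokens) {
            if (!stopwords.contains(t)) {
                kept.add(t);
            }
        }
        List<String> out = new ArrayList<>();
        for (int i = 0; i + 1 < kept.size(); i++) {
            out.add(kept.get(i) + " " + kept.get(i + 1));
        }
        return out;
    }

    public static void main(String[] args) {
        List<String> text = List.of("one", "two", "three", "four", "five");
        Set<String> stop = Set.of("three");
        System.out.println(withFillers(text, stop));     // current output, with fillers
        System.out.println(withoutCrossing(text, stop)); // interpretation #1
        System.out.println(asIfNeverThere(text, stop));  // interpretation #2
    }
}
```

Disabling position increments on the StopFilter, or adding PositionFilter, corresponds to the third method: the shingler never sees a gap. The second method is the one that, as far as I can tell, would need changes in ShingleFilter itself.
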

> -----Original Message-----
> From: Robert Muir [mailto:rcm...@gmail.com]
> Sent: Wednesday, May 11, 2011 9:02 AM
> To: java-user@lucene.apache.org
> Subject: Re: Can I omit ShingleFilter's filler tokens
>
> another idea is to .setEnablePositionIncrements(false) on your
> stopfilter.
>
> On Wed, May 11, 2011 at 8:27 AM, Steven A Rowe <sar...@syr.edu> wrote:
> > Hi Bill,
> >
> > I can think of two possible interpretations of "removing filler
> > tokens":
> >
> > 1. Don't create shingles across stopwords, e.g. for text "one two three
> > four five" and stopword "three", bigrams only, you'd get ("one two",
> > "four five"), instead of the current ("one two", "two _", "_ four",
> > "four five").
> >
> > 2. Create shingles as if the stopwords were never there, e.g. for the
> > same text and stopword, bigrams only, you'd get ("one two", "two four",
> > "four five").
> >
> > Which one did you have in mind? #2 can be achieved by adding
> > PositionFilter after StopFilter and before ShingleFilter. I think #1
> > requires ShingleFilter modifications.
> >
> > Steve
> >
> >> -----Original Message-----
> >> From: William Koscho [mailto:wkos...@gmail.com]
> >> Sent: Wednesday, May 11, 2011 12:05 AM
> >> To: java-user@lucene.apache.org
> >> Subject: Can I omit ShingleFilter's filler tokens
> >>
> >> Hi,
> >>
> >> Can I remove the filler token _ from the n-gram-tokens that are
> >> generated by a ShingleFilter?
> >>
> >> I'm using a chain of filters: ClassicFilter, StopFilter,
> >> LowerCaseFilter, and ShingleFilter to create phrase n-grams. The
> >> ShingleFilter inserts FILLER_TOKENs in place of the stopwords, but I
> >> don't want them.
> >>
> >> How can I omit the filler tokens?
> >>
> >> thanks
> >> Bill