#1 is what I'm trying for, so I'll give setEnablePositionIncrements(false) a try. Thanks for everyone's help.
Bill

On 5/11/11, Steven A Rowe <sar...@syr.edu> wrote:
> Yes, StopFilter.setEnablePositionIncrements(false) will almost certainly get
> higher throughput than inserting PositionFilter. Like PositionFilter, this
> will buy you #2 (create shingles as if stopwords were never there), but not
> #1 (don't create shingles across stopwords).
>
>> -----Original Message-----
>> From: Robert Muir [mailto:rcm...@gmail.com]
>> Sent: Wednesday, May 11, 2011 9:02 AM
>> To: java-user@lucene.apache.org
>> Subject: Re: Can I omit ShingleFilter's filler tokens
>>
>> another idea is to .setEnablePositionIncrements(false) on your
>> stopfilter.
>>
>> On Wed, May 11, 2011 at 8:27 AM, Steven A Rowe <sar...@syr.edu> wrote:
>> > Hi Bill,
>> >
>> > I can think of two possible interpretations of "removing filler tokens":
>> >
>> > 1. Don't create shingles across stopwords, e.g. for text "one two three
>> > four five" and stopword "three", bigrams only, you'd get ("one two",
>> > "four five"), instead of the current ("one two", "two _", "_ four",
>> > "four five").
>> >
>> > 2. Create shingles as if the stopwords were never there, e.g. for the
>> > same text and stopword, bigrams only, you'd get ("one two", "two four",
>> > "four five").
>> >
>> > Which one did you have in mind? #2 can be achieved by adding
>> > PositionFilter after StopFilter and before ShingleFilter. I think #1
>> > requires ShingleFilter modifications.
>> >
>> > Steve
>> >
>> >> -----Original Message-----
>> >> From: William Koscho [mailto:wkos...@gmail.com]
>> >> Sent: Wednesday, May 11, 2011 12:05 AM
>> >> To: java-user@lucene.apache.org
>> >> Subject: Can I omit ShingleFilter's filler tokens
>> >>
>> >> Hi,
>> >>
>> >> Can I remove the filler token _ from the n-gram-tokens that are
>> >> generated by a ShingleFilter?
>> >>
>> >> I'm using a chain of filters: ClassicFilter, StopFilter,
>> >> LowerCaseFilter, and ShingleFilter to create phrase n-grams. The
>> >> ShingleFilter inserts FILLER_TOKENs in place of the stopwords, but I
>> >> don't want them.
>> >>
>> >> How can I omit the filler tokens?
>> >>
>> >> thanks
>> >> Bill
>> >
>>
>> ---------------------------------------------------------------------
>> To unsubscribe, e-mail: java-user-unsubscr...@lucene.apache.org
>> For additional commands, e-mail: java-user-h...@lucene.apache.org
>
>

--
Sent from my mobile device
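
For anyone reading this thread later, the three bigram behaviors discussed above (the current filler output, interpretation #1, and interpretation #2) can be illustrated with a small plain-Java sketch. It does not use Lucene at all: ShingleSketch, Mode, and bigrams are hypothetical names invented for this illustration, and the logic is a simplification (bigrams only, "_" as the filler), not ShingleFilter's actual implementation.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Collections;
import java.util.List;
import java.util.Set;

// Hypothetical illustration only; none of these names exist in Lucene.
final class ShingleSketch {

    enum Mode {
        FILLER,      // current behavior: stopword positions become "_"
        NO_CROSSING, // interpretation #1: no shingles across stopwords
        CLOSE_GAPS   // interpretation #2: as if stopwords were never there
    }

    static List<String> bigrams(List<String> tokens, Set<String> stopwords, Mode mode) {
        // Build the position slots; null marks a hole left by a removed stopword.
        List<String> slots = new ArrayList<>();
        for (String token : tokens) {
            if (!stopwords.contains(token)) {
                slots.add(token);
            } else if (mode != Mode.CLOSE_GAPS) {
                slots.add(null); // keep the hole unless gaps are being closed
            }
        }

        List<String> shingles = new ArrayList<>();
        for (int i = 0; i + 1 < slots.size(); i++) {
            String left = slots.get(i);
            String right = slots.get(i + 1);
            if (left == null && right == null) {
                continue; // simplification: skip shingles made only of fillers
            }
            boolean touchesHole = left == null || right == null;
            if (touchesHole && mode == Mode.NO_CROSSING) {
                continue; // #1: drop any shingle that would span a stopword
            }
            shingles.add((left == null ? "_" : left) + " " + (right == null ? "_" : right));
        }
        return shingles;
    }

    public static void main(String[] args) {
        List<String> text = Arrays.asList("one", "two", "three", "four", "five");
        Set<String> stop = Collections.singleton("three");
        System.out.println(bigrams(text, stop, Mode.FILLER));      // [one two, two _, _ four, four five]
        System.out.println(bigrams(text, stop, Mode.NO_CROSSING)); // [one two, four five]
        System.out.println(bigrams(text, stop, Mode.CLOSE_GAPS));  // [one two, two four, four five]
    }
}
```

The outputs match Steve's examples for the text "one two three four five" with stopword "three". The only difference between FILLER and NO_CROSSING is whether shingles touching a hole are kept, which fits Steve's remark that #1 would need a change inside the shingling step itself rather than in the filters before it.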