> Hello Uwe,
>
> Thank you for the reply. I see that there is a version check for the
> use of setEnablePositionIncrements(false); and, I think I may be able
> to use an earlier api with the eXist-db embedding of Lucene 4.4 to
> avoid the version check.

Hi,

you don't need an older version of the Lucene library. It is enough to pass
the older version constant; this also works with Lucene 4.7 or 4.8 (to be
released shortly):

  sf = new StopFilter(Version.LUCENE_43, ...);
  sf.setEnablePositionIncrements(false);

The version constant exists for exactly this purpose: it lets you keep using
components whose behavior changed incompatibly in later versions, and so
preserves index and behavior compatibility.

About stop words: what you are doing is not really "stop words". The main
reasoning behind stop words is the following:

- Stop words occur in almost every document, so it makes no sense to query
  for them.
- The only relevant information carried by a stop word is "there was a word
  at this position".

If the second point were not taken care of, this information would be lost,
too. If every document really contains a specific stop word (which is almost
always the case), there must be no difference between the results of a phrase
query containing that stop word on an index with all stop words indexed and
on an index with the stop words left out. This is only possible if the stop
word reserves a position.

What you intend to do is not a "stopword" use case. You want to "ignore" some
words. Lucene has no support for this, because in natural language processing
it makes no sense. Some ways to do it anyway:

a) write your own TokenFilter, violating the TokenStream contracts
b) use the backwards compatibility layer with matchVersion=LUCENE_43
c) maybe remove the words before tokenizing (e.g. with MappingCharFilter,
   mapping the "ignore words" to the empty string)

Uwe
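
To make the position argument above concrete, here is a minimal sketch in
plain Java. It does not use Lucene at all: PositionDemo, Token, analyze,
phraseMatches and the keepGaps flag are hypothetical names invented for this
illustration, and the matching logic is only a simplified stand-in for what a
phrase query does, not Lucene's implementation. It assumes whitespace
tokenization and that the query is analyzed the same way as the document.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.HashSet;
import java.util.List;
import java.util.Locale;
import java.util.Set;

// Toy model only: plain Java, no Lucene classes. All names are hypothetical.
final class PositionDemo {

    // A kept term together with the position it was assigned.
    static final class Token {
        final String term;
        final int position;

        Token(String term, int position) {
            this.term = term;
            this.position = position;
        }
    }

    // Split on whitespace and drop stop words.
    // keepGaps = true:  a removed stop word still advances the position counter.
    // keepGaps = false: a removed word leaves no trace (the "ignore words" case).
    static List<Token> analyze(String text, Set<String> stopWords, boolean keepGaps) {
        List<Token> tokens = new ArrayList<>();
        int pos = 0;
        for (String term : text.toLowerCase(Locale.ROOT).trim().split("\\s+")) {
            if (term.isEmpty()) {
                continue;
            }
            if (stopWords.contains(term)) {
                if (keepGaps) {
                    pos++; // reserve the position of the removed word
                }
                continue;
            }
            tokens.add(new Token(term, pos));
            pos++;
        }
        return tokens;
    }

    // Exact phrase match: every query term must occur in the document at the
    // same relative position it has in the analyzed query.
    static boolean phraseMatches(String doc, String phrase, Set<String> stopWords, boolean keepGaps) {
        List<Token> docTokens = analyze(doc, stopWords, keepGaps);
        List<Token> queryTokens = analyze(phrase, stopWords, keepGaps);
        if (queryTokens.isEmpty()) {
            return false;
        }
        Set<String> docKeys = new HashSet<>();
        for (Token t : docTokens) {
            docKeys.add(t.term + "@" + t.position);
        }
        Token first = queryTokens.get(0);
        for (Token candidate : docTokens) {
            if (!candidate.term.equals(first.term)) {
                continue;
            }
            int offset = candidate.position - first.position;
            boolean allFound = true;
            for (Token q : queryTokens) {
                if (!docKeys.contains(q.term + "@" + (q.position + offset))) {
                    allFound = false;
                    break;
                }
            }
            if (allFound) {
                return true;
            }
        }
        return false;
    }

    public static void main(String[] args) {
        Set<String> stop = Collections.singleton("of");
        for (boolean keepGaps : new boolean[] {true, false}) {
            for (String doc : new String[] {"king of spain", "king spain"}) {
                System.out.println("keepGaps=" + keepGaps + ", doc=\"" + doc + "\": "
                        + phraseMatches(doc, "king of spain", stop, keepGaps));
            }
        }
    }
}
```

In this model, with keepGaps = true the phrase "king of spain" matches only
the document that really had a word between "king" and "spain". With
keepGaps = false both documents match, which is the "ignore words" behaviour
rather than stop word behaviour.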