Yonik identified an interesting issue with LUCENE-437 -
http://issues.apache.org/jira/browse/LUCENE-437
I patched the SnowballFilter, but then looked at other filters and we
have the same issue with some of them (like StandardFilter,
GermanStemFilter, GreekLowerCaseFilter, and others that create a new
Token).
To perhaps alleviate this situation in the future, maybe we should
add another constructor to Token:
public Token(String text, int start, int end, String typ, int positionIncrement)
Or maybe one that clones an existing token:
public Token(Token template, String newText)
where all the metadata for the token (start, end, type, and position
increment) is copied and the newText is used for the Token text
instead. Filters don't generally change offsets, type, or position
increments anyway - the majority change the text for stemming or
lowercasing purposes.
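
A rough sketch of the second option, to make the idea concrete. SketchToken
is a hypothetical, simplified stand-in for Token (it is not the real Lucene
class), and SketchLowerCaseStep is an equally hypothetical helper showing how
a lowercasing filter might use the copying constructor:

```java
import java.util.Locale;

// Hypothetical, simplified stand-in for Token: only the fields discussed above.
final class SketchToken {
    final String text;
    final int start;              // start offset in the source text
    final int end;                // end offset in the source text
    final String type;
    final int positionIncrement;

    // First proposal: every piece of metadata is passed explicitly.
    SketchToken(String text, int start, int end, String type, int positionIncrement) {
        this.text = text;
        this.start = start;
        this.end = end;
        this.type = type;
        this.positionIncrement = positionIncrement;
    }

    // Second proposal: copy all metadata from the template, replace only the text.
    SketchToken(SketchToken template, String newText) {
        this(newText, template.start, template.end, template.type,
             template.positionIncrement);
    }
}

// Hypothetical filter step: changes the text and nothing else.
final class SketchLowerCaseStep {
    static SketchToken apply(SketchToken in) {
        return new SketchToken(in, in.text.toLowerCase(Locale.ROOT));
    }
}
```

With a constructor like this, a filter that only rewrites the text cannot
forget to carry the position increment over, because it never touches the
metadata at all.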
Thoughts?
Erik