TokenFilters eating position increments

Erik Hatcher Thu, 22 Sep 2005 13:30:53 -0700

Yonik identified an interesting issue with LUCENE-437 - http://issues.apache.org/jira/browse/LUCENE-437

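I patched the SnowballFilter, but then looked at other filters, and we have the same issue with some of them (like StandardFilter, GermanStemFilter, GreekLowerCaseFilter, and others that create a new Token). A freshly constructed Token gets the default position increment of 1, so any increment set by an earlier filter in the chain is silently lost.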
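To make the failure concrete, here is a minimal sketch. It assumes a simplified stand-in `Token` class (not the real `org.apache.lucene.analysis.Token`) that keeps only the four-argument constructor and the `termText()`, `startOffset()`, `endOffset()`, `type()`, `getPositionIncrement()` and `setPositionIncrement()` accessors. The `LowerCaseExample` class and its methods are hypothetical. A token with increment 0 (for example, a synonym stacked on the previous word) comes out of the naive filter step with increment 1.

```java
// Simplified stand-in for Lucene's Token (hypothetical, for illustration only).
class Token {
    private final String termText;
    private final int startOffset;
    private final int endOffset;
    private final String type;
    private int positionIncrement = 1; // new tokens default to an increment of 1

    Token(String text, int start, int end, String typ) {
        termText = text;
        startOffset = start;
        endOffset = end;
        type = typ;
    }

    String termText() { return termText; }
    int startOffset() { return startOffset; }
    int endOffset() { return endOffset; }
    String type() { return type; }
    int getPositionIncrement() { return positionIncrement; }
    void setPositionIncrement(int inc) { positionIncrement = inc; }
}

class LowerCaseExample {
    // The problematic pattern: a fresh Token silently resets the increment to 1.
    static Token lowerCaseNaive(Token in) {
        return new Token(in.termText().toLowerCase(), in.startOffset(),
                         in.endOffset(), in.type());
    }

    // The manual fix: copy the increment across after constructing the new Token.
    static Token lowerCasePatched(Token in) {
        Token out = new Token(in.termText().toLowerCase(), in.startOffset(),
                              in.endOffset(), in.type());
        out.setPositionIncrement(in.getPositionIncrement());
        return out;
    }

    public static void main(String[] args) {
        Token synonym = new Token("Running", 0, 7, "word");
        synonym.setPositionIncrement(0); // stacked on the previous token's position
        System.out.println(lowerCaseNaive(synonym).getPositionIncrement());   // prints 1
        System.out.println(lowerCasePatched(synonym).getPositionIncrement()); // prints 0
    }
}
```

Every filter that builds a new Token has to remember the extra `setPositionIncrement` call, which is exactly the kind of step that is easy to miss.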

To perhaps alleviate this situation in the future, maybe we should add another constructor to Token:

    public Token(String text, int start, int end, String typ, int positionIncrement)
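A minimal sketch of how the proposed five-argument constructor might look, again on a simplified stand-in `Token` rather than the real Lucene class. The negative-increment check and the `FiveArgExample.lowerCase` helper are assumptions for illustration. The four-argument constructor simply delegates with an increment of 1, so existing callers are unaffected, and a filter can pass the incoming increment straight through in one expression.

```java
// Simplified stand-in for Lucene's Token with the proposed constructor (hypothetical).
class Token {
    private final String termText;
    private final int startOffset;
    private final int endOffset;
    private final String type;
    private final int positionIncrement;

    // Existing-style constructor: keeps the default increment of 1.
    Token(String text, int start, int end, String typ) {
        this(text, start, end, typ, 1);
    }

    // Proposed constructor: the caller supplies the position increment up front.
    Token(String text, int start, int end, String typ, int positionIncrement) {
        if (positionIncrement < 0) {
            throw new IllegalArgumentException(
                "Increment must be zero or greater: " + positionIncrement);
        }
        this.termText = text;
        this.startOffset = start;
        this.endOffset = end;
        this.type = typ;
        this.positionIncrement = positionIncrement;
    }

    String termText() { return termText; }
    int startOffset() { return startOffset; }
    int endOffset() { return endOffset; }
    String type() { return type; }
    int getPositionIncrement() { return positionIncrement; }
}

class FiveArgExample {
    // Hypothetical filter step: lowercase the text and carry the increment through.
    static Token lowerCase(Token in) {
        return new Token(in.termText().toLowerCase(), in.startOffset(),
                         in.endOffset(), in.type(), in.getPositionIncrement());
    }
}
```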


Or maybe one that clones an existing token:

    public Token(Token template, String newText)

where all the metadata for the token (start, end, type, and position increment) is copied and the newText is used for the Token text instead. Filters don't generally change offsets, type, or position increments anyway; the majority change the text for stemming or lowercasing purposes.
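A minimal sketch of the copy-style constructor `Token(Token template, String newText)`, on a simplified stand-in `Token` class rather than the real Lucene one. `CopyExample.stripPlural` is a hypothetical, deliberately naive "stemmer" used only to produce new text. Because every piece of metadata comes from the template, a filter author has no chance to forget the position increment.

```java
// Simplified stand-in for Lucene's Token with the proposed copy constructor (hypothetical).
class Token {
    private final String termText;
    private final int startOffset;
    private final int endOffset;
    private final String type;
    private int positionIncrement = 1;

    Token(String text, int start, int end, String typ) {
        termText = text;
        startOffset = start;
        endOffset = end;
        type = typ;
    }

    // Proposed constructor: copy offsets, type, and increment; replace only the text.
    Token(Token template, String newText) {
        termText = newText;
        startOffset = template.startOffset;
        endOffset = template.endOffset;
        type = template.type;
        positionIncrement = template.positionIncrement;
    }

    String termText() { return termText; }
    int startOffset() { return startOffset; }
    int endOffset() { return endOffset; }
    String type() { return type; }
    int getPositionIncrement() { return positionIncrement; }
    void setPositionIncrement(int inc) { positionIncrement = inc; }
}

class CopyExample {
    // Hypothetical toy stemmer: drops a trailing "s"; stands in for real stemming.
    static Token stripPlural(Token in) {
        String text = in.termText();
        String stemmed = text.endsWith("s") ? text.substring(0, text.length() - 1) : text;
        return new Token(in, stemmed); // all metadata comes from the template
    }
}
```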


Thoughts?

    Erik
