Regarding indexing words with accented and unaccented characters with positionIncrement zero:

Chris Hostetter wrote:

you don't really need a custom tokenizer -- just a buffered TokenFilter that clones the original token if it contains accent chars, mutates the clone, and then emits it next with a positionIncrement of 0.


Could someone expand on how to implement this technique of buffering and cloning?
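
For reference, here is a minimal sketch of the buffering idea as I read it. It deliberately does not use the Lucene classes: `Token`, `AccentFoldingFilter`, `stripAccents`, and `filter` are hypothetical stand-ins invented for illustration, and accent removal via `java.text.Normalizer` is an assumption rather than anything the quoted message specifies. A real version would presumably extend Lucene's TokenFilter and set the position increment on the cloned token.

```java
import java.text.Normalizer;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Iterator;
import java.util.List;

public class AccentFoldingSketch {

    // Hypothetical stand-in for a Lucene token: a term plus its position increment.
    static final class Token {
        final String term;
        final int positionIncrement;

        Token(String term, int positionIncrement) {
            this.term = term;
            this.positionIncrement = positionIncrement;
        }

        @Override
        public String toString() {
            return term + "(+" + positionIncrement + ")";
        }
    }

    // Decomposes the string (NFD) and drops combining marks, so "e" + acute becomes "e".
    static String stripAccents(String s) {
        String decomposed = Normalizer.normalize(s, Normalizer.Form.NFD);
        return decomposed.replaceAll("\\p{M}+", "");
    }

    // Wraps an upstream token source. When a token contains accented characters,
    // the original is emitted first and an unaccented copy is buffered, then
    // emitted on the following call with a position increment of 0.
    static final class AccentFoldingFilter implements Iterator<Token> {
        private final Iterator<Token> input;
        private Token pending; // buffered unaccented copy, or null if none

        AccentFoldingFilter(Iterator<Token> input) {
            this.input = input;
        }

        @Override
        public boolean hasNext() {
            return pending != null || input.hasNext();
        }

        @Override
        public Token next() {
            if (pending != null) {
                Token buffered = pending;
                pending = null;
                return buffered;
            }
            Token original = input.next();
            String folded = stripAccents(original.term);
            if (!folded.equals(original.term)) {
                // "Clone and mutate": same position as the original, different term.
                pending = new Token(folded, 0);
            }
            return original;
        }
    }

    // Convenience helper: runs the filter over a list and collects the output.
    static List<Token> filter(List<Token> tokens) {
        List<Token> out = new ArrayList<>();
        Iterator<Token> it = new AccentFoldingFilter(tokens.iterator());
        while (it.hasNext()) {
            out.add(it.next());
        }
        return out;
    }

    public static void main(String[] args) {
        List<Token> in = Arrays.asList(
                new Token("un", 1),
                new Token("caf\u00e9", 1), // "cafe" with an acute accent on the e
                new Token("noir", 1));
        System.out.println(filter(in));
    }
}
```

With this input, the accented token is followed immediately by its unaccented copy at the same position, so a search for either form should match without disturbing phrase positions.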

Thanks,

Phil
