Chris Hostetter wrote:
you don't really need a custom tokenizer -- just a buffered TokenFilter that clones the original token if it contains accent chars, mutates the clone, and then emits it next with a positionIncrement of 0.
Could someone expand on how to implement this technique of buffering and cloning?
Thanks, Phil
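
A minimal sketch of the buffering-and-cloning idea described in the quote, offered as one possible reading rather than as Chris's actual code. To keep it runnable without Lucene on the classpath, Tok, Stream, AccentFilter, stripAccents, fromWords and run are all hypothetical stand-ins invented for this example; they are not Lucene classes. In Lucene itself the same logic would presumably live in a TokenFilter subclass, with the copy given a position increment of 0 via setPositionIncrement(0). Two further assumptions: the stand-in token is immutable, so the "clone and mutate" step becomes "build a modified copy", and accent detection is done with java.text.Normalizer, which is just one way to do it.

```java
import java.text.Normalizer;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Iterator;
import java.util.List;

class AccentFilterSketch {

    /** Hypothetical stand-in for a token: term text plus position increment. */
    static final class Tok {
        final String text;
        final int posInc;
        Tok(String text, int posInc) { this.text = text; this.posInc = posInc; }
        @Override public String toString() { return text + "(+" + posInc + ")"; }
    }

    /** Hypothetical stand-in for a token stream: next() returns null at the end. */
    interface Stream { Tok next(); }

    /** Emits every input token; if it had accents, emits an unaccented copy next. */
    static final class AccentFilter implements Stream {
        private final Stream input;
        private Tok pending; // one-slot buffer holding the copy to emit on the next call

        AccentFilter(Stream input) { this.input = input; }

        @Override public Tok next() {
            // Drain the buffer first, so the copy directly follows its original.
            if (pending != null) {
                Tok copy = pending;
                pending = null;
                return copy;
            }
            Tok original = input.next();
            if (original == null) return null; // upstream exhausted
            String stripped = stripAccents(original.text);
            if (!stripped.equals(original.text)) {
                // Position increment 0 stacks the copy on the original's position.
                pending = new Tok(stripped, 0);
            }
            return original;
        }
    }

    /** Decomposes accented characters (NFD) and drops the combining marks. */
    static String stripAccents(String s) {
        return Normalizer.normalize(s, Normalizer.Form.NFD).replaceAll("\\p{M}", "");
    }

    /** Builds a toy upstream in which every word advances the position by 1. */
    static Stream fromWords(String... words) {
        final Iterator<String> it = Arrays.asList(words).iterator();
        return () -> it.hasNext() ? new Tok(it.next(), 1) : null;
    }

    /** Runs the filter over the words and collects the tokens as strings. */
    static List<String> run(String... words) {
        Stream s = new AccentFilter(fromWords(words));
        List<String> out = new ArrayList<>();
        for (Tok t = s.next(); t != null; t = s.next()) out.add(t.toString());
        return out;
    }

    public static void main(String[] args) {
        System.out.println(run("caf\u00e9", "au", "lait"));
    }
}
```

With the input above, the accented term is followed by its unaccented copy at the same position (increment 0), while the other terms pass through unchanged. The buffer only needs one slot here because each token yields at most one extra variant; a filter producing several variants per token would need a queue instead.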