Here is how I did it (the code is from memory, so it might not be 100% correct):

private boolean hasAccents;
private Token filteredToken;

public final Token next() throws IOException {
    // An unaccented variant is buffered: emit it before reading more input.
    if (hasAccents) {
        hasAccents = false;
        return filteredToken;
    }

    Token t = input.next();
    if (t == null) {
        // end of stream
        return null;
    }

    String filteredText = removeAccents(t.termText());
    if (!filteredText.equals(t.termText())) {
        // The term has accents: buffer an unaccented clone at the same position.
        filteredToken = (Token) t.clone();
        filteredToken.setTermText(filteredText);
        filteredToken.setPositionIncrement(0);
        hasAccents = true;
    }

    // Always emit the original token first.
    return t;
}

On Sat, Jun 21, 2008 at 2:37 AM, Phillip Farber <[EMAIL PROTECTED]> wrote:

> Regarding indexing words with accented and unaccented characters with
> positionIncrement zero:
>
> Chris Hostetter wrote:
>
>>
>> you don't really need a custom tokenizer -- just a buffered TokenFilter
>> that clones the original token if it contains accent chars, mutates the
>> clone, and then emits it next with a positionIncrement of 0.
>>
>>
> Could someone expand on how to implement this technique of buffering and
> cloning?
>
> Thanks,
>
> Phil
>

--
Regards,
Cuong Hoang
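
The removeAccents helper called in next() is not shown in the message. A minimal sketch of one way to write it, assuming Java 6 or later for java.text.Normalizer. The class name AccentUtil is hypothetical, and this is not necessarily how the original helper worked:

```java
import java.text.Normalizer;

// Hypothetical holder class; in the filter above this would simply be a
// private method of the TokenFilter subclass.
final class AccentUtil {

    private AccentUtil() {
    }

    // Decomposes each accented character into base letter + combining marks
    // (Unicode NFD), then deletes the combining marks.
    static String removeAccents(String text) {
        String decomposed = Normalizer.normalize(text, Normalizer.Form.NFD);
        return decomposed.replaceAll("\\p{InCombiningDiacriticalMarks}+", "");
    }
}
```

This only strips combining diacritical marks. Letters that have no canonical decomposition, such as "ø" or "ß", pass through unchanged, so text without decomposable accents compares equal to the input and the filter emits a single token for it.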