Here is how I did it (the code is from memory, so it might not be 100%
correct):
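private boolean hasAccents;   // true when an unaccented copy is waiting
private Token filteredToken;  // buffered unaccented copy of the last token

public final Token next() throws IOException {
  if (hasAccents) {
    // Emit the buffered copy at the same position as the original.
    hasAccents = false;
    return filteredToken;
  }
  Token t = input.next();
  if (t == null) { // end of stream
    return null;
  }
  String filteredText = removeAccents(t.termText());
  if (!filteredText.equals(t.termText())) { // token has accents
    // Buffer an unaccented clone; the next call returns it.
    filteredToken = (Token) t.clone();
    filteredToken.setTermText(filteredText);
    filteredToken.setPositionIncrement(0);
    hasAccents = true;
  }
  return t;
}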
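
removeAccents() is not shown above. Here is a minimal sketch of one way it
could be written, assuming Java 6 or later for java.text.Normalizer. The
class name AccentStripper is made up for illustration, and this is not
necessarily what my original helper did:

```java
import java.text.Normalizer;

// Hypothetical helper class; the name is an assumption for this example.
final class AccentStripper {

    private AccentStripper() {
    }

    // Returns the text with combining accent marks removed.
    // Text without accents comes back equal to the input, so the
    // equals() check in next() still works as the "no accents" test.
    static String removeAccents(String text) {
        // NFD splits each accented letter into a base letter followed by
        // one or more combining marks (e.g. U+00E9 becomes 'e' + U+0301).
        String decomposed = Normalizer.normalize(text, Normalizer.Form.NFD);
        // \p{M} matches the combining marks; drop them, keep the base letters.
        return decomposed.replaceAll("\\p{M}+", "");
    }
}
```

Note that this only handles letters that Unicode decomposes into a base
letter plus a mark. Characters such as U+00F8 (o with stroke) or U+00DF
(sharp s) pass through unchanged and would need an explicit mapping.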

On Sat, Jun 21, 2008 at 2:37 AM, Phillip Farber <[EMAIL PROTECTED]> wrote:

> Regarding indexing words with accented and unaccented characters with
> positionIncrement zero:
>
> Chris Hostetter wrote:
>
>>
>> you don't really need a custom tokenizer -- just a buffered TokenFilter
>> that clones the original token if it contains accent chars, mutates the
>> clone, and then emits it next with a positionIncrement of 0.
>>
>>
> Could someone expand on how to implement this technique of buffering and
> cloning?
>
> Thanks,
>
> Phil
>



-- 
Regards,

Cuong Hoang
