Preprocess input text before tokenizing

Jaime Thu, 23 Jun 2016 08:48:00 -0700

Hello,

I want to change the input text before tokenizing. I think I just need to use some characters as word separators, and maybe remove some others completely.

I was planning to use MappingCharFilterFactory to replace some chars with " " and others with "", but I feel like I'm not on the right track.
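
To make the intended mapping concrete, here is a minimal plain-Java sketch of it. It does not use any Lucene classes; the class name CharPreprocessor and the particular separator and removed characters are placeholders chosen only for illustration.

```java
// Plain-Java illustration of the desired character mapping (no Lucene involved).
// The class name and both character lists are hypothetical examples.
final class CharPreprocessor {
    // Characters that should behave as word separators (replaced with " ").
    static final String SEPARATORS = "-_/";
    // Characters that should be removed completely (replaced with "").
    static final String REMOVED = "'\"";

    static String preprocess(String input) {
        StringBuilder out = new StringBuilder(input.length());
        for (int i = 0; i < input.length(); i++) {
            char c = input.charAt(i);
            if (SEPARATORS.indexOf(c) >= 0) {
                out.append(' ');   // separator: emit a space instead
            } else if (REMOVED.indexOf(c) < 0) {
                out.append(c);     // ordinary character: keep unchanged
            }
            // characters listed in REMOVED are dropped entirely
        }
        return out.toString();
    }
}
```

For example, with these placeholder lists, CharPreprocessor.preprocess("wi-fi_router") gives "wi fi router", and "don't" becomes "dont". The open question is where to apply a mapping like this so that it runs on the character stream before the tokenizer sees it.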

First, I've implemented a custom analyzer that uses my custom tokenizer. My idea was to inherit from StandardTokenizer and call MappingCharFilterFactory.create(reader) from within setReader.


However, setReader is final, so I can't override it.

Is there a better way to do this?
In any case, how should I use MappingCharFilter if I really do need it?

