I am looking for a tokenizer where I can specify the delimiters by which
the words are tokenized. For example, if I choose ' ' and '_' as the
delimiters, the string:
"foo__bar doo"
would be tokenized into:
"foo", "", "bar", "doo"
(The analyzer could further filter out empty tokens, since having the
empty-string token is not critical.)
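
To make the intended behavior concrete, here is a plain-Java sketch of the splitting I have in mind (illustration only, not Lucene API): split on a configurable set of delimiter characters, keeping the empty token that appears between two adjacent delimiters.

```java
import java.util.ArrayList;
import java.util.List;

public class DelimiterSplit {
    // Split 'input' on any character contained in 'delimiters',
    // keeping empty tokens between adjacent delimiters.
    public static List<String> tokenize(String input, String delimiters) {
        List<String> tokens = new ArrayList<>();
        StringBuilder current = new StringBuilder();
        for (char c : input.toCharArray()) {
            if (delimiters.indexOf(c) >= 0) {
                tokens.add(current.toString()); // may be empty
                current.setLength(0);
            } else {
                current.append(c);
            }
        }
        tokens.add(current.toString());
        return tokens;
    }

    public static void main(String[] args) {
        // "foo__bar doo" with delimiters ' ' and '_'
        System.out.println(tokenize("foo__bar doo", " _"));
        // prints: [foo, , bar, doo]
    }
}
```

An empty-token filter downstream would then reduce this to "foo", "bar", "doo".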

Is such functionality built into Lucene (I'm working with 7.1.0) and does
this seem like the correct approach to the problem?
