Hi,

I am looking for a tokenizer where I can specify the delimiters on which
the input is split into words. For example, if I choose ' ' and '_' as the
delimiters, the following string:
"foo__bar doo"
would be tokenized into:
"foo", "", "bar", "doo"
(The analyzer could then filter out the empty tokens; keeping the
empty-string token is not critical.)
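To clarify what I have in mind, here is a minimal sketch of the kind of
thing I imagine (assuming that subclassing Lucene 7.1's
org.apache.lucene.analysis.util.CharTokenizer is a reasonable starting
point; the class name DelimiterTokenizer is just mine):

    import java.io.IOException;
    import java.io.StringReader;
    import org.apache.lucene.analysis.Tokenizer;
    import org.apache.lucene.analysis.tokenattributes.CharTermAttribute;
    import org.apache.lucene.analysis.util.CharTokenizer;

    public class DelimiterTokenizer extends CharTokenizer {
      // Every character except the delimiters ' ' and '_' is part of a token.
      @Override
      protected boolean isTokenChar(int c) {
        return c != ' ' && c != '_';
      }

      public static void main(String[] args) throws IOException {
        Tokenizer tokenizer = new DelimiterTokenizer();
        tokenizer.setReader(new StringReader("foo__bar doo"));
        CharTermAttribute term = tokenizer.addAttribute(CharTermAttribute.class);
        tokenizer.reset();
        while (tokenizer.incrementToken()) {
          System.out.println(term.toString()); // prints foo, bar, doo
        }
        tokenizer.end();
        tokenizer.close();
      }
    }

From what I can tell, CharTokenizer collapses consecutive delimiters and
never emits empty tokens, so this would print "foo", "bar", "doo" with no
empty token in between, which would be fine for my purposes.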

Is such functionality built into Lucene (I'm working with 7.1.0), and does
this seem like the right approach to the problem?

Regards,
Armīns
