Hi, I am looking for a tokenizer where I can specify the delimiters by which words are tokenized. For example, if I choose the delimiters ' ' and '_', the string "foo__bar doo" would be tokenized into "foo", "", "bar", "doo". (The analyzer could further filter out the empty tokens, since having empty-string tokens is not critical.)
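To make the expected behavior concrete, here is a small self-contained sketch (plain Java, not Lucene) that produces the same token sequence via `String.split` on the delimiter set; the class name and the regex character class are just illustrative assumptions:

```java
import java.util.Arrays;

public class DelimiterTokenizeDemo {
    public static void main(String[] args) {
        // Split on either ' ' or '_'; interior empty tokens are kept,
        // which matches the behavior described above.
        String input = "foo__bar doo";
        String[] tokens = input.split("[ _]");
        System.out.println(Arrays.toString(tokens)); // [foo, , bar, doo]
    }
}
```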
Is such functionality built into Lucene (I'm working with 7.1.0), and does this seem like the correct approach to the problem?

Regards,
Armīns