How to customize the delimiters used by the WordDelimiterFilter in Lucene?

phauly Fri, 17 Mar 2017 13:06:06 -0700

Hi,

I am trying to index words like 'e-mail' as 'email', 'e mail' and 'e-mail' with 
Lucene 4.4.0.


Lucene's WordDelimiterFilter should be ideal for this. However, it appears to 
treat every non-alphanumeric character as a delimiter, so terms like 'C++' are 
reduced to 'C', which is not what I want.

Apparently, Solr allows custom delimiters to be specified. But how can I do 
this in plain Lucene?

I have looked at the documentation, and the 'byte[] charTypeTable' parameter 
of the constructor looked promising. But specifying my own delimiters in a 
charTypeTable seems to have no effect.
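
For reference, here is a minimal sketch of the kind of table I mean. It has no 
Lucene dependency: the class name CharTypeTableSketch and the buildTable helper 
are hypothetical, and the type constants are assumed to mirror the ones 
declared on WordDelimiterFilter (LOWER, UPPER, DIGIT, SUBWORD_DELIM, ALPHA). 
The table is indexed by character code, and '+' is marked as ALPHA so that it 
would count as part of a word, while '-' stays a sub-word delimiter.

```java
public class CharTypeTableSketch {
    // Assumption: these values mirror the type constants on Lucene's
    // WordDelimiterFilter; they are redefined here so the sketch compiles alone.
    public static final byte LOWER = 0x01;
    public static final byte UPPER = 0x02;
    public static final byte DIGIT = 0x04;
    public static final byte SUBWORD_DELIM = 0x08;
    public static final byte ALPHA = 0x03; // LOWER | UPPER

    // Builds a 256-entry table indexed by character code. Letters and digits
    // get their natural type, everything else is a sub-word delimiter, and
    // each character in extraWordChars is then overridden to ALPHA.
    public static byte[] buildTable(String extraWordChars) {
        byte[] table = new byte[256];
        for (int i = 0; i < table.length; i++) {
            if (Character.isLowerCase(i)) {
                table[i] = LOWER;
            } else if (Character.isUpperCase(i)) {
                table[i] = UPPER;
            } else if (Character.isDigit(i)) {
                table[i] = DIGIT;
            } else {
                table[i] = SUBWORD_DELIM;
            }
        }
        for (char c : extraWordChars.toCharArray()) {
            if (c < table.length) {
                table[c] = ALPHA; // treat this character as a letter
            }
        }
        return table;
    }

    public static void main(String[] args) {
        byte[] table = buildTable("+");
        System.out.println("type of '+': " + table['+']);
        System.out.println("type of '-': " + table['-']);
        // Assumed wiring (not compiled here): pass the table as the
        // 'byte[] charTypeTable' constructor argument of WordDelimiterFilter,
        // together with the configuration flags and protected-words set.
    }
}
```

One thing that may matter here: the table can only affect characters that 
actually reach the filter. If the tokenizer in front of it (StandardTokenizer, 
for example) already drops '+', a custom table would appear to have no effect, 
so a whitespace-based tokenizer may be needed.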

Thank you!
