Hi,

I am trying to index words like 'e-mail' as 'email', 'e mail' and 'e-mail' with 
Lucene 4.4.0.

Lucene's WordDelimiterFilter should be ideal for this. However, it appears to 
treat every non-alphanumeric character as a delimiter, so a term like 'C++' is 
reduced to 'C', which is not what I want.

Apparently, Solr allows custom delimiters to be specified. How can I do the 
same in plain Lucene?

I have looked into the documentation, and the 'byte[] charTypeTable' parameter 
of the constructor looked promising. But passing a charTypeTable with my own 
delimiter settings seems to have no effect.
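
To make the attempt concrete, here is a minimal sketch of the kind of table I 
mean. It has no Lucene dependency; the constant values are my assumption of 
what WordDelimiterFilter defines (LOWER, UPPER, DIGIT, SUBWORD_DELIM, ALPHA), 
and the class and method names are made up for illustration. The default rule 
(everything that is not a letter or digit is a delimiter) is also an 
assumption about how Lucene fills its default table.

```java
// Hypothetical helper, not part of Lucene.
final class CharTypeTableSketch {
    // Assumed to mirror the type constants in WordDelimiterFilter.
    static final byte LOWER = 0x01;
    static final byte UPPER = 0x02;
    static final byte DIGIT = 0x04;
    static final byte SUBWORD_DELIM = 0x08;
    static final byte ALPHA = 0x03; // LOWER | UPPER

    // Builds a 256-entry table: letters and digits keep their type,
    // everything else is a delimiter, except the characters in
    // keepAsAlpha, which are reclassified as ALPHA.
    static byte[] buildTable(String keepAsAlpha) {
        byte[] table = new byte[256];
        for (int i = 0; i < table.length; i++) {
            if (Character.isLowerCase(i)) {
                table[i] = LOWER;
            } else if (Character.isUpperCase(i)) {
                table[i] = UPPER;
            } else if (Character.isDigit(i)) {
                table[i] = DIGIT;
            } else {
                table[i] = SUBWORD_DELIM;
            }
        }
        for (char c : keepAsAlpha.toCharArray()) {
            if (c < table.length) {
                table[c] = ALPHA;
            }
        }
        return table;
    }

    public static void main(String[] args) {
        byte[] table = buildTable("+");
        System.out.println("'+' type: " + table['+']);
        System.out.println("'-' type: " + table['-']);
        // In Lucene 4.4 the table would then be passed roughly as
        // (signature as I read it in the Javadoc, please correct me):
        //   new WordDelimiterFilter(tokenStream, table, flags, null);
    }
}
```

With such a table '+' would count as a letter while '-' stays a delimiter, so 
'e-mail' could still be split and 'C++' kept whole. This assumes the tokenizer 
in front of the filter leaves the '+' characters in the token in the first 
place, for example a whitespace-based one.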

Thank you!
