I have been migrating and maintaining Lucene indexing code for our use case since version 2.x (we are now on 6.6.1 and migrating to 7.x).

One problem I constantly face concerns the org.apache.lucene.analysis.miscellaneous.WordDelimiterIterator class, which is declared final in the Lucene codebase. This class has an isBreak() method that decides when to split a word into subwords. One of its cases is *ALPHA->NUMERIC, NUMERIC->ALPHA :Don't split* (both directions are handled in the same if condition).
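
For readers without the source at hand, the combined condition can be modelled roughly as follows. This is a simplified stand-alone sketch, not the Lucene code: the class name, the CharType enum and the method signature are made up for illustration, and the splitOnNumerics parameter is assumed to play the role of Lucene's SPLIT_ON_NUMERICS flag.

```java
// Simplified stand-alone model of the rule described above; NOT the Lucene source.
final class CombinedRuleSketch {
    // Hypothetical character classes, reduced to the two that matter here.
    enum CharType { ALPHA, DIGIT }

    // A single switch (splitOnNumerics) governs both directions at once,
    // so the two transitions cannot be configured independently.
    static boolean isBreak(CharType last, CharType current, boolean splitOnNumerics) {
        if (last == current) {
            return false; // same character class: never a break
        }
        boolean alphaToNumeric = last == CharType.ALPHA && current == CharType.DIGIT;
        boolean numericToAlpha = last == CharType.DIGIT && current == CharType.ALPHA;
        // ALPHA->NUMERIC, NUMERIC->ALPHA: don't split (one condition for both)
        if (!splitOnNumerics && (alphaToNumeric || numericToAlpha)) {
            return false;
        }
        return true;
    }

    public static void main(String[] args) {
        // With the switch off, neither direction splits.
        System.out.println(isBreak(CharType.ALPHA, CharType.DIGIT, false));
        System.out.println(isBreak(CharType.DIGIT, CharType.ALPHA, false));
    }
}
```

In this model, turning the switch on makes both transitions split and turning it off suppresses both; there is no setting that treats them differently.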

Unfortunately, in my use case we strictly want only *NUMERIC->ALPHA :Don't split*, and there is no way to get this behaviour through the configurationFlags.

Since the isBreak() method is private and the WordDelimiterIterator class is final, there is no way to subclass it and override this method.

Also, WordDelimiterIterator is tightly coupled with WordDelimiterFilter (WordDelimiterGraphFilter in 7.x), and both are final. This leaves me with only one option: copying their code into custom classes and changing the behaviour there. Clearly this is not a maintainable solution.

So I am looking for advice: what else is possible? Or is there a possibility of a patch/refactoring so that isBreak() uses some new configuration flags?
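
To make the request concrete, here is a minimal sketch of what direction-specific flags could look like. It is illustrative only: neither flag exists in Lucene, the class and method names are hypothetical, and it assumes the desired behaviour is that ALPHA->NUMERIC still splits while NUMERIC->ALPHA does not. Case changes and delimiter characters are deliberately ignored.

```java
import java.util.ArrayList;
import java.util.List;

// Illustration only: hypothetical flags, not part of Lucene.
final class DirectionalBreakSketch {
    private final boolean splitOnAlphaToNumeric; // hypothetical flag
    private final boolean splitOnNumericToAlpha; // hypothetical flag

    DirectionalBreakSketch(boolean splitOnAlphaToNumeric, boolean splitOnNumericToAlpha) {
        this.splitOnAlphaToNumeric = splitOnAlphaToNumeric;
        this.splitOnNumericToAlpha = splitOnNumericToAlpha;
    }

    // Every non-digit is treated as ALPHA to keep the sketch short.
    boolean isBreak(char last, char current) {
        boolean lastDigit = Character.isDigit(last);
        boolean currentDigit = Character.isDigit(current);
        if (lastDigit == currentDigit) {
            return false; // no type change, no break
        }
        // Each direction consults its own flag.
        return lastDigit ? splitOnNumericToAlpha : splitOnAlphaToNumeric;
    }

    // Splits a word into subwords at every position where isBreak() fires.
    List<String> split(String word) {
        List<String> parts = new ArrayList<>();
        int start = 0;
        for (int i = 1; i < word.length(); i++) {
            if (isBreak(word.charAt(i - 1), word.charAt(i))) {
                parts.add(word.substring(start, i));
                start = i;
            }
        }
        if (start < word.length()) {
            parts.add(word.substring(start));
        }
        return parts;
    }

    public static void main(String[] args) {
        DirectionalBreakSketch rule = new DirectionalBreakSketch(true, false);
        System.out.println(rule.split("abc123def")); // prints [abc, 123def]
    }
}
```

Passing the same value for both flags reproduces the existing all-or-nothing behaviour, so a change along these lines could in principle stay backward compatible.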

- Best

Parit Bansal

(Developer www.uniprot.org)
