Github user NightOwl888 commented on the issue:

https://github.com/apache/lucenenet/pull/191

> Another method to fix the points above is to use a RuleBasedBreakIterator and modify the default rules for creating a break iterator. I would have to add a native method to icu-dotnet to call to ubrk_openRules to let you create a BreakIterator. Would that work for Lucene.NET?

Actually, that is exactly what the JDK does, and that explains why it differs from icu-dotnet.

- [RuleBasedBreakIterator](http://grepcode.com/file/repository.grepcode.com/java/root/jdk/openjdk/8-b132/java/text/RuleBasedCollator.java#RuleBasedCollator)
- [BreakIteratorRules](http://grepcode.com/file/repository.grepcode.com/java/root/jdk/openjdk/8-b132/sun/text/resources/BreakIteratorRules.java/)

So yes, it would appear that will resolve the issue.

That said, it is unclear why there is a RuleBasedBreakIterator both in the JDK and in icu4j and what (if any) difference there is between them. In the case of [Highlighter, Lucene uses the one in the JDK](https://github.com/apache/lucene-solr/blob/releases/lucene-solr/4.8.0/lucene/highlighter/src/java/org/apache/lucene/search/postingshighlight/PostingsHighlighter.java#L21), but in the case of [Analysis.ICU, it is using icu4j](https://github.com/apache/lucene-solr/blob/releases/lucene-solr/4.8.0/lucene/analysis/icu/src/java/org/apache/lucene/analysis/icu/segmentation/BreakIteratorWrapper.java#L24).

Do we need 2 RuleBasedBreakIterators to do everything or will one suffice? Also, should we port the one from the JDK, or is there some other way to get this done?

> I agree that it should be an abstract class and have more functionality (ie. moving backwards and forwards) similar to its Java counterpart. I'll see about writing a PR and submitting it to sillsdev/icu-dotnet to see if they will accept this feature.

In that case, let me clean up the code and submit a PR to you, since I have already ported `BreakIterator`, `CharacterIterator`, `StringCharacterIterator`, and have made some tests that can be used to test a `RuleBasedBreakIterator` to verify it works like the one in the JDK. We could use some more tests to be more thorough, though.
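For anyone who wants to cross-check a port against the JDK, here is a rough sketch of how reference boundaries could be collected on the Java side, walking both forwards and backwards. It only uses the public `java.text.BreakIterator` API; the class name `BreakIteratorSketch` and its two helper methods are made up for illustration and are not part of Lucene, Lucene.NET, or icu-dotnet.

```java
import java.text.BreakIterator;
import java.util.ArrayList;
import java.util.List;
import java.util.Locale;

// Hypothetical helper for illustration only; not taken from any existing project.
class BreakIteratorSketch {

    // Walks forward from the first boundary and records each boundary offset.
    static List<Integer> forwardBoundaries(String text, Locale locale) {
        BreakIterator it = BreakIterator.getWordInstance(locale);
        it.setText(text);
        List<Integer> boundaries = new ArrayList<>();
        for (int b = it.first(); b != BreakIterator.DONE; b = it.next()) {
            boundaries.add(b);
        }
        return boundaries;
    }

    // Walks backward from the last boundary and records each boundary offset.
    static List<Integer> backwardBoundaries(String text, Locale locale) {
        BreakIterator it = BreakIterator.getWordInstance(locale);
        it.setText(text);
        List<Integer> boundaries = new ArrayList<>();
        for (int b = it.last(); b != BreakIterator.DONE; b = it.previous()) {
            boundaries.add(b);
        }
        return boundaries;
    }

    public static void main(String[] args) {
        String text = "Hello world";
        // Print the offsets so they can be compared with the ported iterator's output.
        System.out.println(forwardBoundaries(text, Locale.US));
        System.out.println(backwardBoundaries(text, Locale.US));
    }
}
```

The offsets printed here could be pasted into the .NET tests as expected values. The backward walk should yield the same offsets as the forward walk in reverse order, which is a cheap consistency check for the moving backwards and forwards functionality mentioned above.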