Shad Storhaug created LUCENENET-573:
---------------------------------------

             Summary: Make IcuBreakIterator more like the JDK's 
BreakIterator.getInstance()
                 Key: LUCENENET-573
                 URL: https://issues.apache.org/jira/browse/LUCENENET-573
             Project: Lucene.Net
          Issue Type: Improvement
    Affects Versions: Lucene.Net 5.0 PCL
            Reporter: Shad Storhaug


IcuBreakIterator is a wrapper around the icu-dotnet library. It implements the 
JDK BreakIterator logic that icu-dotnet previously lacked; icu-dotnet has since 
added that logic in the form of a RuleBasedBreakIterator. IcuBreakIterator is 
used by Lucene.Net.Analysis.Common.Th.ThaiAnalyzer, 
Lucene.Net.Highlighter.PostingsHighlight, and 
Lucene.Net.Highlighter.VectorHighlight. All of the tests for these components 
pass, but primarily because of hacks that were added as workarounds. In 
reality, IcuBreakIterator differs from the JDK in many of its rules, so it 
breaks text quite differently from the JDK's BreakIterator.

We need to investigate whether the RuleBasedBreakIterator in icu-dotnet can be 
used as is, or whether it can be improved to more closely emulate the 
BreakIterator functionality in the JDK.
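
For reference, the JDK behavior we are trying to match can be observed with a 
small Java program along these lines. This is only a sketch: the class name 
BreakBoundaries, the helper boundaries(), and the sample text are made up for 
illustration and are not part of Lucene or icu-dotnet. It collects the boundary 
offsets reported by java.text.BreakIterator so they can be compared with what 
IcuBreakIterator returns for the same input.

```java
import java.text.BreakIterator;
import java.util.ArrayList;
import java.util.List;
import java.util.Locale;

public class BreakBoundaries {

    // Hypothetical helper: returns every boundary offset the iterator
    // reports for the given text, starting with first() and ending at
    // the last boundary before BreakIterator.DONE.
    static List<Integer> boundaries(BreakIterator it, String text) {
        it.setText(text);
        List<Integer> result = new ArrayList<>();
        for (int pos = it.first(); pos != BreakIterator.DONE; pos = it.next()) {
            result.add(pos);
        }
        return result;
    }

    public static void main(String[] args) {
        // Sample text is an assumption chosen for illustration only.
        String text = "Hello world. This is a test.";

        // Word and sentence iterators obtained from the JDK factory methods.
        System.out.println(boundaries(BreakIterator.getWordInstance(Locale.ROOT), text));
        System.out.println(boundaries(BreakIterator.getSentenceInstance(Locale.ROOT), text));
    }
}
```

Running the same inputs through IcuBreakIterator and comparing the offset 
lists would be one way to pin down where the rule sets diverge.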



--
This message was sent by Atlassian JIRA
(v6.3.15#6346)
