Shad Storhaug created LUCENENET-573:
---------------------------------------
Summary: Make IcuBreakIterator more like the JDK's
BreakIterator.getInstance()
Key: LUCENENET-573
URL: https://issues.apache.org/jira/browse/LUCENENET-573
Project: Lucene.Net
Issue Type: Improvement
Affects Versions: Lucene.Net 5.0 PCL
Reporter: Shad Storhaug
IcuBreakIterator is a wrapper around the icu-dotnet library. It implements the
JDK BreakIterator business logic that icu-dotnet previously lacked; icu-dotnet
has since added that logic in the form of a RuleBasedBreakIterator.
IcuBreakIterator is used by Lucene.Net.Analysis.Common.Th.ThaiAnalyzer,
Lucene.Net.Highlighter.PostingsHighlight, and
Lucene.Net.Highlighter.VectorHighlight. All of the tests for these components
pass, but primarily because of hacks that were added as workarounds. In
reality, IcuBreakIterator has many rule-based differences that make its
text-breaking behavior differ considerably from the JDK's.

We need to investigate whether the RuleBasedBreakIterator in icu-dotnet can be
used as is, or whether it can be improved to emulate the JDK's BreakIterator
functionality more closely.
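
As a starting point for that investigation, the JDK side of the comparison could be captured with a small harness like the one below. This is only a sketch: the class name JdkBreakBaseline, the boundaries helper, and the sample text are hypothetical and do not come from the Lucene.Net or icu-dotnet code bases. It uses only the standard java.text.BreakIterator API to list the boundary offsets the JDK reports for a string, which could then be compared against the offsets produced by the .NET side for the same input.

```java
import java.text.BreakIterator;
import java.util.ArrayList;
import java.util.List;
import java.util.Locale;

// Hypothetical helper (not part of Lucene.Net or icu-dotnet): records the
// boundary offsets that a JDK BreakIterator reports for a piece of text.
class JdkBreakBaseline {

    // Walks the iterator from the first boundary to the end of the text and
    // collects every boundary offset in order.
    static List<Integer> boundaries(BreakIterator it, String text) {
        it.setText(text);
        List<Integer> result = new ArrayList<>();
        for (int pos = it.first(); pos != BreakIterator.DONE; pos = it.next()) {
            result.add(pos);
        }
        return result;
    }

    public static void main(String[] args) {
        // Sample text is an assumption chosen only for illustration.
        String text = "Hello world. Goodbye.";

        // Word and sentence iterators are two of the factory variants the
        // JDK exposes; Locale.ROOT keeps the run locale-independent.
        System.out.println("word:     "
                + boundaries(BreakIterator.getWordInstance(Locale.ROOT), text));
        System.out.println("sentence: "
                + boundaries(BreakIterator.getSentenceInstance(Locale.ROOT), text));
    }
}
```

Running the same inputs through IcuBreakIterator and through icu-dotnet's RuleBasedBreakIterator, then diffing the offset lists, would show where the rule sets diverge without relying on the existing workarounds.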
--
This message was sent by Atlassian JIRA
(v6.3.15#6346)