[ https://issues.apache.org/jira/browse/LUCENENET-573?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Shad Storhaug updated LUCENENET-573:
------------------------------------
Component/s: Lucene.Net.ICU
> Make IcuBreakIterator more like the JDK's BreakIterator.getInstance()
> ---------------------------------------------------------------------
>
> Key: LUCENENET-573
> URL: https://issues.apache.org/jira/browse/LUCENENET-573
> Project: Lucene.Net
> Issue Type: Improvement
> Components: Lucene.Net.ICU
> Affects Versions: Lucene.Net 4.8.0
> Reporter: Shad Storhaug
>
> IcuBreakIterator is a wrapper around the icu-dotnet library. It
> implements the JDK BreakIterator business logic that icu-dotnet
> previously lacked; icu-dotnet has since added that logic in the form of
> a RuleBasedBreakIterator.
> IcuBreakIterator is used by Lucene.Net.Analysis.Common.Th.ThaiAnalyzer,
> Lucene.Net.Highlighter.PostingsHighlight, and
> Lucene.Net.Highlighter.VectorHighlight. All of the tests for these
> components pass, but primarily because of hacks that were added as
> workarounds. In reality, IcuBreakIterator has many rule-based
> differences that make its text-breaking behavior quite different from
> that of the JDK.
> We need to investigate whether the RuleBasedBreakIterator in icu-dotnet can
> be used as is, or whether it can be improved to more closely emulate the
> BreakIterator functionality in the JDK.
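
For reference, the JDK behavior that the port is being compared against can be observed with a few lines of Java. The following is only a minimal sketch: the class name JdkSentenceBreaks and the helper sentences() are made up for illustration, and it exercises only the sentence iterator obtained from BreakIterator.getSentenceInstance(), one of the JDK's BreakIterator factory methods. It says nothing about how IcuBreakIterator or icu-dotnet behave on the same input.

```java
import java.text.BreakIterator;
import java.util.ArrayList;
import java.util.List;
import java.util.Locale;

// Hypothetical helper class, not part of Lucene or Lucene.Net.
public class JdkSentenceBreaks {

    // Splits text at the boundaries reported by the JDK sentence BreakIterator.
    static List<String> sentences(String text) {
        BreakIterator it = BreakIterator.getSentenceInstance(Locale.ROOT);
        it.setText(text);

        List<String> out = new ArrayList<>();
        int start = it.first();
        // next() returns BreakIterator.DONE once the end of the text is passed.
        for (int end = it.next(); end != BreakIterator.DONE; start = end, end = it.next()) {
            out.add(text.substring(start, end));
        }
        return out;
    }

    public static void main(String[] args) {
        // Brackets make trailing whitespace inside each segment visible.
        for (String s : sentences("Hello world. How are you?")) {
            System.out.println("[" + s + "]");
        }
    }
}
```

Running the same input through the .NET port and comparing the segments side by side is one possible way to catalogue the rule-based differences described above.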