[
https://issues.apache.org/jira/browse/LUCENENET-573?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]
Shad Storhaug closed LUCENENET-573.
-----------------------------------
Resolution: Won't Fix
Rather than trying to patch the ICU {{BreakIterator}} to match the JDK, a more
logical default behavior is to embrace the default supplied by ICU. ICU
provides the means for the end user to supply custom rules, so we shouldn't
worry that not all of Lucene's tests pass with this behavior. We only need to
confirm that the default can be overridden and that our ICU4N
{{BreakIterator}} matches the behavior of ICU4J.
Java tests were created based on the ICU {{BreakIterator}}'s default behavior,
and then ported back to C# to confirm they match. A mock {{JdkBreakIterator}}
with custom rules was also created to stand in for the ICU4N {{BreakIterator}}
to confirm we can change ICU4N to match JDK's behavior.
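For reference, the JDK side of such a comparison can be captured with a few
lines of plain Java. The following is only a minimal sketch, not the test code
described above: it uses java.text.BreakIterator from the JDK alone (no ICU4J
or ICU4N calls), and the class name, method name, and sample text are
hypothetical. The boundary offsets it returns could then be compared against
those produced by an ICU-based iterator for the same input.

```java
import java.text.BreakIterator;
import java.util.ArrayList;
import java.util.List;
import java.util.Locale;

// Hypothetical helper: collects the sentence boundaries reported by the
// JDK's default sentence BreakIterator so they can be compared elsewhere.
public class JdkSentenceBoundaries {

    // Returns every boundary offset, starting with first() and ending
    // with the offset returned just before BreakIterator.DONE.
    static List<Integer> boundaries(String text) {
        BreakIterator it = BreakIterator.getSentenceInstance(Locale.ROOT);
        it.setText(text);
        List<Integer> result = new ArrayList<>();
        for (int b = it.first(); b != BreakIterator.DONE; b = it.next()) {
            result.add(b);
        }
        return result;
    }

    public static void main(String[] args) {
        // Sample text is an assumption chosen only for illustration.
        String text = "Hello world. This is a test.";
        System.out.println(boundaries(text));
    }
}
```

Recording offsets rather than substrings keeps the comparison independent of
how each library exposes the text between boundaries.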
> Make IcuBreakIterator more like the JDK's BreakIterator.getInstance()
> ---------------------------------------------------------------------
>
> Key: LUCENENET-573
> URL: https://issues.apache.org/jira/browse/LUCENENET-573
> Project: Lucene.Net
> Issue Type: Improvement
> Components: Lucene.Net.ICU
> Affects Versions: Lucene.Net 4.8.0
> Reporter: Shad Storhaug
> Priority: Major
>
> The IcuBreakIterator is a wrapper around the icu-dotnet library. It
> implements the JDK BreakIterator business logic that was previously missing
> there, but has since been added in the form of a RuleBasedBreakIterator.
> IcuBreakIterator is utilized by Lucene.Net.Analysis.Common.Th.ThaiAnalyzer,
> Lucene.Net.Highlighter.PostingsHighlight, and
> Lucene.Net.Highlighter.VectorHighlight. While all of the tests are passing
> for these components, it is primarily because of hacks that were added as
> workarounds. In reality, the functionality of IcuBreakIterator has many
> rule-based differences that make its breaking text behavior act quite
> differently than the JDK.
> We need to investigate whether the RuleBasedBreakIterator in icu-dotnet can
> be utilized as is, or if it can be improved to more closely emulate the
> BreakIterator functionality in the JDK.
--
This message was sent by Atlassian JIRA
(v7.6.14#76016)