[ https://issues.apache.org/jira/browse/LUCENENET-573?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Shad Storhaug updated LUCENENET-573:
------------------------------------
    Affects Version/s:     (was: Lucene.Net 5.0 PCL)
                       Lucene.Net 4.8.0

> Make IcuBreakIterator more like the JDK's BreakIterator.getInstance()
> ---------------------------------------------------------------------
>
>                 Key: LUCENENET-573
>                 URL: https://issues.apache.org/jira/browse/LUCENENET-573
>             Project: Lucene.Net
>          Issue Type: Improvement
>    Affects Versions: Lucene.Net 4.8.0
>            Reporter: Shad Storhaug
>
> The IcuBreakIterator is a wrapper around the icu-dotnet library. It 
> implements the JDK BreakIterator business logic that was previously missing 
> from icu-dotnet, but which has since been added there in the form of a 
> RuleBasedBreakIterator. 
> IcuBreakIterator is used by Lucene.Net.Analysis.Common.Th.ThaiAnalyzer, 
> Lucene.Net.Highlighter.PostingsHighlight, and 
> Lucene.Net.Highlighter.VectorHighlight. While all of the tests for these 
> components pass, that is primarily because of hacks that were added as 
> workarounds. In reality, IcuBreakIterator has many rule-based differences 
> that make its text-breaking behavior quite different from the JDK's.
> We need to investigate whether the RuleBasedBreakIterator in icu-dotnet can 
> be used as is, or whether it can be improved to more closely emulate the 
> BreakIterator functionality in the JDK.
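
For comparison purposes, the JDK behavior that the description treats as the
reference can be captured with a small Java sketch. It is only an
illustration and not part of the issue: the class name JdkBreakReference, its
helper methods, and the sample text are hypothetical, and the sentence
iterator is just one of the factory methods that java.text.BreakIterator
exposes. The boundary offsets it prints could be compared against what
IcuBreakIterator reports for the same input.

```java
import java.text.BreakIterator;
import java.util.ArrayList;
import java.util.List;
import java.util.Locale;

// Hypothetical helper: records the boundaries the JDK reports for a text so
// they can be compared with another BreakIterator implementation.
public class JdkBreakReference {

    // Collects every boundary offset, from first() until DONE is returned.
    static List<Integer> boundaries(BreakIterator it, String text) {
        it.setText(text);
        List<Integer> result = new ArrayList<>();
        for (int pos = it.first(); pos != BreakIterator.DONE; pos = it.next()) {
            result.add(pos);
        }
        return result;
    }

    // Splits the text into the segments that lie between adjacent boundaries.
    static List<String> segments(BreakIterator it, String text) {
        List<Integer> b = boundaries(it, text);
        List<String> out = new ArrayList<>();
        for (int i = 1; i < b.size(); i++) {
            out.add(text.substring(b.get(i - 1), b.get(i)));
        }
        return out;
    }

    public static void main(String[] args) {
        String text = "Hello world. How are you?"; // sample input, an assumption
        BreakIterator sentences = BreakIterator.getSentenceInstance(Locale.US);
        System.out.println(boundaries(sentences, text));
        System.out.println(segments(sentences, text));
    }
}
```

Running the same inputs through both implementations and comparing the
offsets would show where the rule sets diverge, without relying on
test-specific workarounds.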



--
This message was sent by Atlassian JIRA
(v6.3.15#6346)
