Robert Muir commented on LUCENE-8175:

Yes, that is good. FWIW, I plan to upgrade to the new version regardless, even if 
this test sometimes fails.

At the end of the day, StandardAnalyzer is always an option for users who want 
more stability and backwards compatibility. The ICU integration, by contrast, is 
for the latest Unicode capabilities. I think it's OK to hold them back for a few 
months because of rare bugs, but there's a limit.

> ICUTokenizer might return corrupt tokens due to concurrency bug in ICU4J
> ------------------------------------------------------------------------
>                 Key: LUCENE-8175
>                 URL: https://issues.apache.org/jira/browse/LUCENE-8175
>             Project: Lucene - Core
>          Issue Type: Bug
>            Reporter: Adrien Grand
>            Priority: Critical
> I was digging into some test failures with {{testRandomHugeStrings}} that have 
> occurred since the upgrade to ICU4J 60.2, and they boil down to this bug: 
> http://bugs.icu-project.org/trac/ticket/13512, which is fixed but not yet 
> released.
> In short, an int[] is shared across several threads when it shouldn't be. As a 
> consequence, ICUTokenizer may sometimes return corrupt tokens. I pinged the 
> issue to ask when a release fixing this bug is expected.

This message was sent by Atlassian JIRA