[
https://issues.apache.org/jira/browse/LUCENE-5076?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13692107#comment-13692107
]
Robert Muir commented on LUCENE-5076:
-------------------------------------
Thanks for reporting this: I think it's a bug.
> Incorrect behavior for TestLaoBreakIterator.isWord()
> ----------------------------------------------------
>
> Key: LUCENE-5076
> URL: https://issues.apache.org/jira/browse/LUCENE-5076
> Project: Lucene - Core
> Issue Type: Bug
> Affects Versions: 4.3.1
> Environment: any
> Reporter: Adrian Nistor
>
> The incorrect behavior appears in version 4.3.1 and in revision
> 1496055.
> Method "TestLaoBreakIterator.isWord" contains this loop:
> {code:java|borderStyle=solid}
> for (int i = start; i < end; i += UTF16.getCharCount(codepoint)) {
>   codepoint = UTF16.charAt(text, 0, end, start);
>   if (UCharacter.isLetterOrDigit(codepoint))
>     return true;
> }
> {code}
> It appears that the code reads the same character (the one at offset
> "start") again and again, irrespective of "i". This looks incorrect. I
> think the code inside the loop should use "i", i.e., read the character
> at offset "i". If the intended behavior is to read only one character,
> then the loop is not necessary.
> A similar problem appears in this loop of method
> "BreakIteratorWrapper.BIWrapper.calcStatus":
> {code:java|borderStyle=solid}
> for (int i = begin; i < end; i += UTF16.getCharCount(codepoint)) {
>   codepoint = UTF16.charAt(text, 0, end, begin);
>   if (UCharacter.isDigit(codepoint))
>     return RuleBasedBreakIterator.WORD_NUMBER;
>   else if (UCharacter.isLetter(codepoint)) {
>     // TODO: try to separately specify ideographic, kana?
>     // [currently all bundled as letter for this case]
>     return RuleBasedBreakIterator.WORD_LETTER;
>   }
> }
> {code}
> Again, the computation inside the loop does not use "i", which seems
> incorrect: the code reads the same character (the one at offset
> "begin") again and again, irrespective of "i".
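A minimal sketch of what the corrected loops might look like, assuming the fix is simply to look up the code point at offset "i" instead of "start"/"begin". To keep it self-contained it swaps ICU4J's UTF16.charAt, UTF16.getCharCount and UCharacter for java.lang.Character's codePointAt, charCount, isLetterOrDigit, isDigit and isLetter, whose Unicode classifications may differ slightly from ICU's. The class name Lucene5076Sketch, the simplified method signatures, and the WORD_* constants (meant to mirror ICU's RuleBasedBreakIterator rule-status values) are hypothetical; this is not the committed patch.

```java
// Hypothetical sketch of the LUCENE-5076 loop fix. java.lang.Character stands in
// for ICU4J's UTF16/UCharacter so the example compiles with the JDK alone.
class Lucene5076Sketch {
  // Local stand-ins (assumed) for ICU's RuleBasedBreakIterator rule statuses.
  static final int WORD_NONE = 0;
  static final int WORD_NUMBER = 100;
  static final int WORD_LETTER = 200;

  // Corrected isWord: reads the code point at offset i on each iteration.
  static boolean isWord(char[] text, int start, int end) {
    int codepoint;
    for (int i = start; i < end; i += Character.charCount(codepoint)) {
      codepoint = Character.codePointAt(text, i, end);
      if (Character.isLetterOrDigit(codepoint)) {
        return true;
      }
    }
    return false;
  }

  // Corrected calcStatus loop: classifies the first digit or letter in [begin, end).
  static int calcStatus(char[] text, int begin, int end) {
    int codepoint;
    for (int i = begin; i < end; i += Character.charCount(codepoint)) {
      codepoint = Character.codePointAt(text, i, end);
      if (Character.isDigit(codepoint)) {
        return WORD_NUMBER;
      } else if (Character.isLetter(codepoint)) {
        return WORD_LETTER;
      }
    }
    return WORD_NONE;
  }

  // Buggy variant, for comparison only: it always reads the code point at start.
  static boolean isWordBuggy(char[] text, int start, int end) {
    int codepoint;
    for (int i = start; i < end; i += Character.charCount(codepoint)) {
      codepoint = Character.codePointAt(text, start, end);
      if (Character.isLetterOrDigit(codepoint)) {
        return true;
      }
    }
    return false;
  }

  public static void main(String[] args) {
    char[] text = " a".toCharArray(); // a space, then a letter
    System.out.println(isWordBuggy(text, 0, text.length)); // false
    System.out.println(isWord(text, 0, text.length));      // true
    char[] num = "-7".toCharArray();
    System.out.println(calcStatus(num, 0, num.length));    // 100 (WORD_NUMBER)
  }
}
```

In both loops the only change is the offset passed to the code-point lookup; advancing "i" by Character.charCount keeps surrogate pairs together, so a supplementary letter after leading punctuation is still found.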
--
This message is automatically generated by JIRA.