[jira] [Commented] (LUCENE-5076) Incorrect behavior for TestLaoBreakIterator.isWord()

Robert Muir (JIRA) Mon, 24 Jun 2013 09:22:25 -0700

    [ https://issues.apache.org/jira/browse/LUCENE-5076?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13692107#comment-13692107 ]


Robert Muir commented on LUCENE-5076:
-------------------------------------

Thanks for reporting this: I think it's a bug.

                
> Incorrect behavior for TestLaoBreakIterator.isWord()
> ----------------------------------------------------
>
>                 Key: LUCENE-5076
>                 URL: https://issues.apache.org/jira/browse/LUCENE-5076
>             Project: Lucene - Core
>          Issue Type: Bug
>    Affects Versions: 4.3.1
>         Environment: any
>            Reporter: Adrian Nistor
>
> The incorrect behavior appears in version 4.3.1 and in revision
> 1496055.
> Method "TestLaoBreakIterator.isWord" contains this loop:
> {code:java|borderStyle=solid}
> for (int i = start; i < end; i += UTF16.getCharCount(codepoint)) {
>     codepoint = UTF16.charAt(text, 0, end, start);
>     if (UCharacter.isLetterOrDigit(codepoint))
>         return true;
> }
> {code}
> It appears that the code is reading only one character again and
> again, irrespective of "i".  This looks incorrect.  I think the code
> inside the loop should use "i", e.g., read characters based on "i".
> If the intended behavior is to read only one character, then the loop
> should not be necessary.
> A similar problem appears in method
> "BreakIteratorWrapper.BIWrapper.calcStatus" for this loop:
> {code:java|borderStyle=solid}
> for (int i = begin; i < end; i += UTF16.getCharCount(codepoint)) {
>     codepoint = UTF16.charAt(text, 0, end, begin);
>     if (UCharacter.isDigit(codepoint))
>         return RuleBasedBreakIterator.WORD_NUMBER;
>     else if (UCharacter.isLetter(codepoint)) {
>         // TODO: try to separately specify ideographic, kana? 
>         // [currently all bundled as letter for this case]
>         return RuleBasedBreakIterator.WORD_LETTER;
>     }
> }
> {code}
> Again, the computation inside the loop does not use "i", which seems
> incorrect.  It appears that the code is reading only one character
> again and again, irrespective of "i".
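
For illustration only, here is a minimal sketch of the change the report suggests: read the code point at "i" instead of at the fixed start offset. It is not the committed fix. To stay self-contained it uses java.lang.Character in place of ICU4J's UTF16 and UCharacter, and the class and method names (IsWordSketch, isWordAsReported) are hypothetical.

```java
// Sketch only: java.lang.Character stands in for ICU4J's UTF16/UCharacter,
// and all names here are hypothetical, not taken from the Lucene source.
class IsWordSketch {

    // Pattern from the report: the read position never advances, so only
    // the code point at 'start' is ever examined.
    static boolean isWordAsReported(char[] text, int start, int end) {
        int codepoint;
        for (int i = start; i < end; i += Character.charCount(codepoint)) {
            codepoint = Character.codePointAt(text, start, end); // ignores i
            if (Character.isLetterOrDigit(codepoint))
                return true;
        }
        return false;
    }

    // Suggested change: read the code point at 'i', so every code point in
    // [start, end) is examined, including surrogate pairs.
    static boolean isWord(char[] text, int start, int end) {
        int codepoint;
        for (int i = start; i < end; i += Character.charCount(codepoint)) {
            codepoint = Character.codePointAt(text, i, end); // uses i
            if (Character.isLetterOrDigit(codepoint))
                return true;
        }
        return false;
    }

    public static void main(String[] args) {
        char[] text = ", a".toCharArray();
        // The first code point is punctuation; the letter comes later.
        System.out.println(isWordAsReported(text, 0, text.length)); // false
        System.out.println(isWord(text, 0, text.length));           // true
    }
}
```

The same one-argument change (reading at "i" rather than at "begin") would presumably apply to the loop quoted from BreakIteratorWrapper.BIWrapper.calcStatus, assuming the intent there is also to scan every code point in the range.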
