[ 
https://issues.apache.org/jira/browse/HADOOP-13192?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16276229#comment-16276229
 ] 

David Jou commented on HADOOP-13192:
------------------------------------

I would like to report a test case showing that a multibyte delimiter split 
across two buffers is still handled incorrectly. If more than one ambiguous 
character is pending at the buffer boundary, the match is attempted only once, 
and when it fails, all of the ambiguous characters are emitted as data.

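        String Delimiter = "***|";
        // the current buffer ends with three ambiguous '*' characters
        String CurrentBufferTailToken
                = "***|data***";
        // the next buffer begins with the rest of the delimiter
        String NextBufferHeadToken
                = "*|";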
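
To make the expected outcome concrete, here is a minimal sketch that splits the concatenated stream with a naive reference split. It does not use LineReader at all; the class and method names (ExpectedRecordsSketch, expectedRecords) are hypothetical and exist only for this illustration. Under leftmost matching the records should be an empty record followed by "data*".

```java
import java.util.ArrayList;
import java.util.List;

class ExpectedRecordsSketch {
    // Hypothetical helper: naive reference split over the whole stream,
    // ignoring buffer boundaries entirely. Not part of Hadoop.
    static List<String> expectedRecords(String stream, String delimiter) {
        List<String> records = new ArrayList<>();
        int start = 0;
        int hit;
        while ((hit = stream.indexOf(delimiter, start)) >= 0) {
            records.add(stream.substring(start, hit));
            start = hit + delimiter.length();
        }
        if (start < stream.length()) {
            records.add(stream.substring(start)); // trailing data without a delimiter
        }
        return records;
    }

    public static void main(String[] args) {
        String delimiter = "***|";
        String currentBufferTail = "***|data***";
        String nextBufferHead = "*|";
        // The reader should behave as if the two buffers were contiguous.
        List<String> records = expectedRecords(currentBufferTail + nextBufferHead, delimiter);
        for (String r : records) {
            System.out.println("[" + r + "]");
        }
    }
}
```

A reader that handles the buffer boundary correctly should produce the same records regardless of where the stream is cut.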
   

> org.apache.hadoop.util.LineReader cannot handle multibyte delimiters correctly
> ------------------------------------------------------------------------------
>
>                 Key: HADOOP-13192
>                 URL: https://issues.apache.org/jira/browse/HADOOP-13192
>             Project: Hadoop Common
>          Issue Type: Bug
>          Components: util
>    Affects Versions: 2.6.2
>            Reporter: binde
>            Assignee: binde
>            Priority: Critical
>             Fix For: 2.8.0, 2.7.3, 2.6.5, 3.0.0-alpha1
>
>         Attachments: 
> 0001-HADOOP-13192-org.apache.hadoop.util.LineReader-match.patch, 
> 0002-fix-bug-hadoop-1392-add-test-case-for-LineReader.patch, 
> HADOOP-13192.final.patch
>
>   Original Estimate: 5m
>  Remaining Estimate: 5m
>
> org.apache.hadoop.util.LineReader.readCustomLine() has a bug:
> when the line is aaaabccc and recordDelimiter is aaab, the result should be
> the two records a and ccc. The code at line 310 is:
>       for (; bufferPosn < bufferLength; ++bufferPosn) {
>         if (buffer[bufferPosn] == recordDelimiterBytes[delPosn]) {
>           delPosn++;
>           if (delPosn >= recordDelimiterBytes.length) {
>             bufferPosn++;
>             break;
>           }
>         } else if (delPosn != 0) {
>           bufferPosn--;
>           delPosn = 0;
>         }
>       }
> It should be:
>       for (; bufferPosn < bufferLength; ++bufferPosn) {
>         if (buffer[bufferPosn] == recordDelimiterBytes[delPosn]) {
>           delPosn++;
>           if (delPosn >= recordDelimiterBytes.length) {
>             bufferPosn++;
>             break;
>           }
>         } else if (delPosn != 0) {
>          // ------------- change here ------------- start ----
>           bufferPosn -= delPosn;
>          // ------------- change here ------------- end ----
>   
>           delPosn = 0;
>         }
>       }
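
The effect of backing up by delPosn can be checked in isolation. The following is a minimal sketch, assuming a single in-memory buffer and a hypothetical class DelimiterSplitSketch; it reuses the loop shape from the description but is not the LineReader implementation and does not model refilling the buffer.

```java
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.List;

class DelimiterSplitSketch {
    // Hypothetical single-buffer splitter using the corrected backtracking step.
    static List<String> split(byte[] buffer, byte[] delim) {
        List<String> records = new ArrayList<>();
        int recordStart = 0; // first byte of the record being read
        int delPosn = 0;     // number of delimiter bytes matched so far
        for (int bufferPosn = 0; bufferPosn < buffer.length; ++bufferPosn) {
            if (buffer[bufferPosn] == delim[delPosn]) {
                delPosn++;
                if (delPosn >= delim.length) {
                    // Full delimiter matched: the record ends where the delimiter began.
                    int recordEnd = bufferPosn + 1 - delim.length;
                    records.add(new String(buffer, recordStart,
                            recordEnd - recordStart, StandardCharsets.UTF_8));
                    recordStart = bufferPosn + 1;
                    delPosn = 0;
                }
            } else if (delPosn != 0) {
                // Partial match failed: return to the byte where the partial match
                // began; the loop increment then resumes at the byte after it.
                bufferPosn -= delPosn;
                delPosn = 0;
            }
        }
        if (recordStart < buffer.length) {
            // Remaining bytes (including any unfinished partial match) are data.
            records.add(new String(buffer, recordStart,
                    buffer.length - recordStart, StandardCharsets.UTF_8));
        }
        return records;
    }

    public static void main(String[] args) {
        byte[] line = "aaaabccc".getBytes(StandardCharsets.UTF_8);
        byte[] delim = "aaab".getBytes(StandardCharsets.UTF_8);
        for (String r : split(line, delim)) {
            System.out.println(r);
        }
    }
}
```

With the original `bufferPosn--`, the scan resumes at the mismatching byte itself, so the match that starts at the second a is never tried and aaaabccc comes back as one record.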



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)
