[ https://issues.apache.org/jira/browse/HDFS-10247?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16410420#comment-16410420 ]
Hudson commented on HDFS-10247:
-------------------------------
SUCCESS: Integrated in Jenkins build Hadoop-trunk-Commit #13869 (See
[https://builds.apache.org/job/Hadoop-trunk-Commit/13869/])
HDFS-10247: libhdfs++: Datanode protocol version mismatch fix.
(james.clampffer: rev 60c3437267b864a01d27783535906c6a7e81058e)
* (edit) hadoop-hdfs-project/hadoop-hdfs-native-client/src/main/native/libhdfspp/lib/reader/block_reader.cc
* (edit) hadoop-hdfs-project/hadoop-hdfs-native-client/src/main/native/libhdfspp/lib/common/continuation/asio.h
* (edit) hadoop-hdfs-project/hadoop-hdfs-native-client/src/main/native/libhdfspp/lib/common/continuation/protobuf.h
* (edit) hadoop-hdfs-project/hadoop-hdfs-native-client/src/main/native/libhdfspp/lib/common/util.h
* (edit) hadoop-hdfs-project/hadoop-hdfs-native-client/src/main/native/libhdfspp/lib/common/util.cc
* (edit) hadoop-hdfs-project/hadoop-hdfs-native-client/src/main/native/libhdfspp/lib/reader/datatransfer_impl.h
> libhdfs++: Datanode protocol version mismatch
> ---------------------------------------------
>
> Key: HDFS-10247
> URL: https://issues.apache.org/jira/browse/HDFS-10247
> Project: Hadoop HDFS
> Issue Type: Sub-task
> Components: hdfs-client
> Reporter: James Clampffer
> Assignee: James Clampffer
> Priority: Major
> Attachments: HDFS-10247.HDFS-8707.000.patch,
> HDFS-10247.HDFS-8707.001.patch, HDFS-10247.HDFS-8707.002.patch
>
>
> Occasionally "Version Mismatch (Expected: 28, Received: 22794 )" shows up in
> the logs. This rarely happens with fewer than 500 concurrent reads, but it
> starts happening often enough to be an issue at 1000 concurrent reads.
> I've seen 3 distinct numbers: 23050 (most common), 22538, and 22794. If you
> break these shorts into their two bytes you get:
> {code}
> 23050 -> [90,10]
> 22794 -> [89,10]
> 22538 -> [88,10]
> {code}
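> A quick way to reproduce that split (standalone sketch, not libhdfs++ code):
> treat each value as a big-endian short and peel off the two bytes.
> {code}
> // Sketch only: decompose the observed "versions" into the two bytes the DN
> // would have read off the wire.
> #include <cstdio>
>
> int main() {
>   const unsigned observed[] = {23050, 22794, 22538};
>   for (unsigned v : observed) {
>     // The protocol version is read as a big-endian short, high byte first.
>     std::printf("%u -> [%u,%u]\n", v, v >> 8, v & 0xFF);
>   }
>   return 0;  // prints 23050 -> [90,10], 22794 -> [89,10], 22538 -> [88,10]
> }
> {code}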
> Interestingly enough, if we dump the buffers holding protobuf messages just
> before they hit the wire, we see things like the following, with the first
> two bytes being 90,10:
> {code}
> buffer
> ={90,10,82,10,64,10,52,10,37,66,80,45,49,51,56,49,48,51,51,57,57,49,45,49,50,55,46,48,46,48,46,49,45,49,52,53,57,53,50,53,54,49,53,55,50,53,16,-127,-128,-128,-128,4,24,-23,7,32,-128,-128,64,18,8,10,0,18,0,26,0,34,0,18,14,108,105,98,104,100,102,115,43,43,95,75,67,43,49,16,0,24,23,32,1}
> {code}
> The first 3 bytes the DN expects for an unsecured read block request are:
> {code}
> {0,28,81} // [0,28] -> the protocol version as a short, 81 is the read block opcode
> {code}
> {code}
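> For context, here is a rough sketch of the framing the DN expects before the
> protobuf payload (constant and function names here are illustrative, not the
> actual libhdfs++ API):
> {code}
> // Sketch only: the DN expects a 2-byte big-endian protocol version and a
> // 1-byte opcode ahead of the delimited OpReadBlockProto bytes. If those 3
> // bytes were missing, the DN would read the start of the protobuf as the
> // version, which matches the 230xx/225xx numbers above.
> #include <cstdint>
> #include <string>
>
> static const uint16_t kDataTransferVersion = 28;  // expected by the DN
> static const uint8_t kReadBlockOp = 81;           // read block opcode
>
> std::string FrameReadBlockRequest(const std::string &delimited_proto) {
>   std::string wire;
>   wire.push_back(static_cast<char>(kDataTransferVersion >> 8));    // 0
>   wire.push_back(static_cast<char>(kDataTransferVersion & 0xFF));  // 28
>   wire.push_back(static_cast<char>(kReadBlockOp));                 // 81
>   wire.append(delimited_proto);  // e.g. the {90,10,82,...} bytes above
>   return wire;
> }
> {code}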
> This seems like either connections are getting swapped between readers, or
> the header isn't being sent for some reason while the protobuf message is.
> I've ruled out memory stomps on the header data (see HDFS-10241) by putting
> the 3-byte header in its own static buffer that all requests use.
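> Roughly what that diagnostic looks like (illustrative sketch, not the actual
> patch):
> {code}
> // Illustration only: keep the 3-byte read-block header in one shared,
> // read-only buffer. Every request sends from this same buffer, so if the
> // DN still reports a version mismatch, the header bytes were not being
> // stomped in client memory.
> #include <array>
> #include <cstddef>
> #include <cstdint>
>
> static const std::array<uint8_t, 3> kReadBlockHeader = {{0, 28, 81}};
>
> const uint8_t *ReadBlockHeaderData() { return kReadBlockHeader.data(); }
> std::size_t ReadBlockHeaderSize() { return kReadBlockHeader.size(); }
> {code}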
> Some notes:
> - The mismatched number stays the same for the duration of a stress test.
> - The mismatch is distributed fairly evenly throughout the logs.
--
This message was sent by Atlassian JIRA
(v7.6.3#76005)