[ https://issues.apache.org/jira/browse/HDFS-10247?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15239254#comment-15239254 ]

Bob Hansen commented on HDFS-10247:
-----------------------------------

Fair enough.  +1

> libhdfs++: Datanode protocol version mismatch
> ---------------------------------------------
>
>                 Key: HDFS-10247
>                 URL: https://issues.apache.org/jira/browse/HDFS-10247
>             Project: Hadoop HDFS
>          Issue Type: Sub-task
>          Components: hdfs-client
>            Reporter: James Clampffer
>            Assignee: James Clampffer
>         Attachments: HDFS-10247.HDFS-8707.000.patch, 
> HDFS-10247.HDFS-8707.001.patch, HDFS-10247.HDFS-8707.002.patch
>
>
> Occasionally "Version Mismatch (Expected: 28, Received: 22794 )" shows up in 
> the logs.  This rarely happens with fewer than 500 concurrent reads and 
> starts happening often enough to be an issue at 1000 concurrent reads.
> I've seen 3 distinct numbers: 23050 (most common), 22538, and 22794.  If you 
> break these shorts into bytes you get
> {code}
> 23050 -> [90,10]
> 22794 -> [89,10]
> 22538 -> [88,10]
> {code}
> Interestingly enough, if we dump the buffers holding protobuf messages just 
> before they hit the wire, we see things like the following, where the first 
> two bytes are 90,10:
> {code}
> buffer 
> ={90,10,82,10,64,10,52,10,37,66,80,45,49,51,56,49,48,51,51,57,57,49,45,49,50,55,46,48,46,48,46,49,45,49,52,53,57,53,50,53,54,49,53,55,50,53,16,-127,-128,-128,-128,4,24,-23,7,32,-128,-128,64,18,8,10,0,18,0,26,0,34,0,18,14,108,105,98,104,100,102,115,43,43,95,75,67,43,49,16,0,24,23,32,1}
> {code}
> The first 3 bytes the DN expects for an unsecured read block request are:
> {code}
> {0,28,81} //[0, 28]->a short for protocol, 81 is read block opcode
> {code}
> This seems like either connections are getting swapped between readers or
> the header isn't being sent for some reason but the protobuf message is.
> I've ruled out memory stomps on the header data (see HDFS-10241) by sticking 
> the 3-byte header in its own static buffer that all requests use.
> Some notes:
> - The mismatched number will stay the same for the duration of a stress test.
> - The mismatch is distributed fairly evenly throughout the logs.
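
The byte arithmetic in the description above can be checked with a short sketch. It is an illustration only, not libhdfs++ or DataNode code: it assumes the receiver reads the version as a big-endian 16-bit value (which matches the [0, 28] layout quoted above), and the names HEADER, split_short, and version_seen_by_receiver are hypothetical. It shows that if the 3-byte header is missing and the protobuf message arrives first, the first two payload bytes 90,10 are read as version 23050.

```python
import struct

# Values taken from the issue description: protocol version 28, read block opcode 81.
EXPECTED_VERSION = 28
READ_BLOCK_OPCODE = 81

# Hypothetical stand-in for the 3-byte request header: {0, 28, 81}.
HEADER = struct.pack(">hB", EXPECTED_VERSION, READ_BLOCK_OPCODE)


def split_short(value):
    """Return the [high, low] bytes of a 16-bit big-endian value."""
    return list(struct.pack(">H", value))


def version_seen_by_receiver(stream):
    """Interpret the first two bytes on the wire as a big-endian short."""
    (version,) = struct.unpack(">h", stream[:2])
    return version


# First few bytes of the dumped protobuf buffer from the report.
payload = bytes([90, 10, 82, 10, 64])

print(split_short(23050))                          # [90, 10]
print(version_seen_by_receiver(HEADER + payload))  # 28
print(version_seen_by_receiver(payload))           # 23050
```

The same decomposition gives [89, 10] for 22794 and [88, 10] for 22538, so all three observed values are consistent with the receiver reading the start of a protobuf message where it expected the header.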



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
