[ 
https://issues.apache.org/jira/browse/HDFS-10178?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15224205#comment-15224205
 ] 

Vinayakumar B commented on HDFS-10178:
--------------------------------------

Hi [~kihwal], I am trying to understand the issue. may be my understanding 
about packet sending/receiving is wrong. 
bq. If the packet header is sent out, but the data portion of the packet does 
not reach one or more datanodes in time
How Only header can be sent out without any data/datalen? header will have the 
payload length. So PacketReceiver should fail/wait to receive the entire packet 
itself right (not just receive header portion) ? 
or payload length is corrupted in the incoming packet to 0?
{code:java|title=packetReceiver.java#doRead(..)}
    // Each packet looks like:
    //   PLEN    HLEN      HEADER     CHECKSUMS  DATA
    //   32-bit  16-bit   <protobuf>  <variable length>
    //
    // PLEN:      Payload length
    //            = length(PLEN) + length(CHECKSUMS) + length(DATA)
    //            This length includes its own encoded length in
    //            the sum for historical reasons.
    //
    // HLEN:      Header length
    //            = length(HEADER)
    //
    // HEADER:    the actual packet header fields, encoded in protobuf
    // CHECKSUMS: the crcs for the data chunk. May be missing if
    //            checksums were not requested
    // DATA       the actual block data
{code}

> Permanent write failures can happen if pipeline recoveries occur for the 
> first packet
> -------------------------------------------------------------------------------------
>
>                 Key: HDFS-10178
>                 URL: https://issues.apache.org/jira/browse/HDFS-10178
>             Project: Hadoop HDFS
>          Issue Type: Bug
>            Reporter: Kihwal Lee
>            Assignee: Kihwal Lee
>            Priority: Critical
>         Attachments: HDFS-10178.patch, HDFS-10178.v2.patch, 
> HDFS-10178.v3.patch, HDFS-10178.v4.patch
>
>
> We have observed that write fails permanently if the first packet doesn't go 
> through properly and pipeline recovery happens. If the packet header is sent 
> out, but the data portion of the packet does not reach one or more datanodes 
> in time, the pipeline recovery will be done against the 0-byte partial block. 
>  
> If additional datanodes are added, the block is transferred to the new nodes. 
>  After the transfer, each node will have a meta file containing the header 
> and 0-length data block file. The pipeline recovery seems to work correctly 
> up to this point, but write fails when actual data packet is resent. 



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

Reply via email to