[ 
https://issues.apache.org/jira/browse/HDFS-10178?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15222588#comment-15222588
 ] 

Masatake Iwasaki commented on HDFS-10178:
-----------------------------------------

{code}
// For testing. Delay sending packet downstream
if (DataNodeFaultInjector.get().stopSendingPacketDownstream()) {
  try {
    Thread.sleep(60000);
  } catch (InterruptedException ie) {
    throw new IOException("Interrupted while sleeping. Bailing out.");
  }
}
{code}

Should the test logic be encapsulated in the DataNodeFaultInjector method instead? For example:

{code}
    DataNodeFaultInjector dnFaultInjector = new DataNodeFaultInjector() {
      int tries = 1;
      @Override
      public void stopSendingPacketDownstream() throws IOException {
        if (tries > 0) {
          tries--;
          try {
            Thread.sleep(60000);
          } catch (InterruptedException ie) {
            throw new IOException("Interrupted while sleeping. Bailing out.");
          }
        }
      }
    };
{code}
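
With the delay moved into the injector, the call site would shrink to a single hook invocation and the production code would carry no sleep logic. Below is a minimal, self-contained sketch of that pattern. {{FaultInjector}}, {{DelayOnceInjector}}, {{forwardPacket}} and {{demo}} are hypothetical stand-ins written for illustration, not the actual Hadoop classes, and the delay is shortened to 10 ms so the sketch runs quickly.

```java
import java.io.IOException;
import java.io.UncheckedIOException;

public class FaultInjectorSketch {

  /** Hypothetical stand-in for DataNodeFaultInjector: hooks are no-ops by default. */
  public static class FaultInjector {
    private static FaultInjector instance = new FaultInjector();

    public static FaultInjector get() { return instance; }

    public static void set(FaultInjector injector) { instance = injector; }

    /** Called just before a packet is sent downstream. Does nothing in production. */
    public void stopSendingPacketDownstream() throws IOException {
    }
  }

  /** Test-only injector that delays the first call and lets later calls through. */
  public static class DelayOnceInjector extends FaultInjector {
    private final long delayMs;
    private int tries = 1;
    public int delays = 0; // number of delays actually injected

    public DelayOnceInjector(long delayMs) { this.delayMs = delayMs; }

    @Override
    public void stopSendingPacketDownstream() throws IOException {
      if (tries > 0) {
        tries--;
        delays++;
        try {
          Thread.sleep(delayMs);
        } catch (InterruptedException ie) {
          Thread.currentThread().interrupt(); // preserve the interrupt status
          throw new IOException("Interrupted while sleeping. Bailing out.");
        }
      }
    }
  }

  /** Hypothetical call site: one hook call, no test logic in the production path. */
  public static void forwardPacket() throws IOException {
    FaultInjector.get().stopSendingPacketDownstream();
    // ... the packet would be mirrored to the downstream node here ...
  }

  /** Installs the test injector, forwards two packets, returns the delay count. */
  public static int demo() {
    DelayOnceInjector injector = new DelayOnceInjector(10);
    FaultInjector.set(injector);
    try {
      forwardPacket(); // delayed
      forwardPacket(); // not delayed
    } catch (IOException e) {
      throw new UncheckedIOException(e);
    } finally {
      FaultInjector.set(new FaultInjector()); // restore the no-op injector
    }
    return injector.delays;
  }

  public static void main(String[] args) {
    System.out.println("delays injected: " + demo()); // prints "delays injected: 1"
  }
}
```

Restoring the default injector in the {{finally}} block keeps one test's fault from leaking into the next, which matters when the injector is held in a static field.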


> Permanent write failures can happen if pipeline recoveries occur for the 
> first packet
> -------------------------------------------------------------------------------------
>
>                 Key: HDFS-10178
>                 URL: https://issues.apache.org/jira/browse/HDFS-10178
>             Project: Hadoop HDFS
>          Issue Type: Bug
>            Reporter: Kihwal Lee
>            Assignee: Kihwal Lee
>            Priority: Critical
>         Attachments: HDFS-10178.patch, HDFS-10178.v2.patch, 
> HDFS-10178.v3.patch, HDFS-10178.v4.patch
>
>
> We have observed that a write fails permanently if the first packet doesn't go 
> through properly and pipeline recovery happens. If the packet header is sent 
> out, but the data portion of the packet does not reach one or more datanodes 
> in time, the pipeline recovery will be done against the 0-byte partial block. 
>
> If additional datanodes are added, the block is transferred to the new nodes. 
> After the transfer, each node will have a meta file containing the header 
> and a 0-length data block file. The pipeline recovery seems to work correctly 
> up to this point, but the write fails when the actual data packet is resent. 



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
