[
https://issues.apache.org/jira/browse/HDFS-11749?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]
Manoj Govindassamy updated HDFS-11749:
--------------------------------------
Attachment: HDFS-11749-test.01.patch
Attaching a test case that shows an ongoing file write failing because of the
maintenance state transition.
[~mingma], [~dilaver], please share your thoughts on this use case and the
expected behavior.
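
For reference, here is a rough sketch of the scenario the test exercises. This is not the attached patch: {{putPipelineNodeInMaintenance}} below is a hypothetical placeholder for the maintenance-state transition, which in the real test goes through the combined host file and a refreshNodes.
{code}
import java.io.IOException;
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.FSDataOutputStream;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.hdfs.DistributedFileSystem;
import org.apache.hadoop.hdfs.HdfsConfiguration;
import org.apache.hadoop.hdfs.MiniDFSCluster;

public class TestWriteDuringMaintenanceSketch {
  public void testOngoingWriteSurvivesMaintenance() throws IOException {
    Configuration conf = new HdfsConfiguration();
    MiniDFSCluster cluster =
        new MiniDFSCluster.Builder(conf).numDataNodes(3).build();
    try {
      cluster.waitActive();
      DistributedFileSystem fs = cluster.getFileSystem();

      Path file = new Path("/test-maintenance-write");
      byte[] data = new byte[8192];
      FSDataOutputStream out = fs.create(file, (short) 3);
      out.write(data, 0, data.length / 2);
      out.hflush();  // file is now UNDER_CONSTRUCTION with a live pipeline

      // Hypothetical helper: transition one pipeline DN to
      // ENTERING_MAINTENANCE / IN_MAINTENANCE (in the real test this is
      // done via the combined host file and refreshNodes).
      putPipelineNodeInMaintenance(cluster, out);

      // Expectation: the rest of the write and the close succeed.
      // Observed in a few runs: DataStreamer throws the IOException below.
      out.write(data, data.length / 2, data.length / 2);
      out.close();
    } finally {
      cluster.shutdown();
    }
  }
}
{code}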
> Ongoing file write fails when its pipeline DataNode is pulled out for
> maintenance
> ---------------------------------------------------------------------------------
>
> Key: HDFS-11749
> URL: https://issues.apache.org/jira/browse/HDFS-11749
> Project: Hadoop HDFS
> Issue Type: Bug
> Components: hdfs
> Affects Versions: 3.0.0-alpha1
> Reporter: Manoj Govindassamy
> Assignee: Manoj Govindassamy
> Attachments: HDFS-11749-test.01.patch
>
>
> HDFS Maintenance State (HDFS-7877) is supposed to first put DataNodes into
> the ENTERING_MAINTENANCE state and, only when all their blocks are
> sufficiently replicated, transition them to the IN_MAINTENANCE state.
> UNDER_CONSTRUCTION files and any ongoing writes to those files should not be
> failed by this transition. But in a few runs I have seen ongoing writes to
> open files fail when their pipeline DNs are pulled out via the Maintenance
> State feature. A test case is attached.
> {code}
> java.io.IOException: Failed to replace a bad datanode on the existing pipeline due to no more good datanodes being available to try. (Nodes: current=[DatanodeInfoWithStorage[127.0.0.1:49306,DS-eeca7153-fba2-4f2e-a044-0a292fc6dc6d,DISK], DatanodeInfoWithStorage[127.0.0.1:49302,DS-a5adf33c-81d0-413b-879c-9c4d9acbb72a,DISK]], original=[DatanodeInfoWithStorage[127.0.0.1:49306,DS-eeca7153-fba2-4f2e-a044-0a292fc6dc6d,DISK], DatanodeInfoWithStorage[127.0.0.1:49302,DS-a5adf33c-81d0-413b-879c-9c4d9acbb72a,DISK]]). The current failed datanode replacement policy is DEFAULT, and a client may configure this via 'dfs.client.block.write.replace-datanode-on-failure.policy' in its configuration.
>     at org.apache.hadoop.hdfs.DataStreamer.findNewDatanode(DataStreamer.java:1299)
>     at org.apache.hadoop.hdfs.DataStreamer.addDatanode2ExistingPipeline(DataStreamer.java:1365)
>     at org.apache.hadoop.hdfs.DataStreamer.handleDatanodeReplacement(DataStreamer.java:1545)
>     at org.apache.hadoop.hdfs.DataStreamer.setupPipelineInternal(DataStreamer.java:1460)
>     at org.apache.hadoop.hdfs.DataStreamer.setupPipelineForAppendOrRecovery(DataStreamer.java:1443)
>     at org.apache.hadoop.hdfs.DataStreamer.processDatanodeOrExternalError(DataStreamer.java:1251)
>     at org.apache.hadoop.hdfs.DataStreamer.run(DataStreamer.java:668)
> {code}
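
Note that the exception itself points at the client-side knob, 'dfs.client.block.write.replace-datanode-on-failure.policy'. As a possible client-side workaround (not a fix for the maintenance-state behavior), the related settings from hdfs-default.xml can be relaxed roughly like this:
{code}
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.hdfs.HdfsConfiguration;

Configuration conf = new HdfsConfiguration();
// DEFAULT (the policy in play above) tries to replace a failed pipeline DN;
// NEVER and ALWAYS are the other policies.
conf.set("dfs.client.block.write.replace-datanode-on-failure.policy", "DEFAULT");
// With best-effort, the client keeps writing with the remaining DNs when no
// replacement can be found, instead of failing the write.
conf.setBoolean("dfs.client.block.write.replace-datanode-on-failure.best-effort", true);
{code}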