[
https://issues.apache.org/jira/browse/HDFS-10178?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15222588#comment-15222588
]
Masatake Iwasaki commented on HDFS-10178:
-----------------------------------------
{code}
// For testing. Delay sending packet downstream
if (DataNodeFaultInjector.get().stopSendingPacketDownstream()) {
  try {
    Thread.sleep(60000);
  } catch (InterruptedException ie) {
    throw new IOException("Interrupted while sleeping. Bailing out.");
  }
}
{code}
Should the test logic be encapsulated in the DataNodeFaultInjector method itself? For example:
{code}
DataNodeFaultInjector dnFaultInjector = new DataNodeFaultInjector() {
  int tries = 1;

  @Override
  public void stopSendingPacketDownstream() throws IOException {
    if (tries > 0) {
      tries--;
      try {
        Thread.sleep(60000);
      } catch (InterruptedException ie) {
        throw new IOException("Interrupted while sleeping. Bailing out.");
      }
    }
  }
};
{code}
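To illustrate the suggestion end to end, here is a minimal, self-contained sketch. It does not use the real Hadoop classes: FaultInjector, DelayingInjector, forwardPacket and the 50 ms delay are hypothetical stand-ins chosen so the example runs on its own. The point is that the production code path only makes one unconditional hook call, while the delay and the "only once" bookkeeping live in the test's injector.

```java
import java.io.IOException;

public class FaultInjectorSketch {
    // Hypothetical stand-in for DataNodeFaultInjector; not the real Hadoop class.
    static class FaultInjector {
        private static FaultInjector instance = new FaultInjector();

        static FaultInjector get() { return instance; }
        static void set(FaultInjector injector) { instance = injector; }

        // No-op by default; a test overrides this to inject a fault.
        public void stopSendingPacketDownstream() throws IOException {}
    }

    // Stand-in for the packet-forwarding path: a single unconditional hook call,
    // with no sleep or retry logic in the production code.
    static void forwardPacket() throws IOException {
        FaultInjector.get().stopSendingPacketDownstream();
        // ... the packet would be sent downstream here ...
    }

    // Test-side injector that delays only the first `tries` calls.
    static class DelayingInjector extends FaultInjector {
        private int tries;
        private final long delayMillis;
        int delaysInjected = 0;

        DelayingInjector(int tries, long delayMillis) {
            this.tries = tries;
            this.delayMillis = delayMillis;
        }

        @Override
        public void stopSendingPacketDownstream() throws IOException {
            if (tries > 0) {
                tries--;
                delaysInjected++;
                try {
                    Thread.sleep(delayMillis);
                } catch (InterruptedException ie) {
                    // Restore the interrupt flag before converting to IOException.
                    Thread.currentThread().interrupt();
                    throw new IOException("Interrupted while sleeping. Bailing out.");
                }
            }
        }
    }

    public static void main(String[] args) throws IOException {
        DelayingInjector injector = new DelayingInjector(1, 50);
        FaultInjector.set(injector);
        forwardPacket(); // delayed once
        forwardPacket(); // passes straight through
        System.out.println("delays injected: " + injector.delaysInjected);
    }
}
```

With this shape, a test can change the delay or the number of affected packets without touching the code under test.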
> Permanent write failures can happen if pipeline recoveries occur for the
> first packet
> -------------------------------------------------------------------------------------
>
> Key: HDFS-10178
> URL: https://issues.apache.org/jira/browse/HDFS-10178
> Project: Hadoop HDFS
> Issue Type: Bug
> Reporter: Kihwal Lee
> Assignee: Kihwal Lee
> Priority: Critical
> Attachments: HDFS-10178.patch, HDFS-10178.v2.patch,
> HDFS-10178.v3.patch, HDFS-10178.v4.patch
>
>
> We have observed that a write fails permanently if the first packet doesn't go
> through properly and pipeline recovery happens. If the packet header is sent
> out, but the data portion of the packet does not reach one or more datanodes
> in time, the pipeline recovery will be done against the 0-byte partial block.
>
> If additional datanodes are added, the block is transferred to the new nodes.
> After the transfer, each node will have a meta file containing the header
> and a 0-length data block file. The pipeline recovery seems to work correctly
> up to this point, but the write fails when the actual data packet is resent.