[
https://issues.apache.org/jira/browse/HDFS-3398?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13272598#comment-13272598
]
Uma Maheswara Rao G commented on HDFS-3398:
-------------------------------------------
Seems to be a good catch, Brahma.
@Todd, it looks like a problem to me. When the client is writing to the socket and the
other peer goes down, the failure is treated as a client error and the client will exit.
How about catching exceptions from the socket operations and setting errorIndex to 1
(treating the first node as bad)?
I did not see the check below in the 205 code:
{code}
if (errorIndex == -1) { // not a datanode error
  streamerClosed = true;
}
{code}
The 205 code on Throwable:
{code}
} catch (Throwable e) {
  LOG.warn("DataStreamer Exception: " +
           StringUtils.stringifyException(e));
  if (e instanceof IOException) {
    setLastException((IOException)e);
  }
  hasError = true;
}
}
{code}
In trunk:
{code}
} catch (Throwable e) {
  DFSClient.LOG.warn("DataStreamer Exception", e);
  if (e instanceof IOException) {
    setLastException((IOException)e);
  }
  hasError = true;
  if (errorIndex == -1) { // not a datanode error
    streamerClosed = true;
  }
}
{code}
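To make the suggestion above concrete, here is a minimal sketch of what catching the socket
operations in DataStreamer could look like. The identifiers (blockStream, setLastException,
hasError, errorIndex) are the ones already shown in this issue; the exact shape of the change,
and the assumption that errorIndex is a 0-based index into the pipeline (so the first datanode
is index 0), are mine, not a committed fix:
{code}
// Sketch only (not a committed patch): guard the socket write so that a failure
// while writing to the pipeline is attributed to a datanode, letting the normal
// pipeline recovery run instead of the streamer being closed.
try {
  // write out data to remote datanode
  blockStream.write(buf.array(), buf.position(), buf.remaining());
  blockStream.flush();
} catch (IOException ioe) {
  setLastException(ioe);
  hasError = true;
  // Assumes errorIndex is 0-based, so 0 blames the first datanode in the pipeline.
  errorIndex = 0;
  // Re-throw so the existing catch (Throwable) block still runs; since errorIndex
  // is no longer -1, it will not set streamerClosed.
  throw ioe;
}
{code}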
> Client will not retry when primaryDN is down once it's just got pipeline
> ------------------------------------------------------------------------
>
> Key: HDFS-3398
> URL: https://issues.apache.org/jira/browse/HDFS-3398
> Project: Hadoop HDFS
> Issue Type: Bug
> Components: hdfs client
> Affects Versions: 2.0.0
> Reporter: Brahma Reddy Battula
> Priority: Minor
>
> Scenario:
> =========
> Start the NN and three DNs.
> Get the datanode to which the block has to be replicated, from:
> {code}
> nodes = nextBlockOutputStream(src);
> {code}
> Before starting to write to the DN, kill the primary DN.
> {code}
> // write out data to remote datanode
> blockStream.write(buf.array(), buf.position(), buf.remaining());
> blockStream.flush();
> {code}
> Now the write fails with the following exception:
> {noformat}
> 2012-05-10 14:21:47,993 WARN hdfs.DFSClient (DFSOutputStream.java:run(552)) - DataStreamer Exception
> java.io.IOException: An established connection was aborted by the software in your host machine
>   at sun.nio.ch.SocketDispatcher.write0(Native Method)
>   at sun.nio.ch.SocketDispatcher.write(Unknown Source)
>   at sun.nio.ch.IOUtil.writeFromNativeBuffer(Unknown Source)
>   at sun.nio.ch.IOUtil.write(Unknown Source)
>   at sun.nio.ch.SocketChannelImpl.write(Unknown Source)
>   at org.apache.hadoop.net.SocketOutputStream$Writer.performIO(SocketOutputStream.java:60)
>   at org.apache.hadoop.net.SocketIOWithTimeout.doIO(SocketIOWithTimeout.java:142)
>   at org.apache.hadoop.net.SocketOutputStream.write(SocketOutputStream.java:151)
>   at org.apache.hadoop.net.SocketOutputStream.write(SocketOutputStream.java:112)
>   at java.io.BufferedOutputStream.write(Unknown Source)
>   at java.io.DataOutputStream.write(Unknown Source)
>   at
> {noformat}
--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators:
https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira