[
https://issues.apache.org/jira/browse/HDFS-3704?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13431703#comment-13431703
]
Yanbo Liang commented on HDFS-3704:
-----------------------------------
I think we should tackle it in the following function:
{code}
  private long readBlockLength(LocatedBlock locatedblock) throws IOException {
    assert locatedblock != null : "LocatedBlock cannot be null";
    int replicaNotFoundCount = locatedblock.getLocations().length;
    for (DatanodeInfo datanode : locatedblock.getLocations()) {
      ClientDatanodeProtocol cdp = null;
      try {
        cdp = DFSUtil.createClientDatanodeProtocolProxy(
            datanode, dfsClient.conf, dfsClient.getConf().socketTimeout,
            locatedblock);
        final long n = cdp.getReplicaVisibleLength(locatedblock.getBlock());
        if (n >= 0) {
          return n;
        }
      } catch (IOException ioe) {
        if (ioe instanceof RemoteException &&
            (((RemoteException) ioe).unwrapRemoteException() instanceof
                ReplicaNotFoundException)) {
          // special case : replica might not be on the DN, treat as 0 length
          replicaNotFoundCount--;
        }
        if (DFSClient.LOG.isDebugEnabled()) {
          DFSClient.LOG.debug("Failed to getReplicaVisibleLength from datanode "
              + datanode + " for block " + locatedblock.getBlock(), ioe);
        }
      } finally {
        if (cdp != null) {
          RPC.stopProxy(cdp);
        }
      }
    }

    // Namenode told us about these locations, but none know about the replica
    // means that we hit the race between pipeline creation start and end.
    // we require all 3 because some other exception could have happened
    // on a DN that has it. we want to report that error
    if (replicaNotFoundCount == 0) {
      return 0;
    }

    throw new IOException("Cannot obtain block length for " + locatedblock);
  }
{code}
This function reads the block length from the datanodes that store the last
block of the file. If the exception is not a RemoteException wrapping a
ReplicaNotFoundException, the failure is only logged and the method ultimately
throws. So we could add a handler for other kinds of IOException and add such
datanodes to deadNodes. If that makes sense, I can work on it.
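A minimal, self-contained sketch of the proposed handling (stand-in types and
names here, such as LengthFetcher and string datanode IDs, are hypothetical and
not the actual HDFS patch): any IOException other than ReplicaNotFoundException
marks the node dead, so later reads on the same stream skip it instead of
timing out again.
{code}
import java.io.IOException;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

// Hedged sketch of the proposed change, using stand-in types instead of the
// real DFSInputStream/ClientDatanodeProtocol machinery.
class ReadBlockLengthSketch {
  // Stand-in for org.apache.hadoop.hdfs.server.datanode.ReplicaNotFoundException
  static class ReplicaNotFoundException extends IOException {}

  // Stand-in for the datanode RPC that returns the replica's visible length.
  interface LengthFetcher {
    long getReplicaVisibleLength(String datanode) throws IOException;
  }

  // Mirrors the per-stream dead-node set kept by DFSInputStream.
  final Set<String> deadNodes = new HashSet<>();

  long readBlockLength(List<String> locations, LengthFetcher rpc)
      throws IOException {
    int replicaNotFoundCount = locations.size();
    for (String datanode : locations) {
      try {
        long n = rpc.getReplicaVisibleLength(datanode);
        if (n >= 0) {
          return n;
        }
      } catch (ReplicaNotFoundException e) {
        // Replica genuinely absent on this DN: treat as a 0-length candidate.
        replicaNotFoundCount--;
      } catch (IOException ioe) {
        // Proposed change: any other failure (timeout, connection refused, ...)
        // marks the node dead so subsequent reads do not retry it.
        deadNodes.add(datanode);
      }
    }
    if (replicaNotFoundCount == 0) {
      return 0;
    }
    throw new IOException("Cannot obtain block length");
  }
}
{code}
With this sketch, a datanode that times out during readBlockLength lands in
deadNodes immediately, rather than being retried for the block transfer later.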
> In the DFSClient, add the node to the dead list when the ipc.Client call
> fails
> -------------------------------------------------------------------------------
>
> Key: HDFS-3704
> URL: https://issues.apache.org/jira/browse/HDFS-3704
> Project: Hadoop HDFS
> Issue Type: Improvement
> Affects Versions: 1.0.3, 2.0.0-alpha
> Reporter: nkeywal
> Priority: Minor
>
> The DFSClient maintains a list of dead nodes per input stream. When creating
> this DFSInputStream, it may connect to one of the nodes to check the final
> block size. If this call fails, this datanode should be put in the dead nodes
> list to save time. If not, it will be retried for the block transfer during
> the read, and we're likely to get a timeout.
--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators:
https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira