[ https://issues.apache.org/jira/browse/HDFS-14774?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16919011#comment-16919011 ]
CR Hota edited comment on HDFS-14774 at 8/29/19 10:32 PM:
----------------------------------------------------------

[~jojochuang] Thanks for reporting this. This is acceptable as it stands. The reason is that the router has two layers: a server facing external clients, and a client facing the downstream namenodes. The client to the downstream namenodes (the RouterRpcClient) is configured to retry multiple times based on failures from a downstream namenode, and it also has logic to fail over and try the standby namenode if the standby becomes active. So retries do happen before {{dns}} comes back as null, and if it does come back as null, the parent method throws an appropriate IOException:

{code:java}
if (dn == null) {
  throw new IOException("Failed to find datanode, suggest to check cluster"
      + " health. excludeDatanodes=" + excludeDatanodes);
}
{code}

Let me know if this helps?
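To illustrate the two-layer behavior described above, here is a minimal, hypothetical sketch of the retry-with-failover pattern (the names {{invokeWithFailover}}, {{RetrySketch}}, and the generic call interface are invented for illustration and are not the actual RouterRpcClient API): each target namenode is tried up to a configured number of times, and an IOException only surfaces once every target has been exhausted.

```java
import java.io.IOException;
import java.util.List;
import java.util.function.Function;

// Hypothetical sketch of retry-with-failover: try each target in
// priority order, retrying per target, and only throw once every
// target has failed. Not the real RouterRpcClient implementation.
public class RetrySketch {
    public static <T, R> R invokeWithFailover(
            List<T> targets, Function<T, R> call, int retriesPerTarget)
            throws IOException {
        IOException lastFailure = null;
        for (T target : targets) {
            // One initial attempt plus retriesPerTarget retries.
            for (int attempt = 0; attempt <= retriesPerTarget; attempt++) {
                try {
                    return call.apply(target);
                } catch (RuntimeException e) {
                    lastFailure = new IOException(
                        "Call failed against " + target, e);
                }
            }
            // All attempts against this target failed; fail over to next.
        }
        throw lastFailure != null
            ? lastFailure
            : new IOException("No targets available");
    }
}
```

Under this sketch, a caller only sees an exception after both the active and the standby (and any retries) have failed, which matches why {{dns}} rarely ends up null in practice.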
> RBF: Improve RouterWebhdfsMethods#chooseDatanode() error handling
> -----------------------------------------------------------------
>
>                 Key: HDFS-14774
>                 URL: https://issues.apache.org/jira/browse/HDFS-14774
>             Project: Hadoop HDFS
>          Issue Type: Improvement
>            Reporter: Wei-Chiu Chuang
>            Assignee: CR Hota
>            Priority: Minor
>
> HDFS-13972 added the following code:
> {code}
> try {
>   dns = rpcServer.getDatanodeReport(DatanodeReportType.LIVE);
> } catch (IOException e) {
>   LOG.error("Cannot get the datanodes from the RPC server", e);
> } finally {
>   // Reset ugi to remote user for remaining operations.
>   RouterRpcServer.resetCurrentUser();
> }
> HashSet<Node> excludes = new HashSet<Node>();
> if (excludeDatanodes != null) {
>   Collection<String> collection =
>       getTrimmedStringCollection(excludeDatanodes);
>   for (DatanodeInfo dn : dns) {
>     if (collection.contains(dn.getName())) {
>       excludes.add(dn);
>     }
>   }
> }
> {code}
> If {{rpcServer.getDatanodeReport()}} throws an exception, {{dns}} will become null. This doesn't look like the best way to handle the exception. Should the router retry upon exception? Does it perform retry automatically under the hood?
>
> [~crh] [~brahmareddy]

--
This message was sent by Atlassian Jira
(v8.3.2#803003)
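The concern in the description is that the catch block logs the failure and then lets {{dns}} stay null, so the later loop would throw a NullPointerException. A minimal, hypothetical sketch of the alternative (simplified types; {{ReportSource}}, {{buildExcludes}}, and {{getLiveDatanodes}} are invented stand-ins, not the actual HDFS API) is to rethrow so the caller sees the real cause:

```java
import java.io.IOException;
import java.util.Collection;
import java.util.HashSet;
import java.util.Set;

// Hypothetical, simplified sketch: rethrow when the datanode report
// fails instead of continuing with a null array, so the caller gets
// the underlying IOException rather than a later NullPointerException.
public class ChooseDatanodeSketch {
    interface ReportSource {
        String[] getLiveDatanodes() throws IOException;
    }

    static Set<String> buildExcludes(ReportSource rpc,
            Collection<String> excludeDatanodes) throws IOException {
        String[] dns;
        try {
            dns = rpc.getLiveDatanodes();
        } catch (IOException e) {
            // Surface the failure instead of leaving dns == null.
            throw new IOException(
                "Cannot get the datanodes from the RPC server", e);
        }
        Set<String> excludes = new HashSet<>();
        if (excludeDatanodes != null) {
            for (String dn : dns) {
                if (excludeDatanodes.contains(dn)) {
                    excludes.add(dn);
                }
            }
        }
        return excludes;
    }
}
```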