[
https://issues.apache.org/jira/browse/HDFS-15250?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17112594#comment-17112594
]
Andrey Elenskiy commented on HDFS-15250:
----------------------------------------
We've run into the same issue on 3.1.3 and ended up getting
UnresolvedAddressException propagated all the way to clients (readers and
writers) even if only one block location could not be resolved. So the
entire read/write fails if one datanode in the pipeline causes
UnresolvedAddressException.
I see the patch doesn't actually handle this exception but just logs it at
TRACE and rethrows it, so would we expect to still see the same problem?
I can also try out the patch on our system, as the issue is fairly easy to
reproduce, if you think this change is enough.
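
To make the concern concrete, here is a minimal, self-contained sketch of the behavior I would hope for. It is not HDFS code: `ReplicaFailover`, `connect` and `readFromAny` are hypothetical names, and `connect` only imitates a connection attempt. The idea is that an unresolvable address for one location is treated like any other per-node failure, so the remaining locations are still tried.

```java
import java.io.IOException;
import java.net.InetSocketAddress;
import java.nio.channels.UnresolvedAddressException;
import java.util.List;

final class ReplicaFailover {
    // Hypothetical stand-in for opening a connection to one datanode.
    // Like NIO channel connects, it rejects unresolved addresses with an
    // unchecked UnresolvedAddressException.
    static String connect(InetSocketAddress addr) throws IOException {
        if (addr.isUnresolved()) {
            throw new UnresolvedAddressException();
        }
        return "connected to " + addr.getHostString();
    }

    // Tries each location in order and returns on the first success.
    // The unresolved-address case is recorded as an IOException so that it
    // does not abort the loop while other locations remain.
    static String readFromAny(List<InetSocketAddress> locations) throws IOException {
        IOException last = null;
        for (InetSocketAddress addr : locations) {
            try {
                return connect(addr);
            } catch (UnresolvedAddressException e) {
                last = new IOException("Unresolved datanode address: " + addr, e);
            } catch (IOException e) {
                last = e;
            }
        }
        throw last != null ? last : new IOException("No block locations given");
    }
}
```

If the `catch (UnresolvedAddressException e)` branch is removed, the first unresolved location ends the loop with an unchecked exception, which matches what we observed on the client side.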
> Setting `dfs.client.use.datanode.hostname` to true can crash the system
> because of unhandled UnresolvedAddressException
> -----------------------------------------------------------------------------------------------------------------------
>
> Key: HDFS-15250
> URL: https://issues.apache.org/jira/browse/HDFS-15250
> Project: Hadoop HDFS
> Issue Type: Bug
> Reporter: Ctest
> Assignee: Ctest
> Priority: Major
> Fix For: 3.2.2, 3.3.1, 3.4.0, 3.1.5
>
> Attachments: HDFS-15250-001.patch, HDFS-15250-002.patch
>
>
> *Problem:*
> `dfs.client.use.datanode.hostname` by default is set to false, which means
> the client will use the IP address of the datanode to connect to the
> datanode, rather than the hostname of the datanode.
> In `org.apache.hadoop.hdfs.client.impl.BlockReaderFactory.nextTcpPeer`:
>
> {code:java}
> try {
>   Peer peer = remotePeerFactory.newConnectedPeer(inetSocketAddress, token,
>       datanode);
>   LOG.trace("nextTcpPeer: created newConnectedPeer {}", peer);
>   return new BlockReaderPeer(peer, false);
> } catch (IOException e) {
>   LOG.trace("nextTcpPeer: failed to create newConnectedPeer connected to"
>       + "{}", datanode);
>   throw e;
> }
> {code}
>
> If `dfs.client.use.datanode.hostname` is false, then it will try to connect
> via IP address. If the IP address is illegal and the connection fails,
> IOException will be thrown from `newConnectedPeer` and be handled.
> If `dfs.client.use.datanode.hostname` is true, then it will try to connect
> via hostname. If the hostname cannot be resolved, UnresolvedAddressException
> will be thrown from `newConnectedPeer`. However, UnresolvedAddressException
> is not a subclass of IOException so `nextTcpPeer` doesn’t handle this
> exception at all. This unhandled exception could crash the system.
>
> *Solution:*
> Since the method already handles an illegal IP address, an illegal
> hostname should be handled as well. One solution is to add the handling
> logic in `nextTcpPeer`:
> {code:java}
> } catch (IOException e) {
>   LOG.trace("nextTcpPeer: failed to create newConnectedPeer connected to"
>       + "{}", datanode);
>   throw e;
> } catch (UnresolvedAddressException e) {
>   ... // handling logic
> }
> {code}
> I am very happy to provide a patch to do this.
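
As an illustration of the type mismatch described in the issue, the sketch below uses plain `java.nio` rather than HDFS classes. `UnresolvedAddressException` extends `IllegalArgumentException`, so it is unchecked and a `catch (IOException e)` clause never sees it. The helper `connectOrIOException` and the class name are hypothetical, and wrapping the exception in an `IOException` is only one possible form of handling, not necessarily what the attached patches do.

```java
import java.io.IOException;
import java.net.InetSocketAddress;
import java.nio.channels.SocketChannel;
import java.nio.channels.UnresolvedAddressException;

final class UnresolvedAddressDemo {
    // Hypothetical helper: connects a channel and converts the unchecked
    // UnresolvedAddressException into an IOException, so that callers that
    // only handle IOException also cover unresolvable hostnames.
    static SocketChannel connectOrIOException(InetSocketAddress addr) throws IOException {
        SocketChannel channel = SocketChannel.open();
        try {
            // Throws UnresolvedAddressException if addr is unresolved.
            channel.connect(addr);
            return channel;
        } catch (UnresolvedAddressException e) {
            channel.close();
            throw new IOException("Cannot resolve datanode address " + addr, e);
        } catch (IOException e) {
            channel.close();
            throw e;
        }
    }

    // True only for exceptions that an IOException catch clause would handle.
    static boolean caughtByIOExceptionHandler(Throwable t) {
        return t instanceof IOException;
    }
}
```

With this kind of wrapping, existing `catch (IOException e)` blocks further up the call chain would see a resolution failure as an ordinary connection failure, while the original exception stays available as the cause.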