[ https://issues.apache.org/jira/browse/HDFS-11701?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
James Moore updated HDFS-11701:
-------------------------------
Affects Version/s: 2.6.0 (was: 2.7.0)
> NPE from Unresolved Host causes permanent DFSInputStream failures
> -----------------------------------------------------------------
>
> Key: HDFS-11701
> URL: https://issues.apache.org/jira/browse/HDFS-11701
> Project: Hadoop HDFS
> Issue Type: Bug
> Components: hdfs-client
> Affects Versions: 2.6.0
> Environment: AWS CentOS Linux running HBase CDH 5.9.0 and HDFS CDH 5.9.0
> Reporter: James Moore
>
> We recently encountered the following NPE, caused by the DFSInputStream
> holding stale cached block locations for hosts that could no longer be
> resolved.
> {quote}
> Caused by: java.lang.NullPointerException
>     at org.apache.hadoop.hdfs.DFSClient.isLocalAddress(DFSClient.java:1122)
>     at org.apache.hadoop.hdfs.shortcircuit.DomainSocketFactory.getPathInfo(DomainSocketFactory.java:148)
>     at org.apache.hadoop.hdfs.BlockReaderFactory.getBlockReaderLocal(BlockReaderFactory.java:474)
>     at org.apache.hadoop.hdfs.BlockReaderFactory.build(BlockReaderFactory.java:354)
>     at org.apache.hadoop.hdfs.DFSInputStream.blockSeekTo(DFSInputStream.java:662)
>     at org.apache.hadoop.hdfs.DFSInputStream.seekToNewSource(DFSInputStream.java:1613)
>     at org.apache.hadoop.fs.FSDataInputStream.seekToNewSource(FSDataInputStream.java:127)
> ~HBase related stack frames trimmed~
> {quote}
> After investigating, we found that the DFSInputStream had been open for at
> least 3-4 weeks and had cached block locations for decommissioned nodes that
> no longer resolve in DNS; those nodes had been shut down and removed from the
> cluster 2 weeks earlier. If the DFSInputStream had refreshed its block
> locations from the NameNode, it would have received alternative locations
> that exclude the decommissioned DataNodes. Because the NPE above leaves the
> non-resolving DataNode in the list of block locations, the DFSInputStream
> never refreshes them, and every attempt to open a BlockReader for the
> affected blocks fails.
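
To illustrate the failure mode described above, here is a minimal, self-contained Java sketch. It relies only on documented JDK behaviour: an InetSocketAddress whose host name did not resolve returns null from getAddress(). That this is the exact cause of the NPE in DFSClient.isLocalAddress is an assumption drawn from the stack trace, not something confirmed in this report. The class name, the isLocalAddressSafe helper, the host name, and the port are all hypothetical and are not HDFS code.

```java
import java.net.InetAddress;
import java.net.InetSocketAddress;

// Hypothetical demo class; not part of Hadoop.
final class UnresolvedAddressDemo {

    // Hypothetical null-safe "is this address local?" check.
    // An address that never resolved is treated as non-local instead of
    // dereferencing the null InetAddress.
    static boolean isLocalAddressSafe(InetSocketAddress target) {
        InetAddress addr = target.getAddress(); // null when the host is unresolved
        if (addr == null) {
            return false;
        }
        return addr.isAnyLocalAddress() || addr.isLoopbackAddress();
    }

    public static void main(String[] args) {
        // createUnresolved performs no DNS lookup, so this mimics a DataNode
        // whose host name has disappeared from DNS (name and port are made up).
        InetSocketAddress stale =
            InetSocketAddress.createUnresolved("decommissioned-dn.invalid", 50010);

        System.out.println(stale.isUnresolved());       // prints "true"
        System.out.println(stale.getAddress());         // prints "null"
        System.out.println(isLocalAddressSafe(stale));  // prints "false"
    }
}
```

Calling any method on the result of getAddress() without such a check would throw a NullPointerException for the stale entry, which matches the shape of the trace in the report.
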
> In our case, we resolved the NPE by closing and re-opening every
> DFSInputStream in the cluster to force a purge of the block location cache.
> Ideally, when a host cannot be resolved in DNS, the DFSInputStream would
> re-fetch all block locations that reference that host, or at least those for
> the requested blocks.
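
The behaviour suggested above could look roughly like the following sketch. It is not the DFSInputStream implementation and not a proposed patch; it only models the idea of refreshing cached locations when one of them fails to resolve. RefreshingLocationCache, the nameNodeLookup supplier (a stand-in for asking the NameNode for block locations), the resolver function (a stand-in for DNS resolution), and the host names are all hypothetical.

```java
import java.net.InetAddress;
import java.net.InetSocketAddress;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.concurrent.atomic.AtomicInteger;
import java.util.function.Function;
import java.util.function.Supplier;

// Hypothetical model of a location cache that refreshes itself when a
// cached host no longer resolves. Not Hadoop code.
final class RefreshingLocationCache {
    private final Supplier<List<String>> nameNodeLookup;          // stand-in for a NameNode query
    private final Function<String, InetSocketAddress> resolver;   // stand-in for DNS resolution
    private List<String> cachedHosts;

    RefreshingLocationCache(Supplier<List<String>> nameNodeLookup,
                            Function<String, InetSocketAddress> resolver) {
        this.nameNodeLookup = nameNodeLookup;
        this.resolver = resolver;
        this.cachedHosts = new ArrayList<>(nameNodeLookup.get());
    }

    // Returns resolvable locations. If any cached host fails to resolve,
    // the cache is refreshed once from the lookup and resolved again.
    List<InetSocketAddress> resolveLocations() {
        List<InetSocketAddress> resolved = resolveAll(cachedHosts);
        if (resolved.size() < cachedHosts.size()) {
            cachedHosts = new ArrayList<>(nameNodeLookup.get());
            resolved = resolveAll(cachedHosts);
        }
        return resolved;
    }

    // Resolves each host and keeps only the addresses that resolved.
    private List<InetSocketAddress> resolveAll(List<String> hosts) {
        List<InetSocketAddress> out = new ArrayList<>();
        for (String host : hosts) {
            InetSocketAddress addr = resolver.apply(host);
            if (!addr.isUnresolved()) {
                out.add(addr);
            }
        }
        return out;
    }

    public static void main(String[] args) {
        AtomicInteger lookups = new AtomicInteger();
        RefreshingLocationCache cache = new RefreshingLocationCache(
            // First lookup returns the stale host, later lookups a live one.
            () -> lookups.getAndIncrement() == 0
                ? Arrays.asList("old-dn") : Arrays.asList("new-dn"),
            // "old-dn" never resolves; anything else maps to loopback.
            host -> host.startsWith("old")
                ? InetSocketAddress.createUnresolved(host, 50010)
                : new InetSocketAddress(InetAddress.getLoopbackAddress(), 50010));

        List<InetSocketAddress> usable = cache.resolveLocations();
        System.out.println(usable.size() + " usable location(s) after "
            + lookups.get() + " lookup(s)");
    }
}
```

The design choice in this sketch is to resolve at use time and refresh at most once per call, so a host that is permanently gone cannot pin the stream to a location list that can never succeed.
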