[ https://issues.apache.org/jira/browse/HDFS-11701?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Lokesh Jain updated HDFS-11701:
-------------------------------
    Attachment: HDFS-11701.004.patch

> NPE from Unresolved Host causes permanent DFSInputStream failures
> -----------------------------------------------------------------
>
>                 Key: HDFS-11701
>                 URL: https://issues.apache.org/jira/browse/HDFS-11701
>             Project: Hadoop HDFS
>          Issue Type: Bug
>          Components: hdfs-client
>    Affects Versions: 2.6.0
>         Environment: AWS CentOS Linux running HBase CDH 5.9.0 and HDFS CDH 5.9.0
>            Reporter: James Moore
>            Assignee: Lokesh Jain
>            Priority: Major
>         Attachments: HDFS-11701.001.patch, HDFS-11701.002.patch, HDFS-11701.003.patch, HDFS-11701.004.patch
>
>
> We recently encountered the following NPE, caused by the DFSInputStream 
> holding stale cached block locations for hosts that could no longer be 
> resolved.
> {quote}
> Caused by: java.lang.NullPointerException
>     at org.apache.hadoop.hdfs.DFSClient.isLocalAddress(DFSClient.java:1122)
>     at org.apache.hadoop.hdfs.shortcircuit.DomainSocketFactory.getPathInfo(DomainSocketFactory.java:148)
>     at org.apache.hadoop.hdfs.BlockReaderFactory.getBlockReaderLocal(BlockReaderFactory.java:474)
>     at org.apache.hadoop.hdfs.BlockReaderFactory.build(BlockReaderFactory.java:354)
>     at org.apache.hadoop.hdfs.DFSInputStream.blockSeekTo(DFSInputStream.java:662)
>     at org.apache.hadoop.hdfs.DFSInputStream.seekToNewSource(DFSInputStream.java:1613)
>     at org.apache.hadoop.fs.FSDataInputStream.seekToNewSource(FSDataInputStream.java:127)
> ~HBase related stack frames trimmed~
> {quote}
> On investigation, the DFSInputStream appears to have been open for 3-4 weeks 
> or more and had cached block locations on decommissioned nodes that no 
> longer resolve in DNS. Those nodes had been shut down and removed from the 
> cluster 2 weeks earlier.  If the DFSInputStream had refreshed its block 
> locations from the name node, it would have received alternative block 
> locations that do not include the decommissioned data nodes.  Because the 
> above NPE leaves the unresolvable data node in the list of block locations, 
> the DFSInputStream never refreshes them, and every attempt to open a 
> BlockReader for the affected blocks fails.
> In our case, we worked around the NPE by closing and re-opening every 
> DFSInputStream in the cluster to force a purge of the block locations cache.  
> Ideally, the DFSInputStream would re-fetch all block locations for a host 
> that cannot be resolved in DNS, or at least those for the requested blocks.
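
The failure mode described above can be illustrated with a minimal, self-contained sketch. It relies only on the documented behavior of java.net.InetSocketAddress: getAddress() returns null for an unresolved address, so any code that dereferences it without a check will throw an NPE. The class name, the helper isLocalAddressSafe, the host name, and the port are hypothetical and chosen for illustration; this is not the code in DFSClient.isLocalAddress, nor the fix in the attached patches.

```java
import java.net.InetAddress;
import java.net.InetSocketAddress;

// Hypothetical illustration only; not part of Hadoop.
class UnresolvedAddressCheck {

    // Null-safe locality check (assumed shape, simplified to loopback and
    // wildcard addresses). An unresolved host is treated as "not local"
    // instead of being dereferenced.
    static boolean isLocalAddressSafe(InetSocketAddress target) {
        InetAddress addr = target.getAddress(); // null when unresolved
        if (addr == null) {
            return false;
        }
        return addr.isLoopbackAddress() || addr.isAnyLocalAddress();
    }

    public static void main(String[] args) {
        // createUnresolved performs no DNS lookup, so this mimics a cached
        // location whose host no longer resolves. Name and port are made up.
        InetSocketAddress stale =
            InetSocketAddress.createUnresolved("decommissioned-dn.example.invalid", 50010);

        System.out.println("unresolved: " + stale.isUnresolved());
        System.out.println("address:    " + stale.getAddress());
        System.out.println("local:      " + isLocalAddressSafe(stale));
    }
}
```

A caller could use the same isUnresolved() test as a signal to discard the cached locations and ask the name node again, which is the behavior requested in the last paragraph of the description.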


