[ 
https://issues.apache.org/jira/browse/HDFS-5850?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Kihwal Lee updated HDFS-5850:
-----------------------------

    Description: 
[~knoguchi] recently noticed that the trash directories of a restarted cluster 
were not being cleaned up. The cause turned out to be a transient DNS problem 
during initialization.

The TrashEmptier thread in the namenode is actually a FileSystem client running 
in a loop, which makes RPC calls to the namenode itself in order to list, rename 
and delete trash files. In a secure setup, the client needs to construct the 
correct service principal name (SPN) for the namenode before making an RPC 
connection. If there is a DNS issue at that moment, the SPN ends up containing 
the IP address instead of the FQDN.

Since the KDC does not recognize this SPN, TrashEmptier does not work from that 
point on. I verified that the SPN with the IP address was what the TrashEmptier 
thread requested a service ticket for from the KDC.

  was:
[~knoguchi] once noticed that the trash directories of a restarted cluster 
were not being cleaned up. The cause turned out to be a transient DNS problem 
during initialization.

The TrashEmptier thread in the namenode is actually a FileSystem client running 
in a loop, which makes RPC calls to the namenode itself in order to list, rename 
and delete trash files. In a secure setup, the client needs to construct the 
correct service principal name (SPN) for the namenode before making an RPC 
connection. If there is a DNS issue at that moment, the SPN ends up containing 
the IP address instead of the FQDN.

Since the KDC does not recognize this SPN, TrashEmptier does not work from that 
point on. I verified that the SPN with the IP address was what the TrashEmptier 
thread requested a service ticket for from the KDC.


> DNS Issues during TrashEmptier initialization can silently leave it 
> non-functional
> ----------------------------------------------------------------------------------
>
>                 Key: HDFS-5850
>                 URL: https://issues.apache.org/jira/browse/HDFS-5850
>             Project: Hadoop HDFS
>          Issue Type: Bug
>    Affects Versions: 2.4.0
>            Reporter: Kihwal Lee
>            Priority: Critical
>
> [~knoguchi] recently noticed that the trash directories of a restarted 
> cluster were not being cleaned up. The cause turned out to be a transient 
> DNS problem during initialization.
>
> The TrashEmptier thread in the namenode is actually a FileSystem client 
> running in a loop, which makes RPC calls to the namenode itself in order to 
> list, rename and delete trash files. In a secure setup, the client needs to 
> construct the correct service principal name (SPN) for the namenode before 
> making an RPC connection. If there is a DNS issue at that moment, the SPN 
> ends up containing the IP address instead of the FQDN.
>
> Since the KDC does not recognize this SPN, TrashEmptier does not work from 
> that point on. I verified that the SPN with the IP address was what the 
> TrashEmptier thread requested a service ticket for from the KDC.
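
To make the failure mode above concrete, here is a minimal Java sketch. It is 
not the actual Hadoop code and not the fix for this issue. The class 
SpnSketch, the helpers buildSpn, isIpLiteral and checkedSpn, the "nn" service 
name, the host names and the EXAMPLE.COM realm are all hypothetical. It shows 
how an SPN built from an unresolved address differs from the one the KDC 
would know, plus one possible guard, assuming the caller is able to retry 
later rather than proceed with an IP-based SPN.

```java
public class SpnSketch {

    /** Hypothetical helper: formats an SPN as "service/host@REALM". */
    static String buildSpn(String service, String host, String realm) {
        return service + "/" + host + "@" + realm;
    }

    /**
     * Rough check for a literal IP address. Assumption: a dotted quad is
     * IPv4 and anything containing ':' is IPv6. No DNS lookup is performed.
     */
    static boolean isIpLiteral(String host) {
        return host.matches("\\d{1,3}(\\.\\d{1,3}){3}") || host.contains(":");
    }

    /**
     * Hypothetical guard: refuse to build an SPN from an IP literal so the
     * caller can retry once name resolution works again, instead of
     * silently continuing with a principal the KDC does not know.
     */
    static String checkedSpn(String service, String host, String realm) {
        if (isIpLiteral(host)) {
            throw new IllegalStateException(
                "host '" + host + "' is an IP address, not an FQDN");
        }
        return buildSpn(service, host, realm);
    }

    public static void main(String[] args) {
        // SPN built when the namenode address resolves to its FQDN.
        System.out.println(buildSpn("nn", "nn1.example.com", "EXAMPLE.COM"));

        // SPN built when only the IP address is available.
        System.out.println(buildSpn("nn", "10.0.0.1", "EXAMPLE.COM"));

        // With the guard, the bad SPN is rejected loudly instead.
        try {
            checkedSpn("nn", "10.0.0.1", "EXAMPLE.COM");
        } catch (IllegalStateException e) {
            System.out.println("refusing to build SPN: " + e.getMessage());
        }
    }
}
```

The point of the guard is only that the failure becomes visible and 
retryable. Whether rejecting, retrying, or re-resolving on each connection is 
the right behavior for TrashEmptier is a design question this sketch does not 
settle.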



--
This message was sent by Atlassian JIRA
(v6.1.5#6160)
