[ https://issues.apache.org/jira/browse/HDFS-7739?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14522710#comment-14522710 ]

Chris Nauroth commented on HDFS-7739:
-------------------------------------

Hi [~brahmareddy].  From the stack trace, it looks like the process is blocked 
waiting to read output from the ssh connection that runs fuser to stop the old 
active.  I can think of 2 possible theories:

# Passwordless ssh is not configured, so the connection is hanging indefinitely 
prompting for a password.  This would require configuration of 
{{dfs.ha.fencing.ssh.private-key-files}} to specify the ssh key file.
# The ssh connection that runs fuser is hanging indefinitely.  Many different 
kinds of failure at the old active could make it unresponsive and cause this.  
This can be mitigated by configuring a timeout on the ssh connection 
({{dfs.ha.fencing.ssh.connect-timeout}}).
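As a rough analogy for the second theory, here is a sketch using plain JDK sockets. It is not the JSch-based code that sshfence actually uses, and {{ReadTimeoutDemo}} and {{readTimesOut}} are hypothetical names. It connects to a loopback listener that completes the TCP handshake but never sends a byte. With the socket read timeout left at its default of 0, {{read()}} would block forever. With a timeout set, it fails with {{SocketTimeoutException}} instead, which is roughly the bounded failure a timeout on the ssh connection is meant to give.

```java
import java.io.IOException;
import java.io.InputStream;
import java.net.InetAddress;
import java.net.InetSocketAddress;
import java.net.ServerSocket;
import java.net.Socket;
import java.net.SocketTimeoutException;

public class ReadTimeoutDemo {
  /**
   * Hypothetical helper: connects to a peer that never sends data and tries to
   * read one byte. Returns true if the read gave up with SocketTimeoutException
   * after about timeoutMs instead of blocking indefinitely.
   */
  static boolean readTimesOut(int timeoutMs) throws IOException {
    InetAddress loopback = InetAddress.getLoopbackAddress();
    // The "unresponsive peer": the kernel completes the handshake through the
    // listen backlog, but nothing ever accepts or writes to the connection.
    try (ServerSocket silentPeer = new ServerSocket(0, 1, loopback);
         Socket client = new Socket()) {
      client.connect(new InetSocketAddress(loopback, silentPeer.getLocalPort()), timeoutMs);
      client.setSoTimeout(timeoutMs); // 0, the default, would mean "wait forever"
      InputStream in = client.getInputStream();
      try {
        in.read();
        return false; // unexpected: the peer sent data or closed the connection
      } catch (SocketTimeoutException e) {
        return true; // the read was bounded by the timeout
      }
    }
  }

  public static void main(String[] args) throws IOException {
    System.out.println("read timed out: " + readTimesOut(500));
  }
}
```

Running {{main}} should report {{true}} after roughly half a second. The point is only that an unbounded read on an unresponsive peer needs an explicit timeout somewhere to fail at all.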

This documentation page has more details:

http://hadoop.apache.org/docs/r2.7.0/hadoop-project-dist/hadoop-hdfs/HDFSHighAvailabilityWithQJM.html#Configuration_details

> ZKFC - transitionToActive is indefinitely waiting to complete fenceOldActive
> ----------------------------------------------------------------------------
>
>                 Key: HDFS-7739
>                 URL: https://issues.apache.org/jira/browse/HDFS-7739
>             Project: Hadoop HDFS
>          Issue Type: Bug
>          Components: auto-failover
>    Affects Versions: 2.6.0
>            Reporter: Brahma Reddy Battula
>            Assignee: Brahma Reddy Battula
>            Priority: Critical
>         Attachments: zkfctd.out
>
>
>  *Scenario:* 
> One of the cluster disks got full while ZKFC was doing transitionToActive. To fence 
> the old active node, ZKFC needs to execute the fencing command and wait for the 
> result. Since the disk was full, the StreamPumper thread waits indefinitely (even 
> after the disk is freed, it does not recover)...
>  *{color:blue}Please check the attached thread dump of ZKFC{color}* ..
>  *{color:green}It would be better to have a timeout for the StreamPumper 
> thread{color}* .
> {code}
> protected void pump() throws IOException {
>     InputStreamReader inputStreamReader = new InputStreamReader(stream);
>     BufferedReader br = new BufferedReader(inputStreamReader);
>     String line = null;
>     // readLine() blocks until a full line, EOF or an error arrives;
>     // nothing here bounds how long that wait can take.
>     while ((line = br.readLine()) != null) {
>       if (type == StreamType.STDOUT) {
>         log.info(logPrefix + ": " + line);
>       } else {
>         log.warn(logPrefix + ": " + line);
>       }
>     }
> }
> {code}
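The reporter's suggestion of a timeout around the StreamPumper read loop could look roughly like the following sketch. It is not Hadoop's code. {{TimedPump}}, {{pumpWithTimeout}} and {{HangingInputStream}} are hypothetical names, and the sketch only bounds how long the caller waits for EOF. A thread blocked in a native read may ignore interruption, so a real fix would presumably also need to close the stream or destroy the fencing process.

```java
import java.io.BufferedReader;
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.InputStreamReader;
import java.io.InterruptedIOException;
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.CountDownLatch;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.TimeoutException;

public class TimedPump {
  /** Hypothetical helper: read lines on a worker thread, wait at most timeoutMs for EOF. */
  static List<String> pumpWithTimeout(InputStream stream, long timeoutMs)
      throws IOException, InterruptedException, TimeoutException {
    ExecutorService executor = Executors.newSingleThreadExecutor(r -> {
      Thread t = new Thread(r, "timed-pump");
      t.setDaemon(true); // a stuck reader must not keep the JVM alive
      return t;
    });
    Future<List<String>> result = executor.submit(() -> {
      List<String> lines = new ArrayList<>();
      BufferedReader br = new BufferedReader(
          new InputStreamReader(stream, StandardCharsets.UTF_8));
      String line;
      while ((line = br.readLine()) != null) {
        lines.add(line); // StreamPumper logs each line here instead
      }
      return lines;
    });
    try {
      return result.get(timeoutMs, TimeUnit.MILLISECONDS);
    } catch (ExecutionException e) {
      throw new IOException("pump failed", e.getCause());
    } finally {
      // Best effort: interrupt the worker. A blocking native read may ignore
      // this, so a real caller should also close the stream or kill the process.
      result.cancel(true);
      executor.shutdownNow();
    }
  }

  /** Hypothetical stand-in for the output of a fencing command that never finishes. */
  static class HangingInputStream extends InputStream {
    private final CountDownLatch never = new CountDownLatch(1);

    @Override
    public int read() throws IOException {
      try {
        never.await(); // blocks until the reading thread is interrupted
      } catch (InterruptedException e) {
        throw new InterruptedIOException("read interrupted");
      }
      return -1;
    }
  }

  public static void main(String[] args) throws Exception {
    InputStream finished = new ByteArrayInputStream(
        "first line\nsecond line\n".getBytes(StandardCharsets.UTF_8));
    System.out.println(pumpWithTimeout(finished, 1000));
    try {
      pumpWithTimeout(new HangingInputStream(), 200);
    } catch (TimeoutException e) {
      System.out.println("gave up waiting after 200 ms");
    }
  }
}
```

Making the worker a daemon thread is a deliberate trade-off. If a read truly ignores interruption, the sketch leaks one idle thread rather than leaving the caller stuck, which is why releasing the underlying stream or process still matters.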



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
