[ 
https://issues.apache.org/jira/browse/HDFS-3618?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13435937#comment-13435937
 ] 

Vinay commented on HDFS-3618:
-----------------------------

* Regarding the StreamPumper thread hanging:
I tried to reproduce the issue by pausing at a debug point inside 
{{SshFenceByTcpPort.execCommand(Session, String)}}, just before the threads 
that read the streams are started, that is, before {{errPumper.start();}} 
executes.
By the time I released the debug point, {{exec}} was already closed, and 
{{errPumper}} hung.
*# We can start the threads for reading streams only if {{exec}} is not closed.
*# We can wait until {{exec}} is closed before getting the exit status.
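
The two suggestions above could be sketched roughly as follows. This is only an illustration, not the actual patch: {{ExecChannel}} is a hypothetical stand-in for the two channel methods involved ({{isClosed()}} and {{getExitStatus()}}), and {{ExecWait}}, {{startIfOpen}}, {{waitForExitStatus}}, the timeout and the poll interval are all assumed names and values. Note that the channel could still close between the check and the thread start, so the first guard narrows the window rather than removing it.

```java
/** Illustrative sketch only; all names here are hypothetical. */
final class ExecWait {

    /** Hypothetical stand-in for the two channel methods used below. */
    interface ExecChannel {
        boolean isClosed();
        int getExitStatus();
    }

    /**
     * Starts the stream-reading threads only if the channel is still open.
     * Returns false (and starts nothing) when the channel is already closed.
     */
    static boolean startIfOpen(ExecChannel exec, Thread... pumpers) {
        if (exec.isClosed()) {
            return false;
        }
        for (Thread pumper : pumpers) {
            pumper.start();
        }
        return true;
    }

    /**
     * Polls until the channel reports closed, then returns its exit status.
     * Throws IllegalStateException on timeout or interruption.
     */
    static int waitForExitStatus(ExecChannel exec, long timeoutMs, long pollMs) {
        long deadline = System.nanoTime() + timeoutMs * 1_000_000L;
        while (!exec.isClosed()) {
            if (System.nanoTime() - deadline >= 0) {
                throw new IllegalStateException("channel did not close in time");
            }
            try {
                Thread.sleep(pollMs);
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt(); // preserve interrupt status
                throw new IllegalStateException("interrupted while waiting", e);
            }
        }
        return exec.getExitStatus();
    }
}
```

With this shape, the caller would read the exit status only through {{waitForExitStatus}}, so a status is never taken from a channel that is still open.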

* Regarding the case where the nc command is not found, I think we can handle 
it as follows:
*# If the command is present and a process is running on the specified port, 
the nc exit code will be 0.
*# If the command is present and no process is running on that port, the nc 
exit code will be 1.
*# If the command itself is not present, the exit code will be 127.
*# If the command does not have execute permission, the exit code will be 126 
(as Uma mentioned).
Here we need to treat only return code 1 as success; all other codes should be 
treated as failure.
We can also introduce a configuration option to specify an alternative {{nc}} 
command (netcat) in case nc is not present on the machine.
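
The exit-code handling described above might look something like the sketch below. It is not the actual patch: {{NcProbe}}, {{Verdict}}, {{interpret}}, {{fenceSucceeded}} and {{buildCommand}} are hypothetical names, and the alternative command is passed in as a plain string because no configuration key has been decided here.

```java
/** Illustrative sketch only; all names here are hypothetical. */
final class NcProbe {

    enum Verdict { SERVICE_UP, SERVICE_DOWN, INDETERMINATE }

    /** Maps the exit code of the nc probe to a verdict. */
    static Verdict interpret(int rc) {
        switch (rc) {
            case 0:
                return Verdict.SERVICE_UP;      // something is listening on the port
            case 1:
                return Verdict.SERVICE_DOWN;    // nc ran, nothing is listening
            default:
                return Verdict.INDETERMINATE;   // e.g. 126 (not executable), 127 (not found)
        }
    }

    /** Fencing is verified only when nc ran and reported that nothing is listening. */
    static boolean fenceSucceeded(int rc) {
        return interpret(rc) == Verdict.SERVICE_DOWN;
    }

    /** Builds the probe command; ncCommand would come from the proposed configuration. */
    static String buildCommand(String ncCommand, String host, int port) {
        return ncCommand + " -z " + host + " " + port;
    }

    public static void main(String[] args) {
        for (int rc : new int[] {0, 1, 126, 127}) {
            System.out.println(rc + " -> " + interpret(rc)
                + ", fenced=" + fenceSucceeded(rc));
        }
    }
}
```

The point of the third verdict is that "command not found" no longer falls into the same branch as "service is down", which is the behaviour reported in this issue.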


                
> SSH fencing option may incorrectly succeed if nc (netcat) command not present
> -----------------------------------------------------------------------------
>
>                 Key: HDFS-3618
>                 URL: https://issues.apache.org/jira/browse/HDFS-3618
>             Project: Hadoop HDFS
>          Issue Type: Bug
>          Components: auto-failover
>    Affects Versions: 2.1.0-alpha
>            Reporter: Brahma Reddy Battula
>         Attachments: zkfc_threaddump.out, zkfc.txt
>
>
> Started the NNs and ZKFCs on SUSE 11.
> SUSE 11 has netcat installed, and netcat -z works (but nc -z won't 
> work).
> While executing the following command, we got "command not found", so rc was 
> non-zero and the server was assumed to be down. Here we end up 
> without checking whether the service is actually down.
> {code}
> LOG.info(
>             "Indeterminate response from trying to kill service. " +
>             "Verifying whether it is running using nc...");
>         rc = execCommand(session, "nc -z " + serviceAddr.getHostName() +
>             " " + serviceAddr.getPort());
>         if (rc == 0) {
>           // the service is still listening - we are unable to fence
>           LOG.warn("Unable to fence - it is running but we cannot kill it");
>           return false;
>         } else {
>           LOG.info("Verified that the service is down.");
>           return true;          
>         }
> {code}

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: 
https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira

        
