[ 
https://issues.apache.org/jira/browse/HADOOP-15684?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16586322#comment-16586322
 ] 

Lukas Majercak commented on HADOOP-15684:
-----------------------------------------

Thanks for the patch [~trjianjianjiao]. Instead of adding another catch 
statement for ConnectTimeoutException, could we just change RemoteException to 
Exception and get rid of the check for whether the exception is a standby 
exception? We should just move to the next NN no matter what the exception is.
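
To illustrate the suggestion, here is a minimal sketch of a loop that catches 
any Exception and moves on to the next NN. It is not the actual EditLogTailer 
code or the attached patch: the class RollFailoverSketch, the NameNodeProxy 
interface, and the triggerRoll method with its parameters are all hypothetical 
names made up for this example.

```java
import java.io.IOException;
import java.util.List;

/** Hypothetical sketch only; not the real EditLogTailer implementation. */
public class RollFailoverSketch {

    /** Hypothetical stand-in for an RPC proxy to a single NameNode. */
    public interface NameNodeProxy {
        void rollEditLog() throws IOException;
    }

    /**
     * Tries the proxies in round-robin order, starting at startIndex, and
     * returns the index of the first proxy whose call succeeds.
     * Any exception (timeout, remote, standby, ...) advances to the next proxy.
     */
    public static int triggerRoll(List<NameNodeProxy> proxies,
                                  int startIndex,
                                  int maxAttempts) throws IOException {
        if (proxies.isEmpty()) {
            throw new IllegalArgumentException("no NameNode proxies configured");
        }
        Exception last = null;
        for (int attempt = 0; attempt < maxAttempts; attempt++) {
            int index = (startIndex + attempt) % proxies.size();
            try {
                proxies.get(index).rollEditLog();
                return index; // this NN accepted the roll request
            } catch (Exception e) {
                // Deliberately broad: do not inspect the exception type,
                // just remember it and fall through to the next NN.
                last = e;
            }
        }
        throw new IOException("Unable to trigger a roll on any NN", last);
    }
}
```

With a dead NN at index 0 and a live one at index 1, triggerRoll(proxies, 0, 2) 
would skip the dead node after its first failure instead of retrying it. 
Catching Exception is broader than strictly necessary, which is the trade-off 
being proposed: simpler code at the cost of not distinguishing failure types.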

> triggerActiveLogRoll stuck on dead name node, when ConnectTimeoutException 
> happens. 
> ------------------------------------------------------------------------------------
>
>                 Key: HADOOP-15684
>                 URL: https://issues.apache.org/jira/browse/HADOOP-15684
>             Project: Hadoop Common
>          Issue Type: Bug
>          Components: ha
>            Reporter: Rong Tang
>            Priority: Critical
>         Attachments: FixRollEditLog.patch, 
> hadoop--rollingUpgrade-BN2SCH070021402.log
>
>
> When a name node calls triggerActiveLogRoll and the cachedActiveProxy points 
> to a dead name node, the call throws a ConnectTimeoutException. The expected 
> behavior is to try the next NN, but the current logic does not do so; 
> instead, it keeps retrying the dead node, mistakenly treating it as active.
>  
> 2018-08-17 10:02:12,001 WARN [Edit log tailer] 
> org.apache.hadoop.hdfs.server.namenode.ha.EditLogTailer: Unable to trigger a 
> roll of the active NN
> org.apache.hadoop.net.ConnectTimeoutException: Call From 
> BN2SCH070021402/25.126.188.193 to BN2SCH070041016.ap.gbl:8020 failed on 
> socket timeout exception: org.apache.hadoop.net.ConnectTimeoutException: 
> 20000 millis timeout 
> org.apache.hadoop.hdfs.server.namenode.ha.EditLogTailer$2.doWork(EditLogTailer.java:298)
>  
> C:\Users\rotang>ping BN2SCH070041016
> Pinging BN2SCH070041016 [25.126.141.79] with 32 bytes of data:
> Request timed out.
> Request timed out.
> Request timed out.
> Request timed out.
>  
> Attached are a log file showing how it repeatedly retries a dead name node, 
> and a fix patch.
>  
>  



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)
