[ 
https://issues.apache.org/jira/browse/HDFS-14961?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16978307#comment-16978307
 ] 

Ayush Saxena commented on HDFS-14961:
-------------------------------------

Anyway, thanks for the agreement.
bq. I just mean that the logic for zkfc also has a problem.
Agreed. I didn't check why this logic is not working here. This is the problem 
with threads running in parallel: they become too environment-specific. 
For me the state is standby from the start, as shown in the log I shared. 
I tried in a different environment, and it worked as you said.

I don't deny that, logically, we can add the check you mentioned to avoid the 
ZK effort too. That is logically correct, but for this case I think the check 
should be at the last site where the action takes place, i.e. the Namenode, to 
eliminate any chance of race conditions.
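
To illustrate the idea of guarding at the Namenode, here is a minimal, 
self-contained sketch. It is not the attached patch and does not use the 
Hadoop classes: ObserverGuardSketch, State and Source are hypothetical 
stand-ins, and allowing a user-initiated transition out of Observer is an 
assumption made only for the example.

```java
// Hypothetical sketch: every name here is a local stand-in, not a Hadoop type.
class ObserverGuardSketch {
    // Simplified HA states for illustration.
    enum State { ACTIVE, STANDBY, OBSERVER }

    // Who asked for the transition (assumed distinction: admin user vs. ZKFC).
    enum Source { USER, ZKFC }

    private State state = State.STANDBY;

    synchronized State getState() {
        return state;
    }

    // Stands in for the effect of the transitionToObserver admin command.
    synchronized void transitionToObserver() {
        state = State.OBSERVER;
    }

    // The check and the state change happen under the same lock, so a late
    // ZKFC request cannot slip in between them.
    synchronized void transitionToStandby(Source source) {
        if (state == State.OBSERVER && source == Source.ZKFC) {
            throw new IllegalStateException(
                "ZKFC may not move an Observer to Standby");
        }
        state = State.STANDBY;
    }

    public static void main(String[] args) {
        ObserverGuardSketch nn = new ObserverGuardSketch();
        nn.transitionToObserver();
        try {
            // Simulates a stale instruction from an earlier election.
            nn.transitionToStandby(Source.ZKFC);
        } catch (IllegalStateException e) {
            System.out.println("Rejected: " + e.getMessage());
        }
        System.out.println("State: " + nn.getState());
    }
}
```

Because the guard sits where the state actually changes, it holds no matter 
how the ZKFC threads interleave, which is why I prefer it over a ZKFC-side 
check alone.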

IMO we can do that ZK check separately, after we solve this. :)

> Prevent ZKFC changing Observer Namenode state
> ---------------------------------------------
>
>                 Key: HDFS-14961
>                 URL: https://issues.apache.org/jira/browse/HDFS-14961
>             Project: Hadoop HDFS
>          Issue Type: Bug
>            Reporter: Íñigo Goiri
>            Assignee: Ayush Saxena
>            Priority: Major
>         Attachments: HDFS-14961-01.patch, HDFS-14961-02.patch, 
> ZKFC-TEST-14961.patch
>
>
> HDFS-14130 made ZKFC aware of the Observer Namenode and hence allows ZKFC to 
> run alongside the Observer Namenode.
> The Observer Namenode isn't supposed to be part of the ZKFC election process.
> But if the Namenode was part of the election before being turned into an 
> Observer by the transitionToObserver command, the ZKFC still sends 
> instructions to the Namenode as a result of its previous participation and 
> sometimes changes the state of the Observer to Standby.
> This is also the reason for the failure in TestDFSZKFailoverController.
> TestDFSZKFailoverController has been consistently failing with a timeout 
> while waiting in testManualFailoverWithDFSHAAdmin(), in particular at 
> {{waitForHAState(1, HAServiceState.OBSERVER);}}.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

