Hi Todd,

I think Arpit's test method is incorrect. We cannot block port 8020 to simulate the active NN going down, because the ZK session stays alive and the NN process keeps running the whole time. So when 8020 is unblocked, NN1 still thinks it is active.
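For what it's worth, the abort itself is the QJM epoch fencing working as intended. Below is a rough, hypothetical sketch of the promised-epoch check a JournalNode applies to each writer request (class and method names are made up for illustration; this is not the actual org.apache.hadoop.hdfs.qjournal.server.Journal code). Once NN2 claims a higher epoch during the failover, every flush from NN1, still on the old epoch, is rejected, which is the "IPC's epoch 1 is less than the last promised epoch 2" error quoted below.

import java.io.IOException;

// Illustrative sketch only -- not the real Journal.java implementation.
class JournalFencingSketch {

  // Highest writer epoch this journal has promised to honor.
  private long lastPromisedEpoch = 0;

  // Called when a NameNode becomes active and claims a new, higher epoch.
  synchronized void promiseEpoch(long writerEpoch) throws IOException {
    if (writerEpoch <= lastPromisedEpoch) {
      throw new IOException("epoch " + writerEpoch
          + " has already been superseded by " + lastPromisedEpoch);
    }
    lastPromisedEpoch = writerEpoch; // e.g. NN2 claims epoch 2 during failover
  }

  // Every journal write/flush RPC carries the writer's epoch and is checked first.
  synchronized void checkRequest(long writerEpoch) throws IOException {
    if (writerEpoch < lastPromisedEpoch) {
      // The condition behind the log line quoted below:
      // "IPC's epoch 1 is less than the last promised epoch 2"
      throw new IOException("IPC's epoch " + writerEpoch
          + " is less than the last promised epoch " + lastPromisedEpoch);
    }
  }
}

So even after the iptables rule is removed, NN1's very next flush hits that check and it aborts, which matches the FATAL log in Arpit's mail.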
On Sat, Feb 8, 2014 at 3:47 AM, Todd Lipcon <t...@cloudera.com> wrote:
> Hi Arpit,
>
> The issue here is that our transaction log is not a proper "write-ahead log". In fact, it is a "write-behind" log of sorts -- our general operations look something like:
>
> - lock namespace
> - make a change to namespace
> - write to log
> - unlock namespace
> - sync log
>
> In the case of an active which has been superseded by another one, it only finds out there is a problem on the "sync" step above. But, it has already applied the edits to its own namespace. Given that we have no facility to roll back the change at this point, our only option is to abort, or else risk having an inconsistent namespace upon a later failover back to this node.
>
> Another option might be to completely clear and reload the namespace -- essentially performing a within-process restart of the namenode. Given that most people probably have some kind of cluster management software taking care of automatically restarting crashed daemons, we figured it was simpler to do a clean abort+reboot rather than implement the same thing within the namenode -- thus avoiding any risk that we forget to "clear" any of our state.
>
> Another option would be to make our logging use a proper "write-ahead" mechanism instead of the write-behind we do now. Doing this while maintaining good performance isn't super simple.
>
> There's some more background information on a JIRA filed a few years back here: https://issues.apache.org/jira/browse/HDFS-1137
>
> Hope that helps,
>
> -Todd
>
>
> On Wed, Feb 5, 2014 at 2:28 PM, Arpit Gupta <ar...@hortonworks.com> wrote:
> >
> > Hi
> >
> > I have a scenario where I am trying to test how HDFS HA works in case of network issues. I used iptables to block requests to the RPC port 8020 in order to simulate that. Below is some info on what I did.
> >
> > NN1 - Active
> > NN2 - Standby
> >
> > Using iptables, stop port 8020 on NN1 (http://stackoverflow.com/questions/7423309/iptables-block-access-to-port-8000-except-from-ip-address):
> > iptables -A INPUT -p tcp --dport 8020 -j DROP
> >
> > NN2 transitions to active.
> >
> > Run the following command to allow requests to port 8020 again (http://stackoverflow.com/questions/10197405/iptables-remove-specific-rules):
> > iptables -D INPUT -p tcp --dport 8020 -j DROP
> >
> > After this NN1 shut itself down with:
> >
> > 2014-02-05 01:00:38,030 FATAL namenode.FSEditLog (JournalSet.java:mapJournalsAndReportErrors(354)) - Error: flush failed for required journal (JournalAndStream(mgr=QJM to [IP:8485], stream=QuorumOutputStream starting at txid 568))
> > org.apache.hadoop.hdfs.qjournal.client.QuorumException: Got too many exceptions to achieve quorum size 1/1. 1 exceptions thrown:
> > 68.142.244.23:8485: IPC's epoch 1 is less than the last promised epoch 2
> > at org.apache.hadoop.hdfs.qjournal.server.Journal.checkRequest(Journal.java:410)
> >
> > NN1 in this case shuts down with the above exception because it still believes it is active, hence the exception when talking to the JNs. Thus the operators would have to restart NN1, which could take a while depending on the image size. Hence I was wondering if there is a better way to handle the above case, where we might transition to standby if exceptions like the above are seen.
> >
> > I wanted to get the thoughts of others before I opened an enhancement request.
> >
> > Thanks
> > --
> > Arpit Gupta
> > Hortonworks Inc.
> > http://hortonworks.com/
> >
>
> --
> Todd Lipcon
> Software Engineer, Cloudera
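One more note on the ordering Todd describes above, since it explains why aborting is the only safe reaction once a flush is rejected. The following is a rough sketch under made-up names (Namespace, EditLog, Op and the two methods are illustrative, not the real FSNamesystem/FSEditLog APIs): with the current write-behind ordering the in-memory change is already applied by the time sync fails and cannot be rolled back, while a write-ahead ordering would fail before applying -- but, as Todd says, doing that without hurting performance is not simple.

import java.io.IOException;

// Illustrative sketch of the two orderings; all types and names are made up.
class EditLogOrderingSketch {

  // "Write-behind" ordering, as described in Todd's steps above:
  // the in-memory namespace is mutated before the edit is durable.
  void writeBehind(Namespace ns, EditLog log, Op op) throws IOException {
    synchronized (ns) {
      ns.apply(op);   // 1. change the in-memory namespace
      log.write(op);  // 2. buffer the edit
    }
    log.sync();       // 3. a superseded NN only learns here that the quorum
                      //    rejected it; step 1 cannot be rolled back, so the
                      //    only safe options are abort or a full reload.
  }

  // "Write-ahead" ordering: make the edit durable first, then apply it.
  // Safe to keep running on failure, but overlapping the quorum round trip
  // with namespace updates without losing throughput is the hard part.
  void writeAhead(Namespace ns, EditLog log, Op op) throws IOException {
    log.write(op);
    log.sync();       // fails fast if this NN has been fenced out
    synchronized (ns) {
      ns.apply(op);   // applied only once the edit is known to be durable
    }
  }

  // Minimal supporting types so the sketch is self-contained.
  interface Namespace { void apply(Op op); }
  interface EditLog { void write(Op op); void sync() throws IOException; }
  interface Op {}
}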