[
https://issues.apache.org/jira/browse/HDFS-3452?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]
Uma Maheswara Rao G updated HDFS-3452:
--------------------------------------
Attachment: HDFS-3452-1.patch
> BKJM:Switch from standby to active fails and NN gets shut down due to delay
> in clearing of lock
> -----------------------------------------------------------------------------------------------
>
> Key: HDFS-3452
> URL: https://issues.apache.org/jira/browse/HDFS-3452
> Project: Hadoop HDFS
> Issue Type: Sub-task
> Reporter: suja s
> Assignee: Uma Maheswara Rao G
> Priority: Blocker
> Attachments: BK-253-BKJM.patch, HDFS-3452-1.patch, HDFS-3452.patch,
> HDFS-3452.patch
>
>
> Normal switch fails.
> (BKjournalManager zk session timeout is 3000 and ZKFC session timeout is
> 5000. By the time control comes to acquire lock the previous lock is not
> released which leads to failure in lock acquisition by NN and NN gets
> shutdown. Ideally it should have been done)
> =============================================================================
> 2012-05-09 20:15:29,732 ERROR org.apache.hadoop.contrib.bkjournal.WriteLock:
> Failed to acquire lock with /ledgers/lock/lock-0000000007, lock-0000000006
> already has it
> 2012-05-09 20:15:29,732 FATAL
> org.apache.hadoop.hdfs.server.namenode.FSEditLog: Error:
> recoverUnfinalizedSegments failed for required journal
> (JournalAndStream(mgr=org.apache.hadoop.contrib.bkjournal.BookKeeperJournalManager@412beeec,
> stream=null))
> java.io.IOException: Could not acquire lock
> at org.apache.hadoop.contrib.bkjournal.WriteLock.acquire(WriteLock.java:107)
> at
> org.apache.hadoop.contrib.bkjournal.BookKeeperJournalManager.recoverUnfinalizedSegments(BookKeeperJournalManager.java:406)
> at
> org.apache.hadoop.hdfs.server.namenode.JournalSet$6.apply(JournalSet.java:551)
> at
> org.apache.hadoop.hdfs.server.namenode.JournalSet.mapJournalsAndReportErrors(JournalSet.java:322)
> at
> org.apache.hadoop.hdfs.server.namenode.JournalSet.recoverUnfinalizedSegments(JournalSet.java:548)
> at
> org.apache.hadoop.hdfs.server.namenode.FSEditLog.recoverUnclosedStreams(FSEditLog.java:1134)
> at
> org.apache.hadoop.hdfs.server.namenode.FSNamesystem.startActiveServices(FSNamesystem.java:598)
> at
> org.apache.hadoop.hdfs.server.namenode.NameNode$NameNodeHAContext.startActiveServices(NameNode.java:1287)
> at
> org.apache.hadoop.hdfs.server.namenode.ha.ActiveState.enterState(ActiveState.java:61)
> at
> org.apache.hadoop.hdfs.server.namenode.ha.HAState.setStateInternal(HAState.java:63)
> at
> org.apache.hadoop.hdfs.server.namenode.ha.StandbyState.setState(StandbyState.java:49)
> at
> org.apache.hadoop.hdfs.server.namenode.NameNode.transitionToActive(NameNode.java:1219)
> at
> org.apache.hadoop.hdfs.server.namenode.NameNodeRpcServer.transitionToActive(NameNodeRpcServer.java:978)
> at
> org.apache.hadoop.ha.protocolPB.HAServiceProtocolServerSideTranslatorPB.transitionToActive(HAServiceProtocolServerSideTranslatorPB.java:107)
> at
> org.apache.hadoop.ha.proto.HAServiceProtocolProtos$HAServiceProtocolService$2.callBlockingMethod(HAServiceProtocolProtos.java:3633)
> at
> org.apache.hadoop.ipc.ProtobufRpcEngine$Server$ProtoBufRpcInvoker.call(ProtobufRpcEngine.java:427)
> at org.apache.hadoop.ipc.RPC$Server.call(RPC.java:916)
> at org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:1692)
> at org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:1688)
> at java.security.AccessController.doPrivileged(Native Method)
> at javax.security.auth.Subject.doAs(Subject.java:396)
> at
> org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1232)
> at org.apache.hadoop.ipc.Server$Handler.run(Server.java:1686)
> 2012-05-09 20:15:29,736 INFO org.apache.hadoop.hdfs.server.namenode.NameNode:
> SHUTDOWN_MSG:
> /************************************************************
> SHUTDOWN_MSG: Shutting down NameNode at HOST-XX-XX-XX-XX/XX.XX.XX.XX
> Scenario:
> Start ZKFCS, NNs
> NN1 is active and NN2 is standby
> Stop NN1. NN2 tries to transition to active and gets shut down
--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators:
https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira