[ https://issues.apache.org/jira/browse/AMBARI-10536?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Hari Sekhon updated AMBARI-10536:
---------------------------------
Description:
After a failed stack upgrade of HDP 2.2.0 => 2.2.4 (AMBARI-10519) and
subsequent rollback, Ambari 2.0 leaves one of the HDFS HA NameNodes in an
inconsistent state:
{code}2015-04-16 11:45:38,231 INFO namenode.FSImage (FSEditLogLoader.java:loadFSEdits(138)) - Start loading edits file http://<custom_scrubbed>:8480/getJournal?jid=nameservice1&segmentTxId=54367965&storageInfo=-60%3A1459025177%3A1418910715375%3ACID-8055996a-b5ce-4b07-9b32-f2dbe9123edd, http://<custom_scrubbed>:8480/getJournal?jid=nameservice1&segmentTxId=54367965&storageInfo=-60%3A1459025177%3A1418910715375%3ACID-8055996a-b5ce-4b07-9b32-f2dbe9123edd
2015-04-16 11:45:38,232 INFO namenode.EditLogInputStream (RedundantEditLogInputStream.java:nextOp(176)) - Fast-forwarding stream 'http://<custom_scrubbed>:8480/getJournal?jid=nameservice1&segmentTxId=54367965&storageInfo=-60%3A1459025177%3A1418910715375%3ACID-8055996a-b5ce-4b07-9b32-f2dbe9123edd, http://<custom_scrubbed>:8480/getJournal?jid=nameservice1&segmentTxId=54367965&storageInfo=-60%3A1459025177%3A1418910715375%3ACID-8055996a-b5ce-4b07-9b32-f2dbe9123edd' to transaction ID 54367965
2015-04-16 11:45:38,232 INFO namenode.EditLogInputStream (RedundantEditLogInputStream.java:nextOp(176)) - Fast-forwarding stream 'http://<custom_scrubbed>:8480/getJournal?jid=nameservice1&segmentTxId=54367965&storageInfo=-60%3A1459025177%3A1418910715375%3ACID-8055996a-b5ce-4b07-9b32-f2dbe9123edd' to transaction ID 54367965
2015-04-16 11:45:38,284 ERROR namenode.FSEditLogLoader (FSEditLogLoader.java:loadEditRecords(238)) - Encountered exception on operation RollingUpgradeOp [START, time=1429181084342]
org.apache.hadoop.hdfs.server.common.InconsistentFSStateException: Directory /data1/nn is in an inconsistent state: previous fs state should not exist during upgrade. Finalize or rollback first.
    at org.apache.hadoop.hdfs.server.namenode.FSImage.checkUpgrade(FSImage.java:348)
    at org.apache.hadoop.hdfs.server.namenode.FSNamesystem.startRollingUpgradeInternal(FSNamesystem.java:8322)
    at org.apache.hadoop.hdfs.server.namenode.FSEditLogLoader.applyEditLogOp(FSEditLogLoader.java:750)
    at org.apache.hadoop.hdfs.server.namenode.FSEditLogLoader.loadEditRecords(FSEditLogLoader.java:230)
    at org.apache.hadoop.hdfs.server.namenode.FSEditLogLoader.loadFSEdits(FSEditLogLoader.java:139)
    at org.apache.hadoop.hdfs.server.namenode.FSImage.loadEdits(FSImage.java:824)
    at org.apache.hadoop.hdfs.server.namenode.FSImage.loadEdits(FSImage.java:805)
    at org.apache.hadoop.hdfs.server.namenode.ha.EditLogTailer.doTailEdits(EditLogTailer.java:230)
    at org.apache.hadoop.hdfs.server.namenode.ha.EditLogTailer$EditLogTailerThread.doWork(EditLogTailer.java:324)
    at org.apache.hadoop.hdfs.server.namenode.ha.EditLogTailer$EditLogTailerThread.access$200(EditLogTailer.java:282)
    at org.apache.hadoop.hdfs.server.namenode.ha.EditLogTailer$EditLogTailerThread$1.run(EditLogTailer.java:299)
    at java.security.AccessController.doPrivileged(Native Method)
    at javax.security.auth.Subject.doAs(Subject.java:356)
    at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1608)
    at org.apache.hadoop.security.SecurityUtil.doAsLoginUserOrFatal(SecurityUtil.java:410)
    at org.apache.hadoop.hdfs.server.namenode.ha.EditLogTailer$EditLogTailerThread.run(EditLogTailer.java:295)
2015-04-16 11:45:39,111 FATAL ha.EditLogTailer (EditLogTailer.java:doWork(331)) - Unknown error encountered while tailing edits. Shutting down standby NN.
org.apache.hadoop.hdfs.server.common.InconsistentFSStateException: Directory /data1/nn is in an inconsistent state: previous fs state should not exist during upgrade. Finalize or rollback first.
    at org.apache.hadoop.hdfs.server.namenode.FSImage.checkUpgrade(FSImage.java:348)
    at org.apache.hadoop.hdfs.server.namenode.FSNamesystem.startRollingUpgradeInternal(FSNamesystem.java:8322)
    at org.apache.hadoop.hdfs.server.namenode.FSEditLogLoader.applyEditLogOp(FSEditLogLoader.java:750)
    at org.apache.hadoop.hdfs.server.namenode.FSEditLogLoader.loadEditRecords(FSEditLogLoader.java:230)
    at org.apache.hadoop.hdfs.server.namenode.FSEditLogLoader.loadFSEdits(FSEditLogLoader.java:139)
    at org.apache.hadoop.hdfs.server.namenode.FSImage.loadEdits(FSImage.java:824)
    at org.apache.hadoop.hdfs.server.namenode.FSImage.loadEdits(FSImage.java:805)
    at org.apache.hadoop.hdfs.server.namenode.ha.EditLogTailer.doTailEdits(EditLogTailer.java:230)
    at org.apache.hadoop.hdfs.server.namenode.ha.EditLogTailer$EditLogTailerThread.doWork(EditLogTailer.java:324)
    at org.apache.hadoop.hdfs.server.namenode.ha.EditLogTailer$EditLogTailerThread.access$200(EditLogTailer.java:282)
    at org.apache.hadoop.hdfs.server.namenode.ha.EditLogTailer$EditLogTailerThread$1.run(EditLogTailer.java:299)
    at java.security.AccessController.doPrivileged(Native Method)
    at javax.security.auth.Subject.doAs(Subject.java:356)
    at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1608)
    at org.apache.hadoop.security.SecurityUtil.doAsLoginUserOrFatal(SecurityUtil.java:410)
    at org.apache.hadoop.hdfs.server.namenode.ha.EditLogTailer$EditLogTailerThread.run(EditLogTailer.java:295)
2015-04-16 11:45:39,114 INFO util.ExitUtil (ExitUtil.java:terminate(124)) - Exiting with status 1
2015-04-16 11:45:39,115 INFO namenode.NameNode (StringUtils.java:run(659)) - SHUTDOWN_MSG:
/************************************************************
SHUTDOWN_MSG: Shutting down NameNode at <custom_scrubbed>/<custom_scrubbed>
************************************************************/{code}
The NameNode shut down as a result. After restarting it, it still doesn't
work properly: haadmin failover commands return similar exceptions
complaining about this inconsistent state, which should be visible in the
NameNode logs I've uploaded.
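For anyone triaging this, here is a minimal sketch of checking for the condition the exception describes. It assumes the usual HDFS storage layout, in which a "previous" subdirectory under a NameNode storage directory holds the pre-upgrade state, and that this is what FSImage.checkUpgrade is objecting to. The helper name is hypothetical and this is not something Ambari provides; /data1/nn is simply the directory named in the log above.

```python
import os


def dirs_with_previous_state(storage_dirs):
    """Return the storage directories that contain a 'previous' subdirectory.

    Assumption: a leftover 'previous' directory is what triggers
    "previous fs state should not exist during upgrade".
    """
    return [d for d in storage_dirs
            if os.path.isdir(os.path.join(d, "previous"))]


if __name__ == "__main__":
    # /data1/nn comes from the exception message; add any other
    # dfs.namenode.name.dir entries configured for the cluster.
    for d in dirs_with_previous_state(["/data1/nn"]):
        print(d + ": leftover 'previous' directory found")
```

Running this on both NameNode hosts and comparing the results may help show whether only the broken NameNode still carries pre-upgrade state. It only reports; it does not modify anything.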
Hari Sekhon
http://www.linkedin.com/in/harisekhon
> Ambari 2.0 HDP 2.2.4 => 2.2.0 stack rollback leaves one NameNode in
> inconsistent state, breaking HA and failover
> ----------------------------------------------------------------------------------------------------------------
>
> Key: AMBARI-10536
> URL: https://issues.apache.org/jira/browse/AMBARI-10536
> Project: Ambari
> Issue Type: Bug
> Components: ambari-server, stacks
> Affects Versions: 2.0.0
> Environment: HDP 2.2.0.0 <= rollback <= 2.2.4.0
> Reporter: Hari Sekhon
> Priority: Critical
> Attachments: broken-namenode-nn1.log.bz2,
> remaining-namenode-nn2.log.bz2