[ https://issues.apache.org/jira/browse/AMBARI-10536?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Hari Sekhon updated AMBARI-10536:
---------------------------------
    Description: 
After a failed stack upgrade of HDP 2.2.0 => 2.2.4 (AMBARI-10519) and 
subsequent rollback, Ambari 2.0 leaves one of the HDFS HA NameNodes in an 
inconsistent state:
{code}2015-04-16 11:45:38,231 INFO  namenode.FSImage (FSEditLogLoader.java:loadFSEdits(138)) - Start loading edits file http://<custom_scrubbed>:8480/getJournal?jid=nameservice1&segmentTxId=54367965&storageInfo=-60%3A1459025177%3A1418910715375%3ACID-8055996a-b5ce-4b07-9b32-f2dbe9123edd, http://<custom_scrubbed>:8480/getJournal?jid=nameservice1&segmentTxId=54367965&storageInfo=-60%3A1459025177%3A1418910715375%3ACID-8055996a-b5ce-4b07-9b32-f2dbe9123edd
2015-04-16 11:45:38,232 INFO  namenode.EditLogInputStream (RedundantEditLogInputStream.java:nextOp(176)) - Fast-forwarding stream 'http://<custom_scrubbed>:8480/getJournal?jid=nameservice1&segmentTxId=54367965&storageInfo=-60%3A1459025177%3A1418910715375%3ACID-8055996a-b5ce-4b07-9b32-f2dbe9123edd, http://<custom_scrubbed>:8480/getJournal?jid=nameservice1&segmentTxId=54367965&storageInfo=-60%3A1459025177%3A1418910715375%3ACID-8055996a-b5ce-4b07-9b32-f2dbe9123edd' to transaction ID 54367965
2015-04-16 11:45:38,232 INFO  namenode.EditLogInputStream (RedundantEditLogInputStream.java:nextOp(176)) - Fast-forwarding stream 'http://<custom_scrubbed>:8480/getJournal?jid=nameservice1&segmentTxId=54367965&storageInfo=-60%3A1459025177%3A1418910715375%3ACID-8055996a-b5ce-4b07-9b32-f2dbe9123edd' to transaction ID 54367965
2015-04-16 11:45:38,284 ERROR namenode.FSEditLogLoader (FSEditLogLoader.java:loadEditRecords(238)) - Encountered exception on operation RollingUpgradeOp [START, time=1429181084342]
org.apache.hadoop.hdfs.server.common.InconsistentFSStateException: Directory /data1/nn is in an inconsistent state: previous fs state should not exist during upgrade. Finalize or rollback first.
        at org.apache.hadoop.hdfs.server.namenode.FSImage.checkUpgrade(FSImage.java:348)
        at org.apache.hadoop.hdfs.server.namenode.FSNamesystem.startRollingUpgradeInternal(FSNamesystem.java:8322)
        at org.apache.hadoop.hdfs.server.namenode.FSEditLogLoader.applyEditLogOp(FSEditLogLoader.java:750)
        at org.apache.hadoop.hdfs.server.namenode.FSEditLogLoader.loadEditRecords(FSEditLogLoader.java:230)
        at org.apache.hadoop.hdfs.server.namenode.FSEditLogLoader.loadFSEdits(FSEditLogLoader.java:139)
        at org.apache.hadoop.hdfs.server.namenode.FSImage.loadEdits(FSImage.java:824)
        at org.apache.hadoop.hdfs.server.namenode.FSImage.loadEdits(FSImage.java:805)
        at org.apache.hadoop.hdfs.server.namenode.ha.EditLogTailer.doTailEdits(EditLogTailer.java:230)
        at org.apache.hadoop.hdfs.server.namenode.ha.EditLogTailer$EditLogTailerThread.doWork(EditLogTailer.java:324)
        at org.apache.hadoop.hdfs.server.namenode.ha.EditLogTailer$EditLogTailerThread.access$200(EditLogTailer.java:282)
        at org.apache.hadoop.hdfs.server.namenode.ha.EditLogTailer$EditLogTailerThread$1.run(EditLogTailer.java:299)
        at java.security.AccessController.doPrivileged(Native Method)
        at javax.security.auth.Subject.doAs(Subject.java:356)
        at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1608)
        at org.apache.hadoop.security.SecurityUtil.doAsLoginUserOrFatal(SecurityUtil.java:410)
        at org.apache.hadoop.hdfs.server.namenode.ha.EditLogTailer$EditLogTailerThread.run(EditLogTailer.java:295)
2015-04-16 11:45:39,111 FATAL ha.EditLogTailer (EditLogTailer.java:doWork(331)) - Unknown error encountered while tailing edits. Shutting down standby NN.
org.apache.hadoop.hdfs.server.common.InconsistentFSStateException: Directory /data1/nn is in an inconsistent state: previous fs state should not exist during upgrade. Finalize or rollback first.
        at org.apache.hadoop.hdfs.server.namenode.FSImage.checkUpgrade(FSImage.java:348)
        at org.apache.hadoop.hdfs.server.namenode.FSNamesystem.startRollingUpgradeInternal(FSNamesystem.java:8322)
        at org.apache.hadoop.hdfs.server.namenode.FSEditLogLoader.applyEditLogOp(FSEditLogLoader.java:750)
        at org.apache.hadoop.hdfs.server.namenode.FSEditLogLoader.loadEditRecords(FSEditLogLoader.java:230)
        at org.apache.hadoop.hdfs.server.namenode.FSEditLogLoader.loadFSEdits(FSEditLogLoader.java:139)
        at org.apache.hadoop.hdfs.server.namenode.FSImage.loadEdits(FSImage.java:824)
        at org.apache.hadoop.hdfs.server.namenode.FSImage.loadEdits(FSImage.java:805)
        at org.apache.hadoop.hdfs.server.namenode.ha.EditLogTailer.doTailEdits(EditLogTailer.java:230)
        at org.apache.hadoop.hdfs.server.namenode.ha.EditLogTailer$EditLogTailerThread.doWork(EditLogTailer.java:324)
        at org.apache.hadoop.hdfs.server.namenode.ha.EditLogTailer$EditLogTailerThread.access$200(EditLogTailer.java:282)
        at org.apache.hadoop.hdfs.server.namenode.ha.EditLogTailer$EditLogTailerThread$1.run(EditLogTailer.java:299)
        at java.security.AccessController.doPrivileged(Native Method)
        at javax.security.auth.Subject.doAs(Subject.java:356)
        at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1608)
        at org.apache.hadoop.security.SecurityUtil.doAsLoginUserOrFatal(SecurityUtil.java:410)
        at org.apache.hadoop.hdfs.server.namenode.ha.EditLogTailer$EditLogTailerThread.run(EditLogTailer.java:295)
2015-04-16 11:45:39,114 INFO  util.ExitUtil (ExitUtil.java:terminate(124)) - Exiting with status 1
2015-04-16 11:45:39,115 INFO  namenode.NameNode (StringUtils.java:run(659)) - SHUTDOWN_MSG:
/************************************************************
SHUTDOWN_MSG: Shutting down NameNode at <custom_scrubbed>/<custom_scrubbed>
************************************************************/{code}

The NameNode shut down as a result. After a restart it still does not work 
properly: running haadmin failover commands returns similar exceptions 
complaining about this inconsistent state, which should be visible in the 
NameNode logs I've uploaded.
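
For anyone triaging a similar failure, here is a minimal Python sketch. It pulls the storage directory out of InconsistentFSStateException messages like the ones in the log above, then checks whether that directory still holds a "previous" subdirectory. It assumes the standard NameNode storage layout, where pre-upgrade state is kept under <storage dir>/previous. The function names are made up for illustration; none of this is part of Ambari or Hadoop.

```python
import re
from pathlib import Path

# Matches the message format seen in the log above, e.g.
# "InconsistentFSStateException: Directory /data1/nn is in an inconsistent state: ..."
_INCONSISTENT = re.compile(
    r"InconsistentFSStateException:\s+Directory\s+(\S+)\s+is in an inconsistent state"
)


def flagged_dirs(log_text):
    """Return the sorted, de-duplicated storage directories named in
    InconsistentFSStateException messages (hypothetical helper)."""
    # Collapse all whitespace so messages wrapped across lines still match.
    flat = " ".join(log_text.split())
    return sorted(set(_INCONSISTENT.findall(flat)))


def has_previous_state(storage_dir):
    """True if storage_dir contains a 'previous' subdirectory.

    Assumption: the NameNode keeps pre-upgrade state in <storage_dir>/previous.
    """
    return (Path(storage_dir) / "previous").is_dir()
```

On the affected NameNode host, flagged_dirs() could be fed the text of the NameNode log and has_previous_state() called on each directory it returns. A True result for /data1/nn would be consistent with the exception above, but it does not by itself say whether finalizing or rolling back is the right fix.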

Hari Sekhon
http://www.linkedin.com/in/harisekhon

  was:
After a failed stack upgrade of HDP 2.2.0 => 2.2.4 (AMBARI-10519) and 
subsequent rollback, Ambari 2.0 leaves one of the HDFS HA NameNodes in an 
inconsistent state:
{code}2015-04-16 11:45:38,231 INFO  namenode.FSImage (FSEditLogLoader.java:loadFSEdits(138)) - Start loading edits file http://<custom_scrubbed>:8480/getJournal?jid=nameservice1&segmentTxId=54367965&storageInfo=-60%3A1459025177%3A1418910715375%3ACID-8055996a-b5ce-4b07-9b32-f2dbe9123edd, http://<custom_scrubbed>:8480/getJournal?jid=nameservice1&segmentTxId=54367965&storageInfo=-60%3A1459025177%3A1418910715375%3ACID-8055996a-b5ce-4b07-9b32-f2dbe9123edd
2015-04-16 11:45:38,232 INFO  namenode.EditLogInputStream (RedundantEditLogInputStream.java:nextOp(176)) - Fast-forwarding stream 'http://<custom_scrubbed>:8480/getJournal?jid=nameservice1&segmentTxId=54367965&storageInfo=-60%3A1459025177%3A1418910715375%3ACID-8055996a-b5ce-4b07-9b32-f2dbe9123edd, http://<custom_scrubbed>:8480/getJournal?jid=nameservice1&segmentTxId=54367965&storageInfo=-60%3A1459025177%3A1418910715375%3ACID-8055996a-b5ce-4b07-9b32-f2dbe9123edd' to transaction ID 54367965
2015-04-16 11:45:38,232 INFO  namenode.EditLogInputStream (RedundantEditLogInputStream.java:nextOp(176)) - Fast-forwarding stream 'http://<custom_scrubbed>:8480/getJournal?jid=nameservice1&segmentTxId=54367965&storageInfo=-60%3A1459025177%3A1418910715375%3ACID-8055996a-b5ce-4b07-9b32-f2dbe9123edd' to transaction ID 54367965
2015-04-16 11:45:38,284 ERROR namenode.FSEditLogLoader (FSEditLogLoader.java:loadEditRecords(238)) - Encountered exception on operation RollingUpgradeOp [START, time=1429181084342]
org.apache.hadoop.hdfs.server.common.InconsistentFSStateException: Directory /data1/nn is in an inconsistent state: previous fs state should not exist during upgrade. Finalize or rollback first.
        at org.apache.hadoop.hdfs.server.namenode.FSImage.checkUpgrade(FSImage.java:348)
        at org.apache.hadoop.hdfs.server.namenode.FSNamesystem.startRollingUpgradeInternal(FSNamesystem.java:8322)
        at org.apache.hadoop.hdfs.server.namenode.FSEditLogLoader.applyEditLogOp(FSEditLogLoader.java:750)
        at org.apache.hadoop.hdfs.server.namenode.FSEditLogLoader.loadEditRecords(FSEditLogLoader.java:230)
        at org.apache.hadoop.hdfs.server.namenode.FSEditLogLoader.loadFSEdits(FSEditLogLoader.java:139)
        at org.apache.hadoop.hdfs.server.namenode.FSImage.loadEdits(FSImage.java:824)
        at org.apache.hadoop.hdfs.server.namenode.FSImage.loadEdits(FSImage.java:805)
        at org.apache.hadoop.hdfs.server.namenode.ha.EditLogTailer.doTailEdits(EditLogTailer.java:230)
        at org.apache.hadoop.hdfs.server.namenode.ha.EditLogTailer$EditLogTailerThread.doWork(EditLogTailer.java:324)
        at org.apache.hadoop.hdfs.server.namenode.ha.EditLogTailer$EditLogTailerThread.access$200(EditLogTailer.java:282)
        at org.apache.hadoop.hdfs.server.namenode.ha.EditLogTailer$EditLogTailerThread$1.run(EditLogTailer.java:299)
        at java.security.AccessController.doPrivileged(Native Method)
        at javax.security.auth.Subject.doAs(Subject.java:356)
        at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1608)
        at org.apache.hadoop.security.SecurityUtil.doAsLoginUserOrFatal(SecurityUtil.java:410)
        at org.apache.hadoop.hdfs.server.namenode.ha.EditLogTailer$EditLogTailerThread.run(EditLogTailer.java:295)
2015-04-16 11:45:39,111 FATAL ha.EditLogTailer (EditLogTailer.java:doWork(331)) - Unknown error encountered while tailing edits. Shutting down standby NN.
org.apache.hadoop.hdfs.server.common.InconsistentFSStateException: Directory /data1/nn is in an inconsistent state: previous fs state should not exist during upgrade. Finalize or rollback first.
        at org.apache.hadoop.hdfs.server.namenode.FSImage.checkUpgrade(FSImage.java:348)
        at org.apache.hadoop.hdfs.server.namenode.FSNamesystem.startRollingUpgradeInternal(FSNamesystem.java:8322)
        at org.apache.hadoop.hdfs.server.namenode.FSEditLogLoader.applyEditLogOp(FSEditLogLoader.java:750)
        at org.apache.hadoop.hdfs.server.namenode.FSEditLogLoader.loadEditRecords(FSEditLogLoader.java:230)
        at org.apache.hadoop.hdfs.server.namenode.FSEditLogLoader.loadFSEdits(FSEditLogLoader.java:139)
        at org.apache.hadoop.hdfs.server.namenode.FSImage.loadEdits(FSImage.java:824)
        at org.apache.hadoop.hdfs.server.namenode.FSImage.loadEdits(FSImage.java:805)
        at org.apache.hadoop.hdfs.server.namenode.ha.EditLogTailer.doTailEdits(EditLogTailer.java:230)
        at org.apache.hadoop.hdfs.server.namenode.ha.EditLogTailer$EditLogTailerThread.doWork(EditLogTailer.java:324)
        at org.apache.hadoop.hdfs.server.namenode.ha.EditLogTailer$EditLogTailerThread.access$200(EditLogTailer.java:282)
        at org.apache.hadoop.hdfs.server.namenode.ha.EditLogTailer$EditLogTailerThread$1.run(EditLogTailer.java:299)
        at java.security.AccessController.doPrivileged(Native Method)
        at javax.security.auth.Subject.doAs(Subject.java:356)
        at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1608)
        at org.apache.hadoop.security.SecurityUtil.doAsLoginUserOrFatal(SecurityUtil.java:410)
        at org.apache.hadoop.hdfs.server.namenode.ha.EditLogTailer$EditLogTailerThread.run(EditLogTailer.java:295)
2015-04-16 11:45:39,114 INFO  util.ExitUtil (ExitUtil.java:terminate(124)) - Exiting with status 1
2015-04-16 11:45:39,115 INFO  namenode.NameNode (StringUtils.java:run(659)) - SHUTDOWN_MSG:
/************************************************************
SHUTDOWN_MSG: Shutting down NameNode at <custom_scrubbed>/<custom_scrubbed>
************************************************************/{code}

The NameNode was shut down as a result, and after restarting it, it still 
doesn't work properly as doing ha admin failover commands return similar 
exceptions complaining about this inconsistent state.

Hari Sekhon
http://www.linkedin.com/in/harisekhon


> Ambari 2.0 HDP 2.2.4 => 2.2.0 stack rollback leaves one NameNode in 
> inconsistent state, breaking HA and failover
> ----------------------------------------------------------------------------------------------------------------
>
>                 Key: AMBARI-10536
>                 URL: https://issues.apache.org/jira/browse/AMBARI-10536
>             Project: Ambari
>          Issue Type: Bug
>          Components: ambari-server, stacks
>    Affects Versions: 2.0.0
>         Environment: HDP 2.2.0.0 <= rollback <= 2.2.4.0
>            Reporter: Hari Sekhon
>            Priority: Critical
>         Attachments: broken-namenode-nn1.log.bz2, 
> remaining-namenode-nn2.log.bz2
>
>



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
