[
https://issues.apache.org/jira/browse/AMBARI-12230?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]
Andrew Onischuk updated AMBARI-12230:
-------------------------------------
Description:
2015-06-17 23:00:32,926 WARN ha.EditLogTailer (EditLogTailer.java:doWork(339)) - Edit log tailer interrupted
java.lang.InterruptedException: sleep interrupted
    at java.lang.Thread.sleep(Native Method)
    at org.apache.hadoop.hdfs.server.namenode.ha.EditLogTailer$EditLogTailerThread.doWork(EditLogTailer.java:337)
    at org.apache.hadoop.hdfs.server.namenode.ha.EditLogTailer$EditLogTailerThread.access$200(EditLogTailer.java:282)
    at org.apache.hadoop.hdfs.server.namenode.ha.EditLogTailer$EditLogTailerThread$1.run(EditLogTailer.java:299)
    at org.apache.hadoop.security.SecurityUtil.doAsLoginUserOrFatal(SecurityUtil.java:412)
    at org.apache.hadoop.hdfs.server.namenode.ha.EditLogTailer$EditLogTailerThread.run(EditLogTailer.java:295)
2015-06-17 23:00:32,930 INFO namenode.FSNamesystem (FSNamesystem.java:startActiveServices(1152)) - Starting services required for active state
2015-06-17 23:00:32,946 INFO client.QuorumJournalManager (QuorumJournalManager.java:recoverUnfinalizedSegments(435)) - Starting recovery process for unclosed journal segments...
2015-06-17 23:00:32,963 FATAL namenode.FSEditLog (JournalSet.java:mapJournalsAndReportErrors(398)) - Error: recoverUnfinalizedSegments failed for required journal (JournalAndStream(mgr=QJM to [10.222.32.220:8485, 10.222.32.214:8485, 10.222.32.216:8485], stream=null))
org.apache.hadoop.hdfs.qjournal.client.QuorumException: Got too many exceptions to achieve quorum size 2/3. 3 exceptions thrown:
10.222.32.220:8485: Journal Storage Directory /hadoop/hdfs/journalnode/preprod not formatted
was:
PROBLEM: The customer was following the Ambari 2.0.1 instructions for upgrading
the stack from HDP 2.1 to 2.2.6, found here:
<http://docs.hortonworks.com/HDPDocuments/Ambari-2.0.1.0/bk_upgrading_Ambari/content/_upgrading_the_hdp_stack_from_21_to_22.html>
When they tried to start the NN in section 3 (Complete the Upgrade), step 12
of those instructions, it failed with the error:
2015-06-17 23:00:32,926 WARN ha.EditLogTailer (EditLogTailer.java:doWork(339)) - Edit log tailer interrupted
java.lang.InterruptedException: sleep interrupted
    at java.lang.Thread.sleep(Native Method)
    at org.apache.hadoop.hdfs.server.namenode.ha.EditLogTailer$EditLogTailerThread.doWork(EditLogTailer.java:337)
    at org.apache.hadoop.hdfs.server.namenode.ha.EditLogTailer$EditLogTailerThread.access$200(EditLogTailer.java:282)
    at org.apache.hadoop.hdfs.server.namenode.ha.EditLogTailer$EditLogTailerThread$1.run(EditLogTailer.java:299)
    at org.apache.hadoop.security.SecurityUtil.doAsLoginUserOrFatal(SecurityUtil.java:412)
    at org.apache.hadoop.hdfs.server.namenode.ha.EditLogTailer$EditLogTailerThread.run(EditLogTailer.java:295)
2015-06-17 23:00:32,930 INFO namenode.FSNamesystem (FSNamesystem.java:startActiveServices(1152)) - Starting services required for active state
2015-06-17 23:00:32,946 INFO client.QuorumJournalManager (QuorumJournalManager.java:recoverUnfinalizedSegments(435)) - Starting recovery process for unclosed journal segments...
2015-06-17 23:00:32,963 FATAL namenode.FSEditLog (JournalSet.java:mapJournalsAndReportErrors(398)) - Error: recoverUnfinalizedSegments failed for required journal (JournalAndStream(mgr=QJM to [10.222.32.220:8485, 10.222.32.214:8485, 10.222.32.216:8485], stream=null))
org.apache.hadoop.hdfs.qjournal.client.QuorumException: Got too many exceptions to achieve quorum size 2/3. 3 exceptions thrown:
10.222.32.220:8485: Journal Storage Directory /hadoop/hdfs/journalnode/preprod not formatted
BUSINESS IMPACT: Customer stuck during upgrade process. Attempting to roll
back will not work either.
SUPPORT ANALYSIS: The issue was caused by section 3, step 4, where they had to
run
python upgradeHelper.py --hostname $HOSTNAME --user $USERNAME --password $PASSWORD --clustername $CLUSTERNAME --fromStack=2.1 --toStack=2.2.x --upgradeCatalog=UpgradeCatalog_2.1_to_2.2.x.json update-configs
They had a custom path for dfs.journalnode.edits.dir set to
/data/hadoop/hdfs/journal. The command above changed it to
/hadoop/hdfs/journalnode, so the JNs looked in the new location and reported
their storage as not formatted. There were no warnings in Ambari to indicate
an issue when they started the JNs.
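A minimal sketch of the kind of check that could surface this change, assuming
the hdfs-site properties are captured as plain dictionaries before and after
update-configs runs. The function name changed_properties is hypothetical and
is not part of upgradeHelper.py; the two paths are the ones from this report.

```python
def changed_properties(before, after, watched=None):
    """Return {name: (old, new)} for properties whose value changed.

    before/after are dicts of property name -> value (hypothetical snapshots
    of hdfs-site taken around the update-configs step). If watched is given,
    only those property names are compared.
    """
    names = watched if watched is not None else before.keys()
    return {
        name: (before[name], after.get(name))
        for name in names
        if name in before and before[name] != after.get(name)
    }


# Values taken from this report: the custom path and the path it was reset to.
before = {"dfs.journalnode.edits.dir": "/data/hadoop/hdfs/journal"}
after = {"dfs.journalnode.edits.dir": "/hadoop/hdfs/journalnode"}

for name, (old, new) in changed_properties(before, after).items():
    print("WARNING: %s changed from %s to %s" % (name, old, new))
```

Comparing snapshots rather than hard-coding defaults keeps the check independent
of which value the upgrade catalog happens to write.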
STEPS TO REPRODUCE:
Starting with an HDP 2.1 Ambari installed cluster, change
dfs.journalnode.edits.dir from the default and set up NN HA. Then attempt to
follow upgrade instructions
<http://docs.hortonworks.com/HDPDocuments/Ambari-2.0.1.0/bk_upgrading_Ambari/content/_upgrading_the_hdp_stack_from_21_to_22.html>
to upgrade the HDP stack from 2.1 to 2.2.6.
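To confirm the reproduction on a JournalNode host, one can check whether the
configured directory holds a formatted journal. A minimal sketch, assuming the
usual layout <dfs.journalnode.edits.dir>/<journal id>/current/VERSION; the
helper name journal_is_formatted is hypothetical, and preprod is the journal id
seen in the error above.

```python
import os
import tempfile


def journal_is_formatted(edits_dir, journal_id):
    """Return True if <edits_dir>/<journal_id>/current/VERSION is a file.

    The layout is an assumption about how the JournalNode stores a formatted
    journal; adjust it if the cluster differs.
    """
    version_file = os.path.join(edits_dir, journal_id, "current", "VERSION")
    return os.path.isfile(version_file)


# Paths from this report: the customer's custom directory and the directory
# written by update-configs. Run on a JournalNode host.
for edits_dir in ("/data/hadoop/hdfs/journal", "/hadoop/hdfs/journalnode"):
    state = "formatted" if journal_is_formatted(edits_dir, "preprod") else "not formatted"
    print("%s: %s" % (edits_dir, state))
```

If the old directory is reported as formatted and the new one is not, the
journal data is still intact and only the configured path has moved.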
> During HDP 2.1 to 2.2.6 upgrade dfs.journalnode.edits.dir is incorrectly
> changed
> --------------------------------------------------------------------------------
>
> Key: AMBARI-12230
> URL: https://issues.apache.org/jira/browse/AMBARI-12230
> Project: Ambari
> Issue Type: Bug
> Reporter: Andrew Onischuk
> Assignee: Andrew Onischuk
> Fix For: 2.1.0
>
>
> 2015-06-17 23:00:32,926 WARN ha.EditLogTailer (EditLogTailer.java:doWork(339)) - Edit log tailer interrupted
> java.lang.InterruptedException: sleep interrupted
>     at java.lang.Thread.sleep(Native Method)
>     at org.apache.hadoop.hdfs.server.namenode.ha.EditLogTailer$EditLogTailerThread.doWork(EditLogTailer.java:337)
>     at org.apache.hadoop.hdfs.server.namenode.ha.EditLogTailer$EditLogTailerThread.access$200(EditLogTailer.java:282)
>     at org.apache.hadoop.hdfs.server.namenode.ha.EditLogTailer$EditLogTailerThread$1.run(EditLogTailer.java:299)
>     at org.apache.hadoop.security.SecurityUtil.doAsLoginUserOrFatal(SecurityUtil.java:412)
>     at org.apache.hadoop.hdfs.server.namenode.ha.EditLogTailer$EditLogTailerThread.run(EditLogTailer.java:295)
> 2015-06-17 23:00:32,930 INFO namenode.FSNamesystem (FSNamesystem.java:startActiveServices(1152)) - Starting services required for active state
> 2015-06-17 23:00:32,946 INFO client.QuorumJournalManager (QuorumJournalManager.java:recoverUnfinalizedSegments(435)) - Starting recovery process for unclosed journal segments...
> 2015-06-17 23:00:32,963 FATAL namenode.FSEditLog (JournalSet.java:mapJournalsAndReportErrors(398)) - Error: recoverUnfinalizedSegments failed for required journal (JournalAndStream(mgr=QJM to [10.222.32.220:8485, 10.222.32.214:8485, 10.222.32.216:8485], stream=null))
> org.apache.hadoop.hdfs.qjournal.client.QuorumException: Got too many exceptions to achieve quorum size 2/3. 3 exceptions thrown:
> 10.222.32.220:8485: Journal Storage Directory /hadoop/hdfs/journalnode/preprod not formatted