[ 
https://issues.apache.org/jira/browse/AMBARI-12230?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Andrew Onischuk updated AMBARI-12230:
-------------------------------------
    Description: 
    2015-06-17 23:00:32,926 WARN ha.EditLogTailer (EditLogTailer.java:doWork(339)) - Edit log tailer interrupted
    java.lang.InterruptedException: sleep interrupted
    at java.lang.Thread.sleep(Native Method)
    at org.apache.hadoop.hdfs.server.namenode.ha.EditLogTailer$EditLogTailerThread.doWork(EditLogTailer.java:337)
    at org.apache.hadoop.hdfs.server.namenode.ha.EditLogTailer$EditLogTailerThread.access$200(EditLogTailer.java:282)
    at org.apache.hadoop.hdfs.server.namenode.ha.EditLogTailer$EditLogTailerThread$1.run(EditLogTailer.java:299)
    at org.apache.hadoop.security.SecurityUtil.doAsLoginUserOrFatal(SecurityUtil.java:412)
    at org.apache.hadoop.hdfs.server.namenode.ha.EditLogTailer$EditLogTailerThread.run(EditLogTailer.java:295)
    2015-06-17 23:00:32,930 INFO namenode.FSNamesystem (FSNamesystem.java:startActiveServices(1152)) - Starting services required for active state
    2015-06-17 23:00:32,946 INFO client.QuorumJournalManager (QuorumJournalManager.java:recoverUnfinalizedSegments(435)) - Starting recovery process for unclosed journal segments...
    2015-06-17 23:00:32,963 FATAL namenode.FSEditLog (JournalSet.java:mapJournalsAndReportErrors(398)) - Error: recoverUnfinalizedSegments failed for required journal (JournalAndStream(mgr=QJM to [10.222.32.220:8485, 10.222.32.214:8485, 10.222.32.216:8485], stream=null))
    org.apache.hadoop.hdfs.qjournal.client.QuorumException: Got too many exceptions to achieve quorum size 2/3. 3 exceptions thrown:
    10.222.32.220:8485: Journal Storage Directory /hadoop/hdfs/journalnode/preprod not formatted

  was:
PROBLEM: The customer was following the Ambari 2.0.1 instructions for upgrading
the stack from HDP 2.1 to 2.2.6, found here:

<http://docs.hortonworks.com/HDPDocuments/Ambari-2.0.1.0/bk_upgrading_Ambari/content/_upgrading_the_hdp_stack_from_21_to_22.html>

When they tried to start the NN in section 3 (Complete the Upgrade), step 12
of those instructions, it failed with the following error:

    
    
    2015-06-17 23:00:32,926 WARN ha.EditLogTailer (EditLogTailer.java:doWork(339)) - Edit log tailer interrupted
    java.lang.InterruptedException: sleep interrupted
    at java.lang.Thread.sleep(Native Method)
    at org.apache.hadoop.hdfs.server.namenode.ha.EditLogTailer$EditLogTailerThread.doWork(EditLogTailer.java:337)
    at org.apache.hadoop.hdfs.server.namenode.ha.EditLogTailer$EditLogTailerThread.access$200(EditLogTailer.java:282)
    at org.apache.hadoop.hdfs.server.namenode.ha.EditLogTailer$EditLogTailerThread$1.run(EditLogTailer.java:299)
    at org.apache.hadoop.security.SecurityUtil.doAsLoginUserOrFatal(SecurityUtil.java:412)
    at org.apache.hadoop.hdfs.server.namenode.ha.EditLogTailer$EditLogTailerThread.run(EditLogTailer.java:295)
    2015-06-17 23:00:32,930 INFO namenode.FSNamesystem (FSNamesystem.java:startActiveServices(1152)) - Starting services required for active state
    2015-06-17 23:00:32,946 INFO client.QuorumJournalManager (QuorumJournalManager.java:recoverUnfinalizedSegments(435)) - Starting recovery process for unclosed journal segments...
    2015-06-17 23:00:32,963 FATAL namenode.FSEditLog (JournalSet.java:mapJournalsAndReportErrors(398)) - Error: recoverUnfinalizedSegments failed for required journal (JournalAndStream(mgr=QJM to [10.222.32.220:8485, 10.222.32.214:8485, 10.222.32.216:8485], stream=null))
    org.apache.hadoop.hdfs.qjournal.client.QuorumException: Got too many exceptions to achieve quorum size 2/3. 3 exceptions thrown:
    10.222.32.220:8485: Journal Storage Directory /hadoop/hdfs/journalnode/preprod not formatted
    

BUSINESS IMPACT: Customer stuck during upgrade process. Attempting to roll
back will not work either.

SUPPORT ANALYSIS: The issue was caused by section 3, step 4, where they had to
run:

    
    
    python upgradeHelper.py --hostname $HOSTNAME --user $USERNAME --password $PASSWORD --clustername $CLUSTERNAME --fromStack=2.1 --toStack=2.2.x --upgradeCatalog=UpgradeCatalog_2.1_to_2.2.x.json update-configs
    

They had dfs.journalnode.edits.dir set to a custom path,
/data/hadoop/hdfs/journal. The update-configs command above changed it to
/hadoop/hdfs/journalnode, so the JNs reported their storage directory as not
formatted. There were no warnings in Ambari to indicate an issue when they
started the JNs.
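
One way to catch this kind of silent change is to snapshot the hdfs-site
properties before and after running update-configs and compare them. The
following is only a minimal sketch under that assumption: the
changed_properties helper and the two dictionaries are hypothetical
illustrations, not part of upgradeHelper.py or Ambari, and the values are taken
from the paths mentioned above.

```python
def changed_properties(before, after):
    """Return {name: (old, new)} for properties whose value changed or vanished."""
    changes = {}
    for name, old in before.items():
        new = after.get(name)  # None if the property was removed
        if new != old:
            changes[name] = (old, new)
    return changes


# Hypothetical snapshots of hdfs-site taken before and after update-configs.
before = {
    "dfs.journalnode.edits.dir": "/data/hadoop/hdfs/journal",
    "dfs.replication": "3",
}
after = {
    "dfs.journalnode.edits.dir": "/hadoop/hdfs/journalnode",
    "dfs.replication": "3",
}

for name, (old, new) in sorted(changed_properties(before, after).items()):
    print("%s: %s -> %s" % (name, old, new))
```

Any property that shows up in this diff with a custom "before" value would be
worth reviewing before the JNs and NNs are restarted.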

STEPS TO REPRODUCE:  
Starting with an Ambari-installed HDP 2.1 cluster, change
dfs.journalnode.edits.dir from the default and set up NN HA. Then follow the
upgrade instructions at

<http://docs.hortonworks.com/HDPDocuments/Ambari-2.0.1.0/bk_upgrading_Ambari/content/_upgrading_the_hdp_stack_from_21_to_22.html>

to upgrade the HDP stack from 2.1 to 2.2.6.




> During HDP 2.1 to 2.2.6 upgrade dfs.journalnode.edits.dir is incorrectly 
> changed
> --------------------------------------------------------------------------------
>
>                 Key: AMBARI-12230
>                 URL: https://issues.apache.org/jira/browse/AMBARI-12230
>             Project: Ambari
>          Issue Type: Bug
>            Reporter: Andrew Onischuk
>            Assignee: Andrew Onischuk
>             Fix For: 2.1.0
>
>
>     2015-06-17 23:00:32,926 WARN ha.EditLogTailer (EditLogTailer.java:doWork(339)) - Edit log tailer interrupted
>     java.lang.InterruptedException: sleep interrupted
>     at java.lang.Thread.sleep(Native Method)
>     at org.apache.hadoop.hdfs.server.namenode.ha.EditLogTailer$EditLogTailerThread.doWork(EditLogTailer.java:337)
>     at org.apache.hadoop.hdfs.server.namenode.ha.EditLogTailer$EditLogTailerThread.access$200(EditLogTailer.java:282)
>     at org.apache.hadoop.hdfs.server.namenode.ha.EditLogTailer$EditLogTailerThread$1.run(EditLogTailer.java:299)
>     at org.apache.hadoop.security.SecurityUtil.doAsLoginUserOrFatal(SecurityUtil.java:412)
>     at org.apache.hadoop.hdfs.server.namenode.ha.EditLogTailer$EditLogTailerThread.run(EditLogTailer.java:295)
>     2015-06-17 23:00:32,930 INFO namenode.FSNamesystem (FSNamesystem.java:startActiveServices(1152)) - Starting services required for active state
>     2015-06-17 23:00:32,946 INFO client.QuorumJournalManager (QuorumJournalManager.java:recoverUnfinalizedSegments(435)) - Starting recovery process for unclosed journal segments...
>     2015-06-17 23:00:32,963 FATAL namenode.FSEditLog (JournalSet.java:mapJournalsAndReportErrors(398)) - Error: recoverUnfinalizedSegments failed for required journal (JournalAndStream(mgr=QJM to [10.222.32.220:8485, 10.222.32.214:8485, 10.222.32.216:8485], stream=null))
>     org.apache.hadoop.hdfs.qjournal.client.QuorumException: Got too many exceptions to achieve quorum size 2/3. 3 exceptions thrown:
>     10.222.32.220:8485: Journal Storage Directory /hadoop/hdfs/journalnode/preprod not formatted



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
