[ 
https://issues.apache.org/jira/browse/HDFS-11044?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Yiqun Lin updated HDFS-11044:
-----------------------------
    Status: Patch Available  (was: Open)

Attaching a patch to fix this.

> TestRollingUpgrade fails intermittently
> ---------------------------------------
>
>                 Key: HDFS-11044
>                 URL: https://issues.apache.org/jira/browse/HDFS-11044
>             Project: Hadoop HDFS
>          Issue Type: Bug
>            Reporter: Yiqun Lin
>            Assignee: Yiqun Lin
>
> The test {{TestRollingUpgrade#testRollback}} fails intermittently in 
> trunk (https://builds.apache.org/job/PreCommit-HDFS-Build/17250/testReport/). 
> The stack trace:
> {code}
> java.lang.AssertionError: Test resulted in an unexpected exit
>       at 
> org.apache.hadoop.hdfs.MiniDFSCluster.shutdown(MiniDFSCluster.java:1949)
>       at 
> org.apache.hadoop.hdfs.MiniDFSCluster.shutdown(MiniDFSCluster.java:1936)
>       at 
> org.apache.hadoop.hdfs.MiniDFSCluster.shutdown(MiniDFSCluster.java:1929)
>       at 
> org.apache.hadoop.hdfs.TestRollingUpgrade.testRollback(TestRollingUpgrade.java:351)
> {code}
> I looked into this; it seems an IOException happens while writing 
> files to the NN storage directories (see the Jenkins report). The exception 
> is then remembered in {{ExitUtil.firstExitException}} and finally thrown 
> when we perform the cluster's shutdown operations.
> The exception info:
> {code}
> 2016-10-21 12:54:02,300 [main] FATAL hdfs.MiniDFSCluster 
> (MiniDFSCluster.java:shutdown(1946)) - Test resulted in an unexpected exit
> org.apache.hadoop.util.ExitUtil$ExitException: java.io.IOException: All the 
> storage failed while writing properties to VERSION file
>       at 
> org.apache.hadoop.hdfs.server.namenode.NNStorage.writeAll(NNStorage.java:1151)
>       at 
> org.apache.hadoop.hdfs.server.namenode.FSImage.updateStorageVersion(FSImage.java:999)
>       at 
> org.apache.hadoop.hdfs.server.namenode.FSEditLogLoader.applyEditLogOp(FSEditLogLoader.java:850)
>       at 
> org.apache.hadoop.hdfs.server.namenode.FSEditLogLoader.loadEditRecords(FSEditLogLoader.java:240)
>       at 
> org.apache.hadoop.hdfs.server.namenode.FSEditLogLoader.loadFSEdits(FSEditLogLoader.java:149)
>       at 
> org.apache.hadoop.hdfs.server.namenode.FSImage.loadEdits(FSImage.java:838)
>       at 
> org.apache.hadoop.hdfs.server.namenode.FSImage.loadEdits(FSImage.java:819)
>       at 
> org.apache.hadoop.hdfs.server.namenode.ha.EditLogTailer.doTailEdits(EditLogTailer.java:293)
>       at 
> org.apache.hadoop.hdfs.server.namenode.ha.EditLogTailer$EditLogTailerThread.doWork(EditLogTailer.java:427)
> {code}
> The IOException occurs because all the storage directories have been 
> removed. IMO, one reason is that a failure while writing properties or the 
> transaction ID to storage causes the existing storage directories to be 
> removed. The test {{TestRollingUpgrade#testRollback}} restarts the NameNode 
> many times, so underlying IO exceptions can occur; I'm not sure whether 
> that is normal here. But one fix I am sure of: we can use 
> {{checkExitOnShutdown(false)}} to skip the ExitException check, as is 
> already done in {{TestRollingUpgrade#testRollingUpgradeWithQJM}}. In 
> addition, since the shutdown operation is the last operation in the test, 
> this will not affect the current logic.
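To illustrate the failure mode described above, here is a simplified, self-contained sketch. It is not Hadoop's actual {{ExitUtil}} or {{MiniDFSCluster}} code; the class and method names are hypothetical stand-ins. It shows how the first exit exception recorded during the test resurfaces at shutdown, and how a {{checkExitOnShutdown(false)}}-style flag skips that check.

```java
// Simplified sketch of the "remember first exit exception, rethrow at
// shutdown" pattern. Names are hypothetical; this is NOT Hadoop's ExitUtil.
public class ExitCheckSketch {
    // Stand-in for ExitUtil.firstExitException
    private static Exception firstExitException;

    // Called when some component would have terminated the process
    public static void terminateCalled(Exception e) {
        if (firstExitException == null) {
            firstExitException = e;  // only the first exception is remembered
        }
    }

    // Stand-in for MiniDFSCluster.shutdown(); when the check is enabled and
    // an exit was recorded, the test fails with "unexpected exit"
    public static void shutdown(boolean checkExitOnShutdown) {
        if (checkExitOnShutdown && firstExitException != null) {
            throw new AssertionError("Test resulted in an unexpected exit",
                                     firstExitException);
        }
    }

    public static void main(String[] args) {
        // An underlying storage failure gets recorded during the test run
        terminateCalled(new java.io.IOException(
            "All the storage failed while writing properties to VERSION file"));
        try {
            shutdown(true);   // default behavior: the check makes the test fail
        } catch (AssertionError e) {
            System.out.println("shutdown(true): " + e.getMessage());
        }
        shutdown(false);      // checkExitOnShutdown(false) skips the check
        System.out.println("shutdown(false): passed");
    }
}
```

Because the shutdown is the test's last step, disabling the check only suppresses this final rethrow and changes nothing else in the test's logic.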



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
