[ 
https://issues.apache.org/jira/browse/HDFS-5042?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16019430#comment-16019430
 ] 

Hadoop QA commented on HDFS-5042:
---------------------------------

| (x) *{color:red}-1 overall{color}* |
\\
\\
|| Vote || Subsystem || Runtime || Comment ||
| {color:blue}0{color} | {color:blue} reexec {color} | {color:blue}  0m 
26s{color} | {color:blue} Docker mode activated. {color} |
| {color:green}+1{color} | {color:green} @author {color} | {color:green}  0m  
0s{color} | {color:green} The patch does not contain any @author tags. {color} |
| {color:green}+1{color} | {color:green} test4tests {color} | {color:green}  0m 
 0s{color} | {color:green} The patch appears to include 5 new or modified test 
files. {color} |
| {color:blue}0{color} | {color:blue} mvndep {color} | {color:blue}  0m 
46s{color} | {color:blue} Maven dependency ordering for branch {color} |
| {color:green}+1{color} | {color:green} mvninstall {color} | {color:green}  6m 
53s{color} | {color:green} branch-2 passed {color} |
| {color:green}+1{color} | {color:green} compile {color} | {color:green}  7m 
18s{color} | {color:green} branch-2 passed with JDK v1.8.0_131 {color} |
| {color:green}+1{color} | {color:green} compile {color} | {color:green}  7m  
0s{color} | {color:green} branch-2 passed with JDK v1.7.0_131 {color} |
| {color:green}+1{color} | {color:green} checkstyle {color} | {color:green}  1m 
33s{color} | {color:green} branch-2 passed {color} |
| {color:green}+1{color} | {color:green} mvnsite {color} | {color:green}  1m 
57s{color} | {color:green} branch-2 passed {color} |
| {color:green}+1{color} | {color:green} mvneclipse {color} | {color:green}  0m 
30s{color} | {color:green} branch-2 passed {color} |
| {color:green}+1{color} | {color:green} findbugs {color} | {color:green}  3m 
49s{color} | {color:green} branch-2 passed {color} |
| {color:green}+1{color} | {color:green} javadoc {color} | {color:green}  1m 
30s{color} | {color:green} branch-2 passed with JDK v1.8.0_131 {color} |
| {color:green}+1{color} | {color:green} javadoc {color} | {color:green}  1m 
52s{color} | {color:green} branch-2 passed with JDK v1.7.0_131 {color} |
| {color:blue}0{color} | {color:blue} mvndep {color} | {color:blue}  0m 
15s{color} | {color:blue} Maven dependency ordering for patch {color} |
| {color:green}+1{color} | {color:green} mvninstall {color} | {color:green}  1m 
29s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} compile {color} | {color:green}  7m 
12s{color} | {color:green} the patch passed with JDK v1.8.0_131 {color} |
| {color:green}+1{color} | {color:green} javac {color} | {color:green}  7m 
12s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} compile {color} | {color:green}  6m 
43s{color} | {color:green} the patch passed with JDK v1.7.0_131 {color} |
| {color:green}+1{color} | {color:green} javac {color} | {color:green}  6m 
43s{color} | {color:green} the patch passed {color} |
| {color:orange}-0{color} | {color:orange} checkstyle {color} | {color:orange}  
1m 30s{color} | {color:orange} root: The patch generated 2 new + 320 unchanged 
- 1 fixed = 322 total (was 321) {color} |
| {color:green}+1{color} | {color:green} mvnsite {color} | {color:green}  1m 
50s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} mvneclipse {color} | {color:green}  0m 
31s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} whitespace {color} | {color:green}  0m 
 1s{color} | {color:green} The patch has no whitespace issues. {color} |
| {color:green}+1{color} | {color:green} findbugs {color} | {color:green}  4m 
14s{color} | {color:green} the patch passed {color} |
| {color:red}-1{color} | {color:red} javadoc {color} | {color:red}  0m 
43s{color} | {color:red} hadoop-common-project_hadoop-common-jdk1.8.0_131 with 
JDK v1.8.0_131 generated 1 new + 0 unchanged - 0 fixed = 1 total (was 0) 
{color} |
| {color:red}-1{color} | {color:red} javadoc {color} | {color:red}  0m 
50s{color} | {color:red} hadoop-common-project_hadoop-common-jdk1.7.0_131 with 
JDK v1.7.0_131 generated 1 new + 10 unchanged - 0 fixed = 11 total (was 10) 
{color} |
| {color:green}+1{color} | {color:green} unit {color} | {color:green}  8m  
0s{color} | {color:green} hadoop-common in the patch passed with JDK 
v1.7.0_131. {color} |
| {color:red}-1{color} | {color:red} unit {color} | {color:red} 54m  3s{color} 
| {color:red} hadoop-hdfs in the patch failed with JDK v1.7.0_131. {color} |
| {color:green}+1{color} | {color:green} asflicense {color} | {color:green}  0m 
23s{color} | {color:green} The patch does not generate ASF License warnings. 
{color} |
| {color:black}{color} | {color:black} {color} | {color:black}186m  1s{color} | 
{color:black} {color} |
\\
\\
|| Reason || Tests ||
| JDK v1.8.0_131 Failed junit tests | 
hadoop.hdfs.server.datanode.TestDataNodeHotSwapVolumes |
|   | hadoop.hdfs.server.namenode.TestDecommissioningStatus |
| JDK v1.7.0_131 Failed junit tests | 
hadoop.hdfs.server.datanode.TestDataNodeHotSwapVolumes |
|   | hadoop.tracing.TestTraceAdmin |
|   | hadoop.hdfs.server.namenode.ha.TestDFSUpgradeWithHA |
|   | hadoop.hdfs.server.blockmanagement.TestReplicationPolicyWithUpgradeDomain 
|
| JDK v1.7.0_131 Timed out junit tests | 
org.apache.hadoop.hdfs.server.datanode.TestDataNodeVolumeFailureReporting |
\\
\\
|| Subsystem || Report/Notes ||
| Docker |  Image:yetus/hadoop:8515d35 |
| JIRA Issue | HDFS-5042 |
| JIRA Patch URL | 
https://issues.apache.org/jira/secure/attachment/12869210/HDFS-5042-branch-2-01.patch
 |
| Optional Tests |  asflicense  compile  javac  javadoc  mvninstall  mvnsite  
unit  findbugs  checkstyle  |
| uname | Linux 11b7df05d2bb 3.13.0-107-generic #154-Ubuntu SMP Tue Dec 20 
09:57:27 UTC 2016 x86_64 x86_64 x86_64 GNU/Linux |
| Build tool | maven |
| Personality | /testptch/hadoop/patchprocess/precommit/personality/provided.sh 
|
| git revision | branch-2 / fe185e2 |
| Default Java | 1.7.0_131 |
| Multi-JDK versions |  /usr/lib/jvm/java-8-oracle:1.8.0_131 
/usr/lib/jvm/java-7-openjdk-amd64:1.7.0_131 |
| findbugs | v3.0.0 |
| checkstyle | 
https://builds.apache.org/job/PreCommit-HDFS-Build/19538/artifact/patchprocess/diff-checkstyle-root.txt
 |
| javadoc | 
https://builds.apache.org/job/PreCommit-HDFS-Build/19538/artifact/patchprocess/diff-javadoc-javadoc-hadoop-common-project_hadoop-common-jdk1.8.0_131.txt
 |
| javadoc | 
https://builds.apache.org/job/PreCommit-HDFS-Build/19538/artifact/patchprocess/diff-javadoc-javadoc-hadoop-common-project_hadoop-common-jdk1.7.0_131.txt
 |
| unit | 
https://builds.apache.org/job/PreCommit-HDFS-Build/19538/artifact/patchprocess/patch-unit-hadoop-hdfs-project_hadoop-hdfs-jdk1.7.0_131.txt
 |
| JDK v1.7.0_131  Test Results | 
https://builds.apache.org/job/PreCommit-HDFS-Build/19538/testReport/ |
| modules | C: hadoop-common-project/hadoop-common 
hadoop-hdfs-project/hadoop-hdfs U: . |
| Console output | 
https://builds.apache.org/job/PreCommit-HDFS-Build/19538/console |
| Powered by | Apache Yetus 0.5.0-SNAPSHOT   http://yetus.apache.org |


This message was automatically generated.



> Completed files lost after power failure
> ----------------------------------------
>
>                 Key: HDFS-5042
>                 URL: https://issues.apache.org/jira/browse/HDFS-5042
>             Project: Hadoop HDFS
>          Issue Type: Bug
>         Environment: ext3 on CentOS 5.7 (kernel 2.6.18-274.el5)
>            Reporter: Dave Latham
>            Priority: Critical
>         Attachments: HDFS-5042-01.patch, HDFS-5042-branch-2-01.patch
>
>
> We suffered a cluster wide power failure after which HDFS lost data that it 
> had acknowledged as closed and complete.
> The client was HBase which compacted a set of HFiles into a new HFile, then 
> after closing the file successfully, deleted the previous versions of the 
> file.  The cluster then lost power, and when brought back up the newly 
> created file was marked CORRUPT.
> Based on reading the logs it looks like the replicas were created by the 
> DataNodes in the 'blocksBeingWritten' directory.  Then when the file was 
> closed they were moved to the 'current' directory.  After the power cycle 
> those replicas were again in the blocksBeingWritten directory of the 
> underlying file system (ext3).  When those DataNodes reported in to the 
> NameNode it deleted those replicas and lost the file.
> Some possible fixes could be having the DataNode fsync the directory(s) after 
> moving the block from blocksBeingWritten to current to ensure the rename is 
> durable or having the NameNode accept replicas from blocksBeingWritten under 
> certain circumstances.
> Log snippets from RS (RegionServer), NN (NameNode), DN (DataNode):
> {noformat}
> RS 2013-06-29 11:16:06,812 DEBUG org.apache.hadoop.hbase.util.FSUtils: 
> Creating 
> file=hdfs://hm3:9000/hbase/users-6/b5b0820cde759ae68e333b2f4015bb7e/.tmp/6e0cc30af6e64e56ba5a539fdf159c4c
>  with permission=rwxrwxrwx
> NN 2013-06-29 11:16:06,830 INFO org.apache.hadoop.hdfs.StateChange: BLOCK* 
> NameSystem.allocateBlock: 
> /hbase/users-6/b5b0820cde759ae68e333b2f4015bb7e/.tmp/6e0cc30af6e64e56ba5a539fdf159c4c.
>  blk_1395839728632046111_357084589
> DN 2013-06-29 11:16:06,832 INFO 
> org.apache.hadoop.hdfs.server.datanode.DataNode: Receiving block 
> blk_1395839728632046111_357084589 src: /10.0.5.237:14327 dest: 
> /10.0.5.237:50010
> NN 2013-06-29 11:16:11,370 INFO org.apache.hadoop.hdfs.StateChange: BLOCK* 
> NameSystem.addStoredBlock: blockMap updated: 10.0.6.1:50010 is added to 
> blk_1395839728632046111_357084589 size 25418340
> NN 2013-06-29 11:16:11,370 INFO org.apache.hadoop.hdfs.StateChange: BLOCK* 
> NameSystem.addStoredBlock: blockMap updated: 10.0.6.24:50010 is added to 
> blk_1395839728632046111_357084589 size 25418340
> NN 2013-06-29 11:16:11,385 INFO org.apache.hadoop.hdfs.StateChange: BLOCK* 
> NameSystem.addStoredBlock: blockMap updated: 10.0.5.237:50010 is added to 
> blk_1395839728632046111_357084589 size 25418340
> DN 2013-06-29 11:16:11,385 INFO 
> org.apache.hadoop.hdfs.server.datanode.DataNode: Received block 
> blk_1395839728632046111_357084589 of size 25418340 from /10.0.5.237:14327
> DN 2013-06-29 11:16:11,385 INFO 
> org.apache.hadoop.hdfs.server.datanode.DataNode: PacketResponder 2 for block 
> blk_1395839728632046111_357084589 terminating
> NN 2013-06-29 11:16:11,385 INFO org.apache.hadoop.hdfs.StateChange: Removing 
> lease on  file 
> /hbase/users-6/b5b0820cde759ae68e333b2f4015bb7e/.tmp/6e0cc30af6e64e56ba5a539fdf159c4c
>  from client DFSClient_hb_rs_hs745,60020,1372470111932
> NN 2013-06-29 11:16:11,385 INFO org.apache.hadoop.hdfs.StateChange: DIR* 
> NameSystem.completeFile: file 
> /hbase/users-6/b5b0820cde759ae68e333b2f4015bb7e/.tmp/6e0cc30af6e64e56ba5a539fdf159c4c
>  is closed by DFSClient_hb_rs_hs745,60020,1372470111932
> RS 2013-06-29 11:16:11,393 INFO org.apache.hadoop.hbase.regionserver.Store: 
> Renaming compacted file at 
> hdfs://hm3:9000/hbase/users-6/b5b0820cde759ae68e333b2f4015bb7e/.tmp/6e0cc30af6e64e56ba5a539fdf159c4c
>  to 
> hdfs://hm3:9000/hbase/users-6/b5b0820cde759ae68e333b2f4015bb7e/n/6e0cc30af6e64e56ba5a539fdf159c4c
> RS 2013-06-29 11:16:11,505 INFO org.apache.hadoop.hbase.regionserver.Store: 
> Completed major compaction of 7 file(s) in n of 
> users-6,\x12\xBDp\xA3,1359426311784.b5b0820cde759ae68e333b2f4015bb7e. into 
> 6e0cc30af6e64e56ba5a539fdf159c4c, size=24.2m; total size for store is 24.2m
> -------  CRASH, RESTART ---------
> NN 2013-06-29 12:01:19,743 INFO org.apache.hadoop.hdfs.StateChange: BLOCK* 
> NameSystem.addStoredBlock: addStoredBlock request received for 
> blk_1395839728632046111_357084589 on 10.0.6.1:50010 size 21978112 but was 
> rejected: Reported as block being written but is a block of closed file.
> NN 2013-06-29 12:01:19,743 INFO org.apache.hadoop.hdfs.StateChange: BLOCK* 
> NameSystem.addToInvalidates: blk_1395839728632046111 is added to invalidSet 
> of 10.0.6.1:50010
> NN 2013-06-29 12:01:20,155 INFO org.apache.hadoop.hdfs.StateChange: BLOCK* 
> NameSystem.addStoredBlock: addStoredBlock request received for 
> blk_1395839728632046111_357084589 on 10.0.5.237:50010 size 16971264 but was 
> rejected: Reported as block being written but is a block of closed file.
> NN 2013-06-29 12:01:20,155 INFO org.apache.hadoop.hdfs.StateChange: BLOCK* 
> NameSystem.addToInvalidates: blk_1395839728632046111 is added to invalidSet 
> of 10.0.5.237:50010
> NN 2013-06-29 12:01:20,175 INFO org.apache.hadoop.hdfs.StateChange: BLOCK* 
> NameSystem.addStoredBlock: addStoredBlock request received for 
> blk_1395839728632046111_357084589 on 10.0.6.24:50010 size 21913088 but was 
> rejected: Reported as block being written but is a block of closed file.
> NN 2013-06-29 12:01:20,175 INFO org.apache.hadoop.hdfs.StateChange: BLOCK* 
> NameSystem.addToInvalidates: blk_1395839728632046111 is added to invalidSet 
> of 10.0.6.24:50010
> (note the clock on the server running DN is wrong after restart.  I believe 
> timestamps are off by 6 hours:)
> DN 2013-06-29 06:07:22,877 INFO 
> org.apache.hadoop.hdfs.server.datanode.DataNode: Scheduling block 
> blk_1395839728632046111_357084589 file 
> /data/hadoop/dfs/data/blocksBeingWritten/blk_1395839728632046111 for deletion
> DN 2013-06-29 06:07:24,952 INFO 
> org.apache.hadoop.hdfs.server.datanode.DataNode: Deleted block 
> blk_1395839728632046111_357084589 at file 
> /data/hadoop/dfs/data/blocksBeingWritten/blk_1395839728632046111
> {noformat}
> There was some additional discussion on this thread on the mailing list:
> http://mail-archives.apache.org/mod_mbox/hadoop-hdfs-user/201307.mbox/%3CCA+qbEUPuf19PL_EVeWi1104+scLVrcS0LTFUvBPw=qcuxnz...@mail.gmail.com%3E



--
This message was sent by Atlassian JIRA
(v6.3.15#6346)

---------------------------------------------------------------------
To unsubscribe, e-mail: [email protected]
For additional commands, e-mail: [email protected]

Reply via email to