[
https://issues.apache.org/jira/browse/HDFS-13243?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16617254#comment-16617254
]
Sunil Govindan commented on HDFS-13243:
---------------------------------------
As the code freeze for 3.2 has passed, I am moving this Jira to 3.3. Please feel
free to revert if anyone has concerns. Thank you.
> Get CorruptBlock because of calling close and sync in same time
> ---------------------------------------------------------------
>
> Key: HDFS-13243
> URL: https://issues.apache.org/jira/browse/HDFS-13243
> Project: Hadoop HDFS
> Issue Type: Bug
> Components: namenode
> Affects Versions: 2.7.2, 3.2.0
> Reporter: Zephyr Guo
> Assignee: Zephyr Guo
> Priority: Critical
> Attachments: HDFS-13243-v1.patch, HDFS-13243-v2.patch,
> HDFS-13243-v3.patch, HDFS-13243-v4.patch, HDFS-13243-v5.patch,
> HDFS-13243-v6.patch
>
>
> An HDFS file can be broken by corrupt block(s) produced when close and sync
> are called at the same time.
> When a close call does not complete the file, the last block's UC status
> changes to COMMITTED. If a sync request is then popped from the queue and
> processed, the sync operation changes the last block's length.
> After that, the DataNodes report all received blocks to the NameNode, which
> checks the block length of every COMMITTED block. Because the length recorded
> in NameNode memory now differs from the length reported by the DataNodes, the
> last block is marked as corrupted due to the inconsistent length.
>
> {panel:title=Log in my hdfs}
> 2018-03-05 04:05:39,261 INFO org.apache.hadoop.hdfs.StateChange: BLOCK*
> allocate blk_1085498930_11758129\{UCState=UNDER_CONSTRUCTION,
> truncateBlock=null, primaryNodeIndex=-1,
> replicas=[ReplicaUC[[DISK]DS-32c7e479-3845-4a44-adf1-831edec7506b:NORMAL:10.0.0.219:50010|RBW],
> ReplicaUC[[DISK]DS-a9a5d653-c049-463d-8e4a-d1f0dc14409c:NORMAL:10.0.0.220:50010|RBW],
> ReplicaUC[[DISK]DS-f2b7c04a-b724-4c69-abbf-d2e416f70706:NORMAL:10.0.0.218:50010|RBW]]}
> for
> /hbase/WALs/hb-j5e517al6xib80rkb-006.hbase.rds.aliyuncs.com,16020,1519845790686/hb-j5e517al6xib80rkb-006.hbase.rds.aliyuncs.com%2C16020%2C1519845790686.default.1520193926515
> 2018-03-05 04:05:39,760 INFO org.apache.hadoop.hdfs.StateChange: BLOCK*
> fsync:
> /hbase/WALs/hb-j5e517al6xib80rkb-006.hbase.rds.aliyuncs.com,16020,1519845790686/hb-j5e517al6xib80rkb-006.hbase.rds.aliyuncs.com%2C16020%2C1519845790686.default.1520193926515
> for DFSClient_NONMAPREDUCE_1077513762_1
> 2018-03-05 04:05:39,761 INFO
> org.apache.hadoop.hdfs.server.namenode.FSNamesystem: BLOCK*
> blk_1085498930_11758129\{UCState=COMMITTED, truncateBlock=null,
> primaryNodeIndex=-1,
> replicas=[ReplicaUC[[DISK]DS-32c7e479-3845-4a44-adf1-831edec7506b:NORMAL:10.0.0.219:50010|RBW],
> ReplicaUC[[DISK]DS-a9a5d653-c049-463d-8e4a-d1f0dc14409c:NORMAL:10.0.0.220:50010|RBW],
> ReplicaUC[[DISK]DS-f2b7c04a-b724-4c69-abbf-d2e416f70706:NORMAL:10.0.0.218:50010|RBW]]}
> is not COMPLETE (ucState = COMMITTED, replication# = 0 < minimum = 2) in
> file
> /hbase/WALs/hb-j5e517al6xib80rkb-006.hbase.rds.aliyuncs.com,16020,1519845790686/hb-j5e517al6xib80rkb-006.hbase.rds.aliyuncs.com%2C16020%2C1519845790686.default.1520193926515
> 2018-03-05 04:05:39,761 INFO BlockStateChange: BLOCK* addStoredBlock:
> blockMap updated: 10.0.0.220:50010 is added to
> blk_1085498930_11758129\{UCState=COMMITTED, truncateBlock=null,
> primaryNodeIndex=-1,
> replicas=[ReplicaUC[[DISK]DS-32c7e479-3845-4a44-adf1-831edec7506b:NORMAL:10.0.0.219:50010|RBW],
> ReplicaUC[[DISK]DS-a9a5d653-c049-463d-8e4a-d1f0dc14409c:NORMAL:10.0.0.220:50010|RBW],
> ReplicaUC[[DISK]DS-f2b7c04a-b724-4c69-abbf-d2e416f70706:NORMAL:10.0.0.218:50010|RBW]]}
> size 2054413
> 2018-03-05 04:05:39,761 INFO BlockStateChange: BLOCK
> NameSystem.addToCorruptReplicasMap: blk_1085498930 added as corrupt on
> 10.0.0.219:50010 by
> hb-j5e517al6xib80rkb-006.hbase.rds.aliyuncs.com/10.0.0.219 because block is
> COMMITTED and reported length 2054413 does not match length in block map
> 141232
> 2018-03-05 04:05:39,762 INFO BlockStateChange: BLOCK
> NameSystem.addToCorruptReplicasMap: blk_1085498930 added as corrupt on
> 10.0.0.218:50010 by
> hb-j5e517al6xib80rkb-004.hbase.rds.aliyuncs.com/10.0.0.218 because block is
> COMMITTED and reported length 2054413 does not match length in block map
> 141232
> 2018-03-05 04:05:40,162 INFO
> org.apache.hadoop.hdfs.server.namenode.FSNamesystem: BLOCK*
> blk_1085498930_11758129\{UCState=COMMITTED, truncateBlock=null,
> primaryNodeIndex=-1,
> replicas=[ReplicaUC[[DISK]DS-32c7e479-3845-4a44-adf1-831edec7506b:NORMAL:10.0.0.219:50010|RBW],
> ReplicaUC[[DISK]DS-a9a5d653-c049-463d-8e4a-d1f0dc14409c:NORMAL:10.0.0.220:50010|RBW],
> ReplicaUC[[DISK]DS-f2b7c04a-b724-4c69-abbf-d2e416f70706:NORMAL:10.0.0.218:50010|RBW]]}
> is not COMPLETE (ucState = COMMITTED, replication# = 3 >= minimum = 2) in
> file
> /hbase/WALs/hb-j5e517al6xib80rkb-006.hbase.rds.aliyuncs.com,16020,1519845790686/hb-j5e517al6xib80rkb-006.hbase.rds.aliyuncs.com%2C16020%2C1519845790686.default.1520193926515
> {panel}
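A minimal Java sketch of the race described above, under a simplified model. The classes and methods below (`BlockRecord`, `commit`, `fsync`, `isReportedReplicaCorrupt`, `simulate`) are hypothetical and are not HDFS's `BlockManager` or `FSNamesystem` code. The lengths 2054413 and 141232 come from the log; the event order (one DataNode report accepted before the late fsync, two rejected after it) is one reading of that log, in which 10.0.0.220 is added normally while 10.0.0.219 and 10.0.0.218 are marked corrupt. The `guardCommitted` flag illustrates one possible mitigation, ignoring fsync length updates once the block has left UNDER_CONSTRUCTION. It is an assumption, not necessarily what the attached patches do.

```java
/**
 * Hypothetical toy model of the close/fsync race in HDFS-13243.
 * Not HDFS code: it only mimics the length check applied to COMMITTED blocks.
 */
public class CommittedBlockRace {

    enum UcState { UNDER_CONSTRUCTION, COMMITTED, COMPLETE }

    /** Hypothetical stand-in for the NameNode's in-memory record of the last block. */
    static final class BlockRecord {
        UcState state = UcState.UNDER_CONSTRUCTION;
        long numBytes;
    }

    /** close(): the client commits the last block with its final length. */
    static void commit(BlockRecord b, long finalLength) {
        b.numBytes = finalLength;
        b.state = UcState.COMMITTED;
    }

    /**
     * fsync(): a queued (possibly stale) sync overwrites the recorded length.
     * With guardCommitted = true the update is skipped once the block is no
     * longer UNDER_CONSTRUCTION (assumed mitigation, for illustration only).
     */
    static void fsync(BlockRecord b, long lengthSeenBySync, boolean guardCommitted) {
        if (guardCommitted && b.state != UcState.UNDER_CONSTRUCTION) {
            return;
        }
        b.numBytes = lengthSeenBySync;
    }

    /** Block report check: a COMMITTED block whose reported length differs is flagged corrupt. */
    static boolean isReportedReplicaCorrupt(BlockRecord b, long reportedLength) {
        return b.state == UcState.COMMITTED && reportedLength != b.numBytes;
    }

    /** Replays close, one report, the late fsync, then two more reports; returns the corrupt count. */
    static int simulate(boolean guardCommitted) {
        final long reportedLength = 2054413L; // length reported by the DataNodes in the log
        final long staleLength = 141232L;     // length left in the block map per the log
        BlockRecord last = new BlockRecord();
        int corrupt = 0;

        commit(last, reportedLength);                        // close(): block becomes COMMITTED
        if (isReportedReplicaCorrupt(last, reportedLength)) { // first DataNode reports before the fsync
            corrupt++;
        }
        fsync(last, staleLength, guardCommitted);            // late fsync processed after close
        for (int i = 0; i < 2; i++) {                        // remaining two DataNodes report
            if (isReportedReplicaCorrupt(last, reportedLength)) {
                corrupt++;
            }
        }
        return corrupt;
    }

    public static void main(String[] args) {
        System.out.println("without guard: " + simulate(false) + " corrupt replica(s)"); // prints 2
        System.out.println("with guard:    " + simulate(true) + " corrupt replica(s)");  // prints 0
    }
}
```

In this toy model the corruption comes entirely from the late fsync moving the recorded length after commit; once that update is suppressed, every report matches the committed length.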
--
This message was sent by Atlassian JIRA
(v7.6.3#76005)