[ 
https://issues.apache.org/jira/browse/HBASE-20157?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Zephyr Guo resolved HBASE-20157.
--------------------------------
    Resolution: Fixed

> WAL file might get broken
> -------------------------
>
>                 Key: HBASE-20157
>                 URL: https://issues.apache.org/jira/browse/HBASE-20157
>             Project: HBase
>          Issue Type: Bug
>          Components: wal
>    Affects Versions: 1.1.0
>            Reporter: Zephyr Guo
>            Assignee: Zephyr Guo
>            Priority: Major
>             Fix For: 2.0.0
>
>
> HBASE-16824 can corrupt a WAL file.
> When Writer.close() and Writer.sync() are called at the same time, an HDFS 
> bug (HDFS-13243) is triggered. When that happens, the last block of the 
> WAL is broken (the NN marks it as a CorruptBlock).
> I am reporting this scenario to help others who run into the same problem. 
> (HBASE-16824 has already been fixed, though.)
> {panel:title=RS log}
> 2018-02-05 07:58:54,212 INFO 
> [regionserver/hb-j5e517al6xib80rkb-004.hbase.rds.aliyuncs.com/10.0.0.218:16020.logRoller]
>  hdfs.DFSClient: Could not complete 
> /hbase/WALs/hb-j5e517al6xib80rkb-004.hbase.rds.aliyuncs.com,16020,1517453470107/hb-j5e517al6xib80rkb-004.hbase.rds.aliyuncs.com%2C16020%2C1517453470107.default.1517788719683
>  retrying...
> 2018-02-05 07:59:00,612 INFO 
> [regionserver/hb-j5e517al6xib80rkb-004.hbase.rds.aliyuncs.com/10.0.0.218:16020.logRoller]
>  hdfs.DFSClient: Could not complete 
> /hbase/WALs/hb-j5e517al6xib80rkb-004.hbase.rds.aliyuncs.com,16020,1517453470107/hb-j5e517al6xib80rkb-004.hbase.rds.aliyuncs.com%2C16020%2C1517453470107.default.1517788719683
>  retrying...
> {panel}
> {panel:title=NN log}
> 2018-02-05 07:58:48,011 INFO org.apache.hadoop.hdfs.StateChange: BLOCK* 
> fsync: 
> /hbase/WALs/hb-j5e517al6xib80rkb-004.hbase.rds.aliyuncs.com,16020,1517453470107/hb-j5e517al6xib80rkb-004.hbase.rds.aliyuncs.com%2C16020%2C1517453470107.default.1517788719683
>  for DFSClient_NONMAPREDUCE_1109936977_1
> 2018-02-05 07:58:48,011 INFO 
> org.apache.hadoop.hdfs.server.namenode.FSNamesystem: BLOCK* 
> blk_1080650145_6909339\{UCState=COMMITTED, truncateBlock=null, 
> primaryNodeIndex=-1, 
> replicas=[ReplicaUC[[DISK]DS-a4e579e7-4721-4c22-9b61-f1d00b33c45f:NORMAL:10.0.0.218:50010|RBW],
>  
> ReplicaUC[[DISK]DS-5d3d7878-876d-4a5a-97bc-5535c4cf8d59:NORMAL:10.0.0.220:50010|RBW],
>  
> ReplicaUC[[DISK]DS-ccc314b2-e2ad-4c1f-99a5-a39e3677a83b:NORMAL:10.0.0.221:50010|RBW]]}
>  is not COMPLETE (ucState = COMMITTED, replication# = 0 < minimum = 2) in 
> file 
> /hbase/WALs/hb-j5e517al6xib80rkb-004.hbase.rds.aliyuncs.com,16020,1517453470107/hb-j5e517al6xib80rkb-004.hbase.rds.aliyuncs.com%2C16020%2C1517453470107.default.1517788719683
> 2018-02-05 07:58:48,111 INFO BlockStateChange: BLOCK 
> NameSystem.addToCorruptReplicasMap: blk_1080650145 added as corrupt on 
> 10.0.0.221:50010 by 
> hb-j5e517al6xib80rkb-005.hbase.rds.aliyuncs.com/10.0.0.221 because block is 
> COMMITTED and reported length 1957330 does not match length in block map 80594
> 2018-02-05 07:58:48,224 INFO BlockStateChange: BLOCK 
> NameSystem.addToCorruptReplicasMap: blk_1080650145 added as corrupt on 
> 10.0.0.218:50010 by 
> hb-j5e517al6xib80rkb-004.hbase.rds.aliyuncs.com/10.0.0.218 because block is 
> COMMITTED and reported length 1957330 does not match length in block map 80594
> 2018-02-05 07:58:48,224 INFO BlockStateChange: BLOCK 
> NameSystem.addToCorruptReplicasMap: blk_1080650145 added as corrupt on 
> 10.0.0.220:50010 by 
> hb-j5e517al6xib80rkb-003.hbase.rds.aliyuncs.com/10.0.0.220 because block is 
> COMMITTED and reported length 1957330 does not match length in block map 80594
> 2018-02-05 07:58:48,511 INFO 
> org.apache.hadoop.hdfs.server.namenode.FSNamesystem: BLOCK* 
> blk_1080650145_6909339\{UCState=COMMITTED, truncateBlock=null, 
> primaryNodeIndex=-1, 
> replicas=[ReplicaUC[[DISK]DS-a4e579e7-4721-4c22-9b61-f1d00b33c45f:NORMAL:10.0.0.218:50010|RBW],
>  
> ReplicaUC[[DISK]DS-5d3d7878-876d-4a5a-97bc-5535c4cf8d59:NORMAL:10.0.0.220:50010|RBW],
>  
> ReplicaUC[[DISK]DS-ccc314b2-e2ad-4c1f-99a5-a39e3677a83b:NORMAL:10.0.0.221:50010|RBW]]}
>  is not COMPLETE (ucState = COMMITTED, replication# = 3 >= minimum = 2) in 
> file 
> /hbase/WALs/hb-j5e517al6xib80rkb-004.hbase.rds.aliyuncs.com,16020,1517453470107/hb-j5e517al6xib80rkb-004.hbase.rds.aliyuncs.com%2C16020%2C1517453470107.default.1517788719683
> {panel}
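
The description above attributes the corruption to Writer.close() and Writer.sync() running at the same time. As a rough illustration of how a caller could keep those two calls from overlapping, here is a minimal sketch in Java. It is not the change made for HBASE-16824 or HDFS-13243. The names GuardedWalWriter and its nested Writer interface are hypothetical stand-ins, not HBase or HDFS APIs; the only assumption taken from the report is that a writer exposes sync() and close().

```java
import java.io.IOException;
import java.util.concurrent.atomic.AtomicBoolean;
import java.util.concurrent.atomic.AtomicInteger;
import java.util.concurrent.locks.ReentrantLock;

public class GuardedWalWriter {

    /** Hypothetical stand-in for a WAL writer; not the real HBase interface. */
    public interface Writer {
        void sync() throws IOException;
        void close() throws IOException;
    }

    private final Writer delegate;
    private final ReentrantLock lock = new ReentrantLock();
    private boolean closed = false; // guarded by lock

    public GuardedWalWriter(Writer delegate) {
        this.delegate = delegate;
    }

    /** Forwards sync() under the lock; returns false if already closed. */
    public boolean sync() throws IOException {
        lock.lock();
        try {
            if (closed) {
                return false; // never sync a writer that close() has taken over
            }
            delegate.sync();
            return true;
        } finally {
            lock.unlock();
        }
    }

    /** Forwards close() under the same lock, at most once. */
    public void close() throws IOException {
        lock.lock();
        try {
            if (closed) {
                return;
            }
            closed = true;
            delegate.close();
        } finally {
            lock.unlock();
        }
    }

    public static void main(String[] args) throws Exception {
        AtomicInteger inFlight = new AtomicInteger();
        AtomicBoolean overlap = new AtomicBoolean(false);

        // Fake writer that records whether sync() and close() ever overlap.
        Writer fake = new Writer() {
            private void enter() {
                if (inFlight.incrementAndGet() > 1) {
                    overlap.set(true);
                }
                Thread.yield(); // widen the window for a potential race
                inFlight.decrementAndGet();
            }
            @Override public void sync() { enter(); }
            @Override public void close() { enter(); }
        };

        GuardedWalWriter writer = new GuardedWalWriter(fake);
        Thread syncer = new Thread(() -> {
            try {
                while (writer.sync()) {
                    // keep syncing until the writer reports it is closed
                }
            } catch (IOException e) {
                throw new RuntimeException(e);
            }
        });
        syncer.start();
        Thread.sleep(10);
        writer.close();
        syncer.join();
        System.out.println("overlap detected: " + overlap.get());
    }
}
```

Both methods take the same lock, so a sync() that is already in progress finishes before close() starts, and any sync() that arrives after close() is rejected instead of reaching the underlying stream. The cost is that close() can block behind a slow sync().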



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)
