Mark Ormesher created HDFS-15103:
------------------------------------
Summary: JMX endpoint and "dfsadmin" report 1 corrupt block;
"fsck" reports 0
Key: HDFS-15103
URL: https://issues.apache.org/jira/browse/HDFS-15103
Project: Hadoop HDFS
Issue Type: Bug
Components: namenode
Affects Versions: 3.2.1
Environment: * CentOS 7
* HDFS 3.2.1
* 2x HA NNs
* 5x identical DNs
Reporter: Mark Ormesher
We're seeing a long-running discrepancy between the number of corrupt blocks
reported by the JMX endpoint and {{dfsadmin -report}} (1) and the number
reported by {{fsck /}} (0). This has persisted through rolling restarts of the
NNs and DNs, and through complete shutdowns of the HDFS cluster for unrelated
maintenance.
{panel:title=JMX endpoint snippet}
{code}
(...)
"CorruptBlocks" : 1,
"ScheduledReplicationBlocks" : 0,
"PendingDeletionBlocks" : 0,
"LowRedundancyReplicatedBlocks" : 0,
"CorruptReplicatedBlocks" : 1,
"MissingReplicatedBlocks" : 0,
"MissingReplicationOneBlocks" : 0,
(...)
{code}
{panel}
{panel:title=dfsadmin -report}
{code}
$ ./hdfs dfsadmin -report | grep -i corrupt
Blocks with corrupt replicas: 1
Block groups with corrupt internal blocks: 0
{code}
{panel}
{panel:title=fsck /}
{code}
$ ./hdfs fsck / -files -blocks | grep -i corrupt
Corrupt blocks: 0
Corrupt block groups: 0
{code}
{panel}
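For anyone wanting to track this discrepancy over time, the three outputs above can be compared programmatically. The following is only a rough sketch, not part of any Hadoop tooling: the function names ({{jmx_corrupt_blocks}}, {{cli_count}}, {{compare}}) are hypothetical, and it assumes the JMX servlet response is the usual JSON object with a top-level "beans" array and that the CLI labels match the ones shown in the panels above.

```python
import json
import re


def jmx_corrupt_blocks(jmx_json: str) -> int:
    """Return CorruptBlocks from the first JMX bean that exposes it.

    Assumes the response has the shape {"beans": [{...}, ...]}.
    """
    for bean in json.loads(jmx_json).get("beans", []):
        if "CorruptBlocks" in bean:
            return int(bean["CorruptBlocks"])
    raise ValueError("no CorruptBlocks attribute found in JMX output")


def cli_count(output: str, label: str) -> int:
    """Return the integer that follows '<label>:' on a line of CLI output."""
    pattern = rf"^\s*{re.escape(label)}:\s*(\d+)"
    match = re.search(pattern, output, re.MULTILINE)
    if match is None:
        raise ValueError(f"label not found: {label!r}")
    return int(match.group(1))


def compare(jmx_json: str, dfsadmin_out: str, fsck_out: str) -> dict:
    """Collect the corrupt-block count reported by each source."""
    return {
        "jmx": jmx_corrupt_blocks(jmx_json),
        "dfsadmin": cli_count(dfsadmin_out, "Blocks with corrupt replicas"),
        "fsck": cli_count(fsck_out, "Corrupt blocks"),
    }


if __name__ == "__main__":
    # Sample inputs copied from the panels above (JMX wrapped in "beans").
    jmx = '{"beans": [{"CorruptBlocks": 1, "CorruptReplicatedBlocks": 1}]}'
    dfsadmin = (
        "Blocks with corrupt replicas: 1\n"
        "Block groups with corrupt internal blocks: 0\n"
    )
    fsck = "Corrupt blocks: 0\nCorrupt block groups: 0\n"

    counts = compare(jmx, dfsadmin, fsck)
    print(counts)
    if len(set(counts.values())) > 1:
        print("Sources disagree on the number of corrupt blocks")
```

The strings would normally come from an HTTP request to the NameNode's JMX endpoint and from captured output of {{hdfs dfsadmin -report}} and {{hdfs fsck /}}; they are inlined here so the sketch runs without a cluster.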
I've read through the related tickets below, all of which suggest this issue
was resolved in 2.7.8, but we're seeing it in 3.2.1.
https://issues.apache.org/jira/browse/HDFS-8533
https://issues.apache.org/jira/browse/HDFS-10213
https://issues.apache.org/jira/browse/HDFS-13999
How can we work out whether we really do have a corrupt block and, if we do,
which block it is, given that {{fsck}} thinks everything is fine?