Takanobu Asanuma created HDFS-11179:
---------------------------------------
Summary: LightWeightHashSet can't correctly remove blocks with large blockIds
Key: HDFS-11179
URL: https://issues.apache.org/jira/browse/HDFS-11179
Project: Hadoop HDFS
Issue Type: Bug
Components: namenode
Affects Versions: 3.0.0-alpha1
Reporter: Takanobu Asanuma
Assignee: Takanobu Asanuma
Priority: Blocker
Our test cluster has hit a problem where {{postponedMisreplicatedBlocksCount}}
goes below zero. The cluster runs a recent 3.0 build. We haven't created any EC
files yet. This is the NN's log:
{noformat}
Rescan of postponedMisreplicatedBlocks completed in 13 msecs. 448 blocks are left. 176 blocks are removed.
Rescan of postponedMisreplicatedBlocks completed in 13 msecs. 272 blocks are left. 176 blocks are removed.
Rescan of postponedMisreplicatedBlocks completed in 14 msecs. 96 blocks are left. 176 blocks are removed.
Rescan of postponedMisreplicatedBlocks completed in 327 msecs. -77 blocks are left. 177 blocks are removed.
Rescan of postponedMisreplicatedBlocks completed in 15 msecs. -253 blocks are left. 179 blocks are removed.
Rescan of postponedMisreplicatedBlocks completed in 14 msecs. -432 blocks are left. 179 blocks are removed.
{noformat}
I looked into this issue and found that it is caused by {{LightWeightHashSet}},
which was recently adopted for {{postponedMisreplicatedBlocks}}. When
{{LightWeightHashSet}} removes blocks that have large blockIds, an integer
overflow occurs and the blocks can't be removed correctly (let alone EC blocks,
whose blockIds start from the minimum value of long).
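As an illustration only (a simplified sketch, not the actual Hadoop source): an index computation of the form {{Math.abs(hash) % capacity}} silently breaks for {{Integer.MIN_VALUE}}, because {{Math.abs(Integer.MIN_VALUE)}} is still negative. With a {{Block}}-style hash that folds the 64-bit blockId into 32 bits, a blockId of {{Long.MIN_VALUE}} produces exactly that hash, so the element lands in (or is looked up at) a negative index:

```java
public class HashIndexOverflow {
    // Block-style hash: fold the 64-bit blockId into 32 bits.
    static int blockHash(long blockId) {
        return (int) (blockId ^ (blockId >>> 32));
    }

    // Hypothetical, overflow-prone index computation:
    // Math.abs(Integer.MIN_VALUE) == Integer.MIN_VALUE, so the
    // result can still be negative.
    static int unsafeIndex(int hash, int capacity) {
        return Math.abs(hash) % capacity;
    }

    // Overflow-safe variant: take the remainder first, then normalize
    // negative remainders into [0, capacity).
    static int safeIndex(int hash, int capacity) {
        int index = hash % capacity;
        return index < 0 ? index + capacity : index;
    }

    public static void main(String[] args) {
        long ecLikeBlockId = Long.MIN_VALUE; // EC blockIds start from here
        int hash = blockHash(ecLikeBlockId);
        System.out.println(hash);                    // -2147483648
        System.out.println(unsafeIndex(hash, 1023)); // -2 (invalid index)
        System.out.println(safeIndex(hash, 1023));   // 1021 (valid index)
    }
}
```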
--
This message was sent by Atlassian JIRA
(v6.3.4#6332)