[
https://issues.apache.org/jira/browse/HDFS-10301?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15247129#comment-15247129
]
Walter Su commented on HDFS-10301:
----------------------------------
Oh, I see. In this case, the reports are not splitted. And because the for-loop
is outside the lock, the 2 for-loops interleaved.
{code}
for (int r = 0; r < reports.length; r++) {
{code}
> Blocks removed by thousands due to falsely detected zombie storages
> -------------------------------------------------------------------
>
> Key: HDFS-10301
> URL: https://issues.apache.org/jira/browse/HDFS-10301
> Project: Hadoop HDFS
> Issue Type: Bug
> Components: namenode
> Affects Versions: 2.6.1
> Reporter: Konstantin Shvachko
> Priority: Critical
> Attachments: zombieStorageLogs.rtf
>
>
> When NameNode is busy a DataNode can timeout sending a block report. Then it
> sends the block report again. Then NameNode while process these two reports
> at the same time can interleave processing storages from different reports.
> This screws up the blockReportId field, which makes NameNode think that some
> storages are zombie. Replicas from zombie storages are immediately removed,
> causing missing blocks.
--
This message was sent by Atlassian JIRA
(v6.3.4#6332)