[ https://issues.apache.org/jira/browse/HDFS-10301?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15491507#comment-15491507 ]
Arpit Agarwal edited comment on HDFS-10301 at 9/14/16 9:34 PM:
---------------------------------------------------------------
I don't think it is safe to remove storages (and hence block replicas from
memory) when the NameNode doesn't have up-to-date block replica state, because
the block->storage mapping on the NameNode can be stale, e.g. due to the disk
balancer moving replicas or due to the way VolumeChoosingPolicy picks storages
for new blocks.
was (Author: arpitagarwal):
I don't think it is safe to remove storages (and hence blocks) when the
NameNode doesn't have up to date block replica state because the block->storage
mapping on the NameNode can be stale e.g. due to disk balancer moving replicas;
or due to the way VolumeChoosingPolicy picks storages for new blocks.
> BlockReport retransmissions may lead to storages falsely being declared
> zombie if storage report processing happens out of order
> --------------------------------------------------------------------------------------------------------------------------------
>
> Key: HDFS-10301
> URL: https://issues.apache.org/jira/browse/HDFS-10301
> Project: Hadoop HDFS
> Issue Type: Bug
> Components: namenode
> Affects Versions: 2.6.1
> Reporter: Konstantin Shvachko
> Assignee: Vinitha Reddy Gankidi
> Priority: Critical
> Attachments: HDFS-10301.002.patch, HDFS-10301.003.patch,
> HDFS-10301.004.patch, HDFS-10301.005.patch, HDFS-10301.006.patch,
> HDFS-10301.007.patch, HDFS-10301.008.patch, HDFS-10301.009.patch,
> HDFS-10301.01.patch, HDFS-10301.010.patch, HDFS-10301.011.patch,
> HDFS-10301.012.patch, HDFS-10301.013.patch, HDFS-10301.014.patch,
> HDFS-10301.branch-2.7.patch, HDFS-10301.branch-2.patch,
> HDFS-10301.sample.patch, zombieStorageLogs.rtf
>
>
> When the NameNode is busy, a DataNode can time out while sending a block
> report. It then sends the block report again. While processing these two
> reports at the same time, the NameNode can interleave the processing of
> storages from different reports.
> This screws up the blockReportId field, which makes the NameNode think that
> some storages are zombies. Replicas from zombie storages are immediately
> removed, causing missing blocks.
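
The interleaving described above can be illustrated with a small toy model. This is only a sketch under stated assumptions, not the actual NameNode code: the class ZombieStorageSketch, its methods, and the storage names S1 and S2 are hypothetical. It assumes each storage remembers the ID of the last block report that touched it, and that after the final piece of a report any storage whose remembered ID differs from that report's ID is treated as a zombie.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Toy model only: all names here are hypothetical, not Hadoop classes.
final class ZombieStorageSketch {
    // Storage ID -> ID of the last block report that touched this storage.
    private final Map<String, Long> lastReportIdByStorage = new LinkedHashMap<>();

    ZombieStorageSketch(String... storageIds) {
        for (String id : storageIds) {
            lastReportIdByStorage.put(id, 0L);
        }
    }

    // Processes one storage's piece of a block report. When the piece is the
    // last one of its report, returns the storages that the assumed check
    // would flag as zombies; otherwise returns an empty list.
    List<String> processStorageReport(long reportId, String storageId, boolean lastPiece) {
        lastReportIdByStorage.put(storageId, reportId);
        List<String> zombies = new ArrayList<>();
        if (lastPiece) {
            for (Map.Entry<String, Long> e : lastReportIdByStorage.entrySet()) {
                if (e.getValue().longValue() != reportId) {
                    zombies.add(e.getKey());
                }
            }
        }
        return zombies;
    }

    // Report 1 times out and is retransmitted as report 2; the pieces of the
    // two reports are then processed out of order.
    static List<String> interleavedRun() {
        ZombieStorageSketch nn = new ZombieStorageSketch("S1", "S2");
        nn.processStorageReport(1L, "S1", false); // report 1, first piece
        nn.processStorageReport(2L, "S1", false); // report 2, first piece
        nn.processStorageReport(2L, "S2", true);  // report 2 completes cleanly
        // The stale last piece of report 1 arrives after report 2 finished.
        return nn.processStorageReport(1L, "S2", true);
    }

    // The same report processed in order, with no retransmission.
    static List<String> inOrderRun() {
        ZombieStorageSketch nn = new ZombieStorageSketch("S1", "S2");
        nn.processStorageReport(1L, "S1", false);
        return nn.processStorageReport(1L, "S2", true);
    }

    public static void main(String[] args) {
        System.out.println("in order:    " + inOrderRun());
        System.out.println("interleaved: " + interleavedRun());
    }
}
```

In this model the in-order run flags nothing, while the interleaved run flags S1 even though S1 is healthy, because S1 was last touched by report 2 when the stale final piece of report 1 triggered the check.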