[ 
https://issues.apache.org/jira/browse/HDFS-10301?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15252377#comment-15252377
 ] 

Konstantin Shvachko commented on HDFS-10301:
--------------------------------------------

Hey Walter, your patch looks good by itself, but it does not address the bug in 
the zombie storage recognition.
It took me some time to review your patch; it would have been easier if you had 
explained your approach.
So your patch is reordering block reports for different storages in such a way 
that storages from the same report are placed as a contiguous segment in the 
block report queue, so that processing of different BRs is not interleaved. 
This addresses Daryn's comment rather than solving the reported bug, as BTW 
Daryn correctly stated.
If you want to go forward with reordering of BRs you should probably do it in 
another issue. I personally am not a supporter because
# It introduces an unnecessary restriction on the order of execution of block 
reports, and
# adds even more complexity to BR processing logic.

I see the main problem here as being that block reports used to be idempotent 
per storage, but HDFS-7960 made execution of a subsequent storage dependent on 
the state produced during execution of the previous ones. I think idempotence 
is good, and we should keep it. I think we can mitigate the problem by one of 
the following:
# Changing the criteria of zombie storage recognition. Why should it depend on 
block report IDs?
# Eliminating the notion of zombie storage altogether. E.g., NN can ask DN to 
run {{DirectoryScanner}} if NN thinks DN's state is outdated.
# Trying to move {{curBlockReportId}} from {{DatanodeDescriptor}} to 
{{StorageInfo}}, which will eliminate global state between storages.
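
To illustrate the dependency between storages, here is a toy model of how 
node-level report state can misclassify a storage when two reports are 
interleaved. It is a sketch only, not the actual NameNode code: the class 
{{ZombieSketch}}, the {{Part}} type, and the rule "run the zombie check after 
the last storage of a report" are simplifying assumptions made for this 
example. Only the field name {{curBlockReportId}} comes from the discussion 
above.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Hypothetical toy model; it does not use or mirror real HDFS classes.
public class ZombieSketch {
    /** One queued unit of work: a single storage's part of a block report. */
    static final class Part {
        final String storageId;
        final long reportId;
        final boolean lastInReport;

        Part(String storageId, long reportId, boolean lastInReport) {
            this.storageId = storageId;
            this.reportId = reportId;
            this.lastInReport = lastInReport;
        }
    }

    /** Replays the queue in order and returns storages flagged as zombies. */
    static List<String> replay(List<Part> queue) {
        long curBlockReportId = 0L; // node-level state shared by all storages
        Map<String, Long> lastReportId = new LinkedHashMap<>(); // per storage
        List<String> zombies = new ArrayList<>();
        for (Part p : queue) {
            curBlockReportId = p.reportId;
            lastReportId.put(p.storageId, p.reportId);
            if (!p.lastInReport) {
                continue;
            }
            // Assumed check: a storage whose last report ID differs from the
            // node-level ID is treated as a zombie.
            for (Map.Entry<String, Long> e : lastReportId.entrySet()) {
                boolean stale = e.getValue().longValue() != curBlockReportId;
                if (stale && !zombies.contains(e.getKey())) {
                    zombies.add(e.getKey());
                }
            }
        }
        return zombies;
    }

    /** Report 1 is fully processed before its retry, report 2. */
    static List<Part> sequential() {
        return Arrays.asList(
            new Part("s1", 1, false), new Part("s2", 1, true),
            new Part("s1", 2, false), new Part("s2", 2, true));
    }

    /** Storages of the timed-out report 1 and its retry 2 are interleaved. */
    static List<Part> interleaved() {
        return Arrays.asList(
            new Part("s1", 1, false), new Part("s1", 2, false),
            new Part("s2", 2, true), new Part("s2", 1, true));
    }

    public static void main(String[] args) {
        System.out.println("sequential:  " + replay(sequential()));
        System.out.println("interleaved: " + replay(interleaved()));
    }
}
```

In this model the sequential order flags nothing, while the interleaved order 
flags {{s1}} even though both reports listed it. The outcome depends on 
processing order only because the check compares against state shared across 
storages.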

Also, if we cannot come up with a quick solution, then we should probably roll 
back HDFS-7960 for now and revisit it later, because this is a critical bug 
affecting all of our latest releases. And that is a lot of clusters and PBs out 
there.

> Blocks removed by thousands due to falsely detected zombie storages
> -------------------------------------------------------------------
>
>                 Key: HDFS-10301
>                 URL: https://issues.apache.org/jira/browse/HDFS-10301
>             Project: Hadoop HDFS
>          Issue Type: Bug
>          Components: namenode
>    Affects Versions: 2.6.1
>            Reporter: Konstantin Shvachko
>            Priority: Critical
>         Attachments: HDFS-10301.01.patch, zombieStorageLogs.rtf
>
>
> When the NameNode is busy, a DataNode can time out sending a block report. 
> It then sends the block report again. The NameNode, while processing these 
> two reports at the same time, can interleave processing of storages from 
> different reports. This screws up the blockReportId field, which makes the 
> NameNode think that some storages are zombie. Replicas from zombie storages 
> are immediately removed, causing missing blocks.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
