[ 
https://issues.apache.org/jira/browse/HDFS-10301?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15259688#comment-15259688
 ] 

Konstantin Shvachko commented on HDFS-10301:
--------------------------------------------

??Maybe I'm misunderstanding the proposal, but don't we already do all of 
this???

Yes, you misunderstood. This part is not my proposal. This is what we already 
do, and therefore I call them *Constraints*, because they complicate the 
*Problem*. The proposal is in the third bullet point titled *Approach*.

??What does the NameNode do if the DataNode is restarted while sending these 
RPCs, so that it never gets a chance to send all the storages that it claimed 
existed?  It seems like you will get stuck??

No, I will not get stuck. All br-RPCs are completely independent of each other. 
It's just that one of them carries all storages and indicates to the NameNode 
that it should update its storage list for the DataNode. The NN processes as 
many such RPCs as the DN sends. If the DN dies, the NN will declare it dead in 
due time, or if the DN restarts within 10 minutes, it will send a new set of 
block reports from scratch. I do not see any inconsistencies.

You can think of it as a new operation SyncStorages, which does just that - 
updates NameNode's knowledge of DN's storages. I combined this operation with 
the first br-RPC. One can combine it with any other call, same as you propose 
to combine it with the heartbeat. Combining it with the heartbeat seems a poor 
idea, though, since we don't want to wait for removal of thousands of replicas 
on a heartbeat.
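
The SyncStorages idea above can be sketched roughly as follows. This is only a toy model under stated assumptions, not HDFS code: the class names SyncStoragesSketch, BlockReportRpc and NameNodeView, and the allStorages field, are hypothetical and exist only for illustration.

```java
import java.util.*;

/** Toy model only: these classes are illustrative and are not HDFS APIs. */
final class SyncStoragesSketch {

    /** One block-report RPC (br-RPC) covering a single storage. */
    static final class BlockReportRpc {
        final String storageId;
        final Set<Long> blockIds;
        /** Full storage list; non-null only on the RPC that also carries SyncStorages. */
        final Set<String> allStorages;

        BlockReportRpc(String storageId, Set<Long> blockIds, Set<String> allStorages) {
            this.storageId = storageId;
            this.blockIds = blockIds;
            this.allStorages = allStorages;
        }
    }

    /** The NameNode's view of one DataNode: storage ID -> reported block IDs. */
    static final class NameNodeView {
        final Map<String, Set<Long>> storages = new TreeMap<>();

        /** Each RPC is handled on its own; no state is carried between RPCs. */
        void process(BlockReportRpc rpc) {
            if (rpc.allStorages != null) {
                // SyncStorages step: forget storages the DN no longer claims
                // and register any storages that are new.
                storages.keySet().retainAll(rpc.allStorages);
                for (String id : rpc.allStorages) {
                    storages.putIfAbsent(id, new HashSet<>());
                }
            }
            // Ordinary per-storage block report.
            storages.put(rpc.storageId, new HashSet<>(rpc.blockIds));
        }
    }

    public static void main(String[] args) {
        NameNodeView nn = new NameNodeView();
        // A storage known from an earlier report that the DN has since lost.
        nn.process(new BlockReportRpc("s3", new HashSet<>(Arrays.asList(9L)), null));

        // First br-RPC of a new report: carries the full storage list.
        Set<String> current = new HashSet<>(Arrays.asList("s1", "s2"));
        nn.process(new BlockReportRpc("s1", new HashSet<>(Arrays.asList(1L, 2L)), current));

        // Suppose the DN restarts here and the RPC for s2 never arrives.
        // The view is still consistent; a later report starts from scratch.
        System.out.println(nn.storages);
    }
}
```

In this model the removal of a dropped storage happens only inside the RPC that carries the storage list, which is also why attaching it to a heartbeat would make the heartbeat wait for that removal.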

??interleaved block reports are extremely rare??

You keep saying this. But it is not rare for me. Are you trying to convince me 
not to believe my eyes, or have you checked the logs on your thousands of 
clusters? I did check mine.

> BlockReport retransmissions may lead to storages falsely being declared 
> zombie if storage report processing happens out of order
> --------------------------------------------------------------------------------------------------------------------------------
>
>                 Key: HDFS-10301
>                 URL: https://issues.apache.org/jira/browse/HDFS-10301
>             Project: Hadoop HDFS
>          Issue Type: Bug
>          Components: namenode
>    Affects Versions: 2.6.1
>            Reporter: Konstantin Shvachko
>            Assignee: Colin Patrick McCabe
>            Priority: Critical
>         Attachments: HDFS-10301.002.patch, HDFS-10301.003.patch, 
> HDFS-10301.01.patch, zombieStorageLogs.rtf
>
>
> When the NameNode is busy, a DataNode can time out sending a block report. 
> Then it sends the block report again. While processing these two reports at 
> the same time, the NameNode can interleave processing of storages from 
> different reports. This screws up the blockReportId field, which makes the 
> NameNode think that some storages are zombie. Replicas from zombie storages 
> are immediately removed, causing missing blocks.
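
The interleaving described in the issue can be illustrated with a simplified model. The mechanism here is an assumption made for illustration: the NameNode is taken to remember the last block report ID seen per storage, and after the final RPC of a report to treat any storage with a different ID as zombie. The class InterleavedReportSketch and its method are hypothetical and are not the actual NameNode code.

```java
import java.util.*;

/** Simplified, assumed model of zombie detection; not the real NameNode logic. */
final class InterleavedReportSketch {
    // Assumed bookkeeping: storage ID -> ID of the last report that touched it.
    final Map<String, Long> lastReportId = new TreeMap<>();

    /**
     * Processes one per-storage RPC. On the final RPC of a report, returns
     * the storages whose recorded report ID differs from this report's ID.
     */
    Set<String> process(String storageId, long reportId, boolean lastRpcOfReport) {
        lastReportId.put(storageId, reportId);
        Set<String> zombies = new TreeSet<>();
        if (lastRpcOfReport) {
            for (Map.Entry<String, Long> e : lastReportId.entrySet()) {
                if (e.getValue().longValue() != reportId) {
                    zombies.add(e.getKey());
                }
            }
        }
        return zombies;
    }

    public static void main(String[] args) {
        InterleavedReportSketch nn = new InterleavedReportSketch();
        // Report 1 timed out on the DN side, so report 2 is a retransmission.
        nn.process("s1", 1L, false);                    // report 1, storage s1
        nn.process("s1", 2L, false);                    // report 2, storage s1
        System.out.println(nn.process("s2", 2L, true)); // report 2 finishes
        System.out.println(nn.process("s2", 1L, true)); // report 1 finishes late
    }
}
```

Under this model, the late final RPC of the older report sees the newer ID on s1 and flags a healthy storage as zombie.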



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
