[ https://issues.apache.org/jira/browse/HDFS-10301?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15386306#comment-15386306 ]
Colin P. McCabe commented on HDFS-10301:
----------------------------------------

bq. [~redvine] asked: Colin P. McCabe Doesn't TCP ignore duplicate packets? Can you explain how this can happen? If the RPC does get duplicated, then we shouldn't return true right when node.leaseId == 0?

That is a fair point. However, the retry logic in the RPC system could resend the message if the NN did not respond within a certain amount of time. Or there could simply be a bug that leads to the DN sending full block reports when it shouldn't. In any case, we cannot assume that reordered messages are the only problem.

bq. [~shv] wrote: Also I think that Colin P. McCabe's veto, formulated as "I am -1 on a patch which adds extra RPCs," is fully addressed now. The storage report was added to the last RPC representing a single block report. The last patch does not add extra RPCs.

Yes, this patch addresses my concerns. I withdraw my -1.

bq. [~shv] wrote: The storage ids are already there in the current BR protobuf. Why would you want a new field for that? You will need to duplicate all storage ids in the case of a full block report, when it is not split into multiple RPCs. Seems confusing and inefficient to me.

A new field would be best because we would avoid creating fake BlockListAsLongs objects with length -1 and re-using protobuf fields for purposes they weren't intended for. A list of storage IDs is not a block report or a list of blocks, and using the same data structures is very confusing.

If you want to optimize by not sending the list of storage reports separately when the block report has only one RPC, that's easy to do: just check whether numRpcs == 1 and don't set or check the optional list of strings in that case (a rough sketch of this follows after the quoted issue text below). I'm not going to block the patch over this, but I do think people reading this later will wonder what you were thinking if you overload the PB fields in this way.

> BlockReport retransmissions may lead to storages falsely being declared
> zombie if storage report processing happens out of order
> --------------------------------------------------------------------------------------------------------------------------------
>
>                 Key: HDFS-10301
>                 URL: https://issues.apache.org/jira/browse/HDFS-10301
>             Project: Hadoop HDFS
>          Issue Type: Bug
>          Components: namenode
>    Affects Versions: 2.6.1
>            Reporter: Konstantin Shvachko
>            Assignee: Vinitha Reddy Gankidi
>            Priority: Critical
>         Attachments: HDFS-10301.002.patch, HDFS-10301.003.patch, HDFS-10301.004.patch, HDFS-10301.005.patch, HDFS-10301.006.patch, HDFS-10301.007.patch, HDFS-10301.008.patch, HDFS-10301.009.patch, HDFS-10301.01.patch, HDFS-10301.010.patch, HDFS-10301.011.patch, HDFS-10301.sample.patch, zombieStorageLogs.rtf
>
>
> When the NameNode is busy, a DataNode can time out while sending a block report and then
> sends the block report again. The NameNode, processing these two reports at the same time,
> can interleave processing of storages from different reports. This corrupts the blockReportId
> field, which makes the NameNode think that some storages are zombie. Replicas from zombie
> storages are immediately removed, causing missing blocks.
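To illustrate the optimization mentioned above, here is a minimal, self-contained sketch. This is not actual Hadoop code: the class and field names (BlockReportRpc, ZombieStoragePruner, liveStorages, curRpc, allStorageIds) are made up for illustration; only numRpcs and the general design come from the discussion. The idea is that the full list of storage IDs rides along only on the last RPC of a split report, and is neither set nor checked when numRpcs == 1.

{code:java}
import java.util.*;

/** Hypothetical model (not actual Hadoop code) of one RPC of a block report. */
class BlockReportRpc {
  final int numRpcs;                 // total number of RPCs the report is split into
  final int curRpc;                  // index of this RPC within the report
  final List<String> allStorageIds;  // optional: set only on the last RPC of a split report

  BlockReportRpc(int numRpcs, int curRpc, List<String> allStorageIds) {
    this.numRpcs = numRpcs;
    this.curRpc = curRpc;
    this.allStorageIds = allStorageIds;
  }
}

class ZombieStoragePruner {
  /**
   * Returns the set of storage IDs to treat as live, or empty if this RPC
   * carries no authoritative list yet. When the report fits in a single RPC
   * (numRpcs == 1), the storages in the report itself are authoritative, so
   * the separate optional list is neither set nor checked.
   */
  static Optional<Set<String>> liveStorages(BlockReportRpc rpc,
                                            Set<String> storagesInThisRpc) {
    if (rpc.numRpcs == 1) {
      return Optional.of(storagesInThisRpc);       // single-RPC report: no extra field needed
    }
    boolean isLastRpc = rpc.curRpc == rpc.numRpcs - 1;
    if (isLastRpc && rpc.allStorageIds != null) {
      return Optional.of(new HashSet<>(rpc.allStorageIds)); // split report: list rides on last RPC
    }
    return Optional.empty();                        // interim RPC: do not prune anything yet
  }
}
{code}

Reading the list from its own optional field this way avoids constructing fake BlockListAsLongs objects with length -1 just to carry the storage IDs through the existing fields.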