[ https://issues.apache.org/jira/browse/HDFS-10301?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15386306#comment-15386306 ]
Colin P. McCabe commented on HDFS-10301:
----------------------------------------

bq. [~redvine] asked: Colin P. McCabe Doesn't TCP ignore duplicate packets? Can you explain how this can happen? If the RPC does get duplicated, then we shouldn't return true right when node.leaseId == 0?

That is a fair point. However, the retry logic in the RPC system could resend the message if the NN did not respond within a certain amount of time. Or there could simply be a bug that leads to the DN sending full block reports when it shouldn't. In any case, we cannot assume that reordered messages are the only problem.

bq. [~shv] wrote: Also I think that Colin P. McCabe's veto, formulated as "I am -1 on a patch which adds extra RPCs," is fully addressed now. The storage report was added to the last RPC representing a single block report. The last patch does not add extra RPCs.

Yes, this patch addresses my concerns. I withdraw my -1.

bq. [~shv] wrote: The storage ids are already there in the current BR protobuf. Why would you want a new field for that? You will need to duplicate all storage ids in the case of a full block report, when it is not split into multiple RPCs. Seems confusing and inefficient to me.

A new field would be best because we would avoid creating fake BlockListAsLongs objects with length -1 and re-using protobuf fields for purposes they weren't intended for. A list of storage IDs is not a block report or a list of blocks, and using the same data structures is very confusing.

If you want to optimize by not sending the list of storage reports separately when the block report has only one RPC, that's easy to do: just check whether numRpcs == 1 and don't set or check the optional list of strings in that case (a rough sketch of this follows after the quoted issue text below). I'm not going to block the patch over this, but I do think people reading this later will wonder what you were thinking if you overload the PB fields in this way.

> BlockReport retransmissions may lead to storages falsely being declared
> zombie if storage report processing happens out of order
> --------------------------------------------------------------------------------------------------------------------------------
>
>                 Key: HDFS-10301
>                 URL: https://issues.apache.org/jira/browse/HDFS-10301
>             Project: Hadoop HDFS
>          Issue Type: Bug
>          Components: namenode
>    Affects Versions: 2.6.1
>            Reporter: Konstantin Shvachko
>            Assignee: Vinitha Reddy Gankidi
>            Priority: Critical
>         Attachments: HDFS-10301.002.patch, HDFS-10301.003.patch, HDFS-10301.004.patch, HDFS-10301.005.patch, HDFS-10301.006.patch, HDFS-10301.007.patch, HDFS-10301.008.patch, HDFS-10301.009.patch, HDFS-10301.01.patch, HDFS-10301.010.patch, HDFS-10301.011.patch, HDFS-10301.sample.patch, zombieStorageLogs.rtf
>
>
> When the NameNode is busy, a DataNode can time out while sending a block report and then
> sends the block report again. The NameNode, processing these two reports at the same time,
> can interleave processing of storages from different reports. This corrupts the blockReportId
> field, which makes the NameNode think that some storages are zombie. Replicas from zombie
> storages are immediately removed, causing missing blocks.
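To illustrate the optimization mentioned above, here is a minimal, self-contained sketch. This is not actual Hadoop code: the class and field names (BlockReportRpc, ZombieStoragePruner, liveStorages, curRpc, allStorageIds) are made up for illustration; only numRpcs and the general design come from the discussion. The idea is that the full list of storage IDs rides along only on the last RPC of a split report, and is neither set nor checked when numRpcs == 1.

{code:java}
import java.util.*;

/** Hypothetical model (not actual Hadoop code) of one RPC of a block report. */
class BlockReportRpc {
  final int numRpcs;                 // total number of RPCs the report is split into
  final int curRpc;                  // index of this RPC within the report
  final List<String> allStorageIds;  // optional: set only on the last RPC of a split report

  BlockReportRpc(int numRpcs, int curRpc, List<String> allStorageIds) {
    this.numRpcs = numRpcs;
    this.curRpc = curRpc;
    this.allStorageIds = allStorageIds;
  }
}

class ZombieStoragePruner {
  /**
   * Returns the set of storage IDs to treat as live, or empty if this RPC
   * carries no authoritative list yet. When the report fits in a single RPC
   * (numRpcs == 1), the storages in the report itself are authoritative, so
   * the separate optional list is neither set nor checked.
   */
  static Optional<Set<String>> liveStorages(BlockReportRpc rpc,
                                            Set<String> storagesInThisRpc) {
    if (rpc.numRpcs == 1) {
      return Optional.of(storagesInThisRpc);       // single-RPC report: no extra field needed
    }
    boolean isLastRpc = rpc.curRpc == rpc.numRpcs - 1;
    if (isLastRpc && rpc.allStorageIds != null) {
      return Optional.of(new HashSet<>(rpc.allStorageIds)); // split report: list rides on last RPC
    }
    return Optional.empty();                        // interim RPC: do not prune anything yet
  }
}
{code}

Reading the list from its own optional field this way avoids constructing fake BlockListAsLongs objects with length -1 just to carry the storage IDs through the existing fields.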