[ https://issues.apache.org/jira/browse/HDFS-9011?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14735775#comment-14735775 ]

Jing Zhao commented on HDFS-9011:
---------------------------------

Thanks for the review, Nicholas and Yi!

bq. for each partial report RPC, NN calls reportDiff(..) but reportDiff(..) 
assumes a full block report. 

Yeah, this is a big issue here. The current reportDiff assumes the block report 
contains all the blocks in the storage, and thus removes all the blocks left 
after the delimiter block. We could record the last block of the previous block 
report for the same storage as a cookie, but we cannot guarantee that no block 
changes between the two block report RPCs. For example, the cookie block may 
be deleted between the two reports. Thus it looks very hard to continue the 
reportDiff process across two FBR RPCs, unless we link all the blocks of each 
storage in a specific order.
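To make the problem concrete, below is a heavily simplified Java model of the delimiter mechanism. It is not the actual BlockManager code: the class name, the DELIMITER sentinel, and the plain LinkedList of block IDs are hypothetical stand-ins. A delimiter is put at the head of the storage's list, every reported block is moved in front of it, and whatever is left behind the delimiter is treated as removed. A report that covers only part of the storage therefore marks the rest of the storage's blocks for removal.

```java
import java.util.ArrayList;
import java.util.LinkedList;
import java.util.List;

public class ReportDiffSketch {
    // Hypothetical sentinel standing in for the NameNode's delimiter block.
    static final long DELIMITER = -1L;

    /**
     * Simplified model of reportDiff for one storage. Returns the blocks that
     * would be treated as removed. Blocks in the report that the storage list
     * does not contain are ignored here (the real code would add them).
     */
    static List<Long> reportDiff(LinkedList<Long> storageBlocks, List<Long> report) {
        storageBlocks.addFirst(DELIMITER);
        for (Long b : report) {
            // remove(Object): move each reported block in front of the delimiter.
            if (storageBlocks.remove(b)) {
                storageBlocks.addFirst(b);
            }
        }
        int idx = storageBlocks.indexOf(DELIMITER);
        // Everything after the delimiter was not reported.
        List<Long> toRemove =
            new ArrayList<>(storageBlocks.subList(idx + 1, storageBlocks.size()));
        // Drop the delimiter and the unreported tail from the storage list.
        storageBlocks.subList(idx, storageBlocks.size()).clear();
        return toRemove;
    }

    public static void main(String[] args) {
        LinkedList<Long> storage = new LinkedList<>(List.of(1L, 2L, 3L, 4L));
        // A partial report carrying only the first half of the storage's blocks.
        List<Long> removed = reportDiff(storage, List.of(1L, 2L));
        // prints "removed=[3, 4] remaining=[2, 1]"
        System.out.println("removed=" + removed + " remaining=" + storage);
    }
}
```

Blocks 3 and 4 are still on the DataNode; they were simply scheduled for the next RPC. Remembering a cookie block to resume from does not fix this in general, because the cookie itself may be deleted before the next partial report arrives.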

> Support splitting BlockReport of a storage into multiple RPC
> ------------------------------------------------------------
>
>                 Key: HDFS-9011
>                 URL: https://issues.apache.org/jira/browse/HDFS-9011
>             Project: Hadoop HDFS
>          Issue Type: Improvement
>            Reporter: Jing Zhao
>            Assignee: Jing Zhao
>         Attachments: HDFS-9011.000.patch, HDFS-9011.001.patch, 
> HDFS-9011.002.patch
>
>
> Currently if a DataNode has too many blocks (more than 1m by default), it 
> sends multiple RPCs to the NameNode for the block report, each RPC containing 
> the report for a single storage. However, in practice we've seen that even a 
> single storage can contain a large number of blocks, so that its report 
> exceeds the max RPC data length. It may be helpful to support sending 
> multiple RPCs for the block report of a single storage. 
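On the DataNode side, splitting one storage's report is straightforward. The sketch below assumes a fixed encoded cost per block (three longs, e.g. block ID, length and generation stamp) and a caller-supplied byte budget per RPC; the class name, BYTES_PER_BLOCK and maxRpcBytes are hypothetical, and the real wire format is different. The hard part discussed in the comment is on the NameNode side, not in the splitting itself.

```java
import java.util.ArrayList;
import java.util.List;

public class SplitReportSketch {
    // Hypothetical fixed encoding cost per block: three longs.
    static final int BYTES_PER_BLOCK = 3 * Long.BYTES;

    /** Splits one storage's block IDs into chunks that each fit in maxRpcBytes. */
    static List<List<Long>> split(List<Long> blocks, int maxRpcBytes) {
        // Always send at least one block per RPC, even with a tiny budget.
        int perRpc = Math.max(1, maxRpcBytes / BYTES_PER_BLOCK);
        List<List<Long>> chunks = new ArrayList<>();
        for (int i = 0; i < blocks.size(); i += perRpc) {
            chunks.add(new ArrayList<>(blocks.subList(i, Math.min(i + perRpc, blocks.size()))));
        }
        return chunks;
    }

    public static void main(String[] args) {
        List<Long> blocks = new ArrayList<>();
        for (long id = 0; id < 10; id++) {
            blocks.add(id);
        }
        // 96 bytes / 24 bytes per block = 4 blocks per RPC,
        // so 10 blocks need 3 RPCs of sizes 4, 4 and 2.
        for (List<Long> chunk : split(blocks, 96)) {
            System.out.println(chunk.size());
        }
    }
}
```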



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
