[ https://issues.apache.org/jira/browse/HDFS-8574?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14581674#comment-14581674 ]

Ajith S commented on HDFS-8574:
-------------------------------

Hi Walter, thanks for that info. You are right, the number of RPCs is equal to 
the number of volumes.
But in my scenario, there is one volume that contains far more blocks than 
{{dfs.blockreport.split.threshold}} (maybe 10 times as many).

So the preceding loop has created a single report holding the entire block 
list of that volume:
{code}
    for(Map.Entry<DatanodeStorage, BlockListAsLongs> kvPair : 
perVolumeBlockLists.entrySet()) {
      BlockListAsLongs blockList = kvPair.getValue();
      reports[i++] = new StorageBlockReport(kvPair.getKey(), blockList); 
      totalBlockCount += blockList.getNumberOfBlocks();
    }
{code}
So next, when it tries to send this block report to the NN, it receives:
{code}
java.lang.IllegalStateException: com.google.protobuf.InvalidProtocolBufferException: Protocol message was too large.  May be malicious.  Use CodedInputStream.setSizeLimit() to increase the size limit.
        at org.apache.hadoop.hdfs.protocol.BlockListAsLongs$BufferDecoder$1.next(BlockListAsLongs.java:369)
        at org.apache.hadoop.hdfs.protocol.BlockListAsLongs$BufferDecoder$1.next(BlockListAsLongs.java:347)
        at org.apache.hadoop.hdfs.protocol.BlockListAsLongs$BufferDecoder.getBlockListAsLongs(BlockListAsLongs.java:325)
        at org.apache.hadoop.hdfs.protocolPB.DatanodeProtocolClientSideTranslatorPB.blockReport(DatanodeProtocolClientSideTranslatorPB.java:190)
        at org.apache.hadoop.hdfs.server.datanode.BPServiceActor.blockReport(BPServiceActor.java:473)
{code}

So maybe we can redesign this so that multiple block reports can be sent per 
volume? What do you suggest?
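As a rough illustration of the per-volume split (this is only a hedged sketch with a hypothetical {{split()}} helper, not the actual HDFS API): an oversized volume's block list would be chopped into buckets of at most {{dfs.blockreport.split.threshold}} blocks, and each bucket sent as its own {{StorageBlockReport}} RPC, so no single protobuf message exceeds the size limit.

```java
import java.util.ArrayList;
import java.util.List;

public class BlockReportSplitter {

    // Hypothetical helper (not part of HDFS): split one volume's block
    // list into buckets of at most `threshold` entries, so each bucket
    // can be sent as its own block-report RPC.
    static List<long[]> split(long[] blocks, int threshold) {
        List<long[]> buckets = new ArrayList<>();
        for (int start = 0; start < blocks.length; start += threshold) {
            int end = Math.min(start + threshold, blocks.length);
            long[] bucket = new long[end - start];
            System.arraycopy(blocks, start, bucket, 0, end - start);
            buckets.add(bucket);
        }
        return buckets;
    }

    public static void main(String[] args) {
        // Pretend one volume has 10 blocks and the threshold is 3:
        long[] blocks = new long[10];
        for (int i = 0; i < blocks.length; i++) {
            blocks[i] = i;
        }
        List<long[]> buckets = split(blocks, 3);
        System.out.println(buckets.size());         // 4 buckets (3+3+3+1)
        System.out.println(buckets.get(3).length);  // last bucket holds 1 block
    }
}
```

Each bucket would then be wrapped in its own {{StorageBlockReport}} (all tagged with the same storage) and sent in a separate {{blockReport}} call.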

> When block count exceeds dfs.blockreport.split.threshold, block reports 
> are sent one per message
> ------------------------------------------------------------------------------------------------------
>
>                 Key: HDFS-8574
>                 URL: https://issues.apache.org/jira/browse/HDFS-8574
>             Project: Hadoop HDFS
>          Issue Type: Bug
>    Affects Versions: 2.7.0
>            Reporter: Ajith S
>            Assignee: Ajith S
>
> This piece of code in 
> {{org.apache.hadoop.hdfs.server.datanode.BPServiceActor.blockReport()}}
> {code}
> // Send one block report per message.
>         for (int r = 0; r < reports.length; r++) {
>           StorageBlockReport singleReport[] = { reports[r] };
>           DatanodeCommand cmd = bpNamenode.blockReport(
>               bpRegistration, bpos.getBlockPoolId(), singleReport,
>               new BlockReportContext(reports.length, r, reportId));
>           numReportsSent++;
>           numRPCs++;
>           if (cmd != null) {
>             cmds.add(cmd);
>           }
> {code}
> is creating many cmds when the block count exceeds the 
> {{dfs.blockreport.split.threshold}} limit. A better approach would be to 
> split the block reports into equal-sized buckets of size 
> {{dfs.blockreport.split.threshold}}, thereby reducing the number of RPCs in 
> block reporting.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
