[ 
https://issues.apache.org/jira/browse/HDFS-7435?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14225666#comment-14225666
 ] 

Yi Liu commented on HDFS-7435:
------------------------------

Hi guys, I think this is a good improvement.
I also found a similar issue related to this while doing Hadoop RPC 
optimization in my local branch.
As we all know, block reports from DNs can become very large in a big 
cluster, and they can trigger a full GC if the old generation has no 
contiguous region large enough for the allocation.

We reuse the connection for RPC calls, but when we process each RPC on the 
same connection, we allocate a fresh heap byte buffer to hold the RPC data. 
Since an RPC message may be very large, this causes the same issue.
My thought is to reuse the data buffer in the connection; I will open a new 
JIRA to track it.
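A minimal sketch of the buffer-reuse idea (the class and method names here are illustrative, not Hadoop's actual RPC server classes): keep one growable byte array per connection and hand it out for each RPC, growing it only when a request exceeds the current capacity.

```java
// Hypothetical sketch of per-connection buffer reuse; names are
// illustrative and do not correspond to actual Hadoop classes.
public class ReusableRpcBuffer {
    private byte[] data = new byte[8 * 1024];  // initial capacity, grown on demand

    // Returns a buffer at least rpcLength bytes long, reusing the
    // existing allocation whenever it is already big enough.
    public byte[] acquire(int rpcLength) {
        if (data.length < rpcLength) {
            // Grow geometrically so a run of large RPCs does not
            // reallocate on every call.
            int newLen = Math.max(rpcLength, data.length * 2);
            data = new byte[newLen];
        }
        return data;
    }

    public int capacity() {
        return data.length;
    }
}
```

One trade-off with this approach: a connection that once carried a huge block report keeps that large buffer pinned for its lifetime, so a real implementation would likely also shrink or release the buffer after idle periods.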

> PB encoding of block reports is very inefficient
> ------------------------------------------------
>
>                 Key: HDFS-7435
>                 URL: https://issues.apache.org/jira/browse/HDFS-7435
>             Project: Hadoop HDFS
>          Issue Type: Improvement
>          Components: datanode, namenode
>    Affects Versions: 2.0.0-alpha, 3.0.0
>            Reporter: Daryn Sharp
>            Assignee: Daryn Sharp
>            Priority: Critical
>         Attachments: HDFS-7435.000.patch, HDFS-7435.patch
>
>
> Block reports are encoded as a PB repeating long.  Repeating fields use an 
> {{ArrayList}} with a default capacity of 10.  A block report containing tens 
> or hundreds of thousands of longs (3 for each replica) is extremely expensive 
> since the {{ArrayList}} must realloc many times.  Also, decoding repeating 
> fields will box the primitive longs, which must then be unboxed.
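The two costs described in the issue can be sketched side by side (this is an illustrative comparison, not Hadoop or protobuf code): decoding into an `ArrayList<Long>` boxes every value and reallocates the backing array repeatedly from its default capacity of 10, while a presized primitive `long[]` avoids both.

```java
import java.util.ArrayList;
import java.util.List;

// Illustrative comparison only, not actual Hadoop/protobuf code.
public class BlockReportDecode {
    // Boxing path: mirrors decoding a PB repeated long into an ArrayList
    // that starts at the default capacity of 10 and reallocs as it grows.
    static List<Long> decodeBoxed(long[] wire) {
        List<Long> out = new ArrayList<>();   // default capacity 10
        for (long v : wire) {
            out.add(v);                       // each add boxes the long
        }
        return out;
    }

    // Primitive path: one exactly-sized long[], no boxing, no reallocation.
    static long[] decodePrimitive(long[] wire) {
        long[] out = new long[wire.length];
        System.arraycopy(wire, 0, out, 0, wire.length);
        return out;
    }

    public static void main(String[] args) {
        int replicas = 100_000;
        long[] wire = new long[replicas * 3]; // 3 longs per replica
        for (int i = 0; i < wire.length; i++) {
            wire[i] = i;
        }
        List<Long> boxed = decodeBoxed(wire);
        long[] primitive = decodePrimitive(wire);
        System.out.println("decoded " + primitive.length
                + " longs; boxed list size " + boxed.size());
    }
}
```

For 100,000 replicas the boxed path allocates 300,000 `Long` objects plus every intermediate backing array, which is where the realloc and boxing costs in the report come from.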



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)