[ https://issues.apache.org/jira/browse/HDFS-7435?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14233210#comment-14233210 ]

Kihwal Lee commented on HDFS-7435:
----------------------------------

bq. This is important because the problem of a large contiguous array is also 
a problem for the DataNode, since there are deployments that use 60 disks in a 
single node with more than 10 million blocks in a single DataNode.

Just to be clear, the datanode and the namenode today require contiguous 
memory of the same size for the backing arrays of the ArrayLists used 
internally by protobuf, so this patch does not make the situation any worse.  
But I agree that chunking is a good idea and is better done along with this 
feature.
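As a rough illustration of the chunking idea (this is not the HDFS-7435 patch 
itself; the class and constant names below are made up), the report longs 
could be held in fixed-size primitive chunks so that neither the datanode nor 
the namenode ever needs one contiguous array sized to the whole report:

{code:java}
import java.util.ArrayList;
import java.util.List;

// Hypothetical sketch of a chunked long store (not HDFS code).
// Each chunk is a primitive long[], so the largest single contiguous
// allocation is bounded by CHUNK_SIZE no matter how many blocks the
// report contains, and no values are boxed.
class ChunkedLongs {
  private static final int CHUNK_SIZE = 64 * 1024;
  private final List<long[]> chunks = new ArrayList<>();
  private int size = 0;

  void add(long value) {
    int offset = size % CHUNK_SIZE;
    if (offset == 0) {
      chunks.add(new long[CHUNK_SIZE]);   // grow by one fixed-size chunk
    }
    chunks.get(chunks.size() - 1)[offset] = value;
    size++;
  }

  long get(int index) {
    return chunks.get(index / CHUNK_SIZE)[index % CHUNK_SIZE];
  }

  int size() {
    return size;
  }
}
{code}

With 64K-entry chunks the largest single allocation is 512KB of longs 
regardless of the report size.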

> PB encoding of block reports is very inefficient
> ------------------------------------------------
>
>                 Key: HDFS-7435
>                 URL: https://issues.apache.org/jira/browse/HDFS-7435
>             Project: Hadoop HDFS
>          Issue Type: Improvement
>          Components: datanode, namenode
>    Affects Versions: 2.0.0-alpha, 3.0.0
>            Reporter: Daryn Sharp
>            Assignee: Daryn Sharp
>            Priority: Critical
>         Attachments: HDFS-7435.000.patch, HDFS-7435.001.patch, HDFS-7435.patch
>
>
> Block reports are encoded as a PB repeating long.  Repeating fields use an 
> {{ArrayList}} with default capacity of 10.  A block report containing tens or 
> hundreds of thousands of longs (3 for each replica) is extremely expensive 
> since the {{ArrayList}} must realloc many times.  Also, decoding repeating 
> fields will box the primitive longs which must then be unboxed.
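To make the cost described above concrete, here is a small illustrative 
sketch (plain Java, not HDFS or protobuf code; the element count is made up) 
contrasting a default-capacity {{ArrayList}} of boxed longs with a pre-sized 
primitive array:

{code:java}
import java.util.ArrayList;
import java.util.List;

// Illustrative micro-sketch of the overhead described above (not HDFS code).
// Decoding a repeated long field into a List<Long> boxes every value and
// repeatedly reallocates the ArrayList's backing array, whereas a pre-sized
// long[] does neither.
public class RepeatedLongCost {
  public static void main(String[] args) {
    // e.g. ~300k longs for a 100k-replica report (3 longs per replica, per
    // the description above; the exact count here is only illustrative).
    final int n = 300_000;

    // Boxed path: starts at the default capacity (10) and grows by ~1.5x,
    // copying the backing Object[] on each growth and boxing each long.
    List<Long> boxed = new ArrayList<>();
    for (long i = 0; i < n; i++) {
      boxed.add(i);                      // autoboxing + occasional realloc
    }

    // Primitive path: one contiguous allocation, no boxing, no copies.
    long[] primitive = new long[n];
    for (int i = 0; i < n; i++) {
      primitive[i] = i;
    }

    System.out.println("boxed size=" + boxed.size()
        + ", primitive length=" + primitive.length);
  }
}
{code}

Growing the list from capacity 10 to 300,000 elements copies the backing 
array roughly 25 times and allocates a Long object per value, which is the 
realloc and boxing overhead the description refers to.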



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
