[jira] [Updated] (HDFS-7435) PB encoding of block reports is very inefficient

Daryn Sharp (JIRA) Wed, 11 Mar 2015 09:20:11 -0700

     [ 
https://issues.apache.org/jira/browse/HDFS-7435?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]


Daryn Sharp updated HDFS-7435:
------------------------------
    Attachment: HDFS-7435.patch

Test failed because I stubbed the simulated dataset to not return reports...  
Fixed.

[~jingzhao], please review.  We'd like to add this to our internal builds to 
help alleviate BR processing issues.  We also want to leverage this change to 
speed up rolling upgrades by dumping/reading the encoded BR to disk, which this 
makes trivial to do.

> PB encoding of block reports is very inefficient
> ------------------------------------------------
>
>                 Key: HDFS-7435
>                 URL: https://issues.apache.org/jira/browse/HDFS-7435
>             Project: Hadoop HDFS
>          Issue Type: Improvement
>          Components: datanode, namenode
>    Affects Versions: 2.0.0-alpha, 3.0.0
>            Reporter: Daryn Sharp
>            Assignee: Daryn Sharp
>            Priority: Critical
>         Attachments: HDFS-7435.000.patch, HDFS-7435.001.patch, 
> HDFS-7435.002.patch, HDFS-7435.patch, HDFS-7435.patch, HDFS-7435.patch, 
> HDFS-7435.patch, HDFS-7435.patch, HDFS-7435.patch, HDFS-7435.patch
>
>
> Block reports are encoded as a PB repeating long.  Repeating fields use an 
> {{ArrayList}} with default capacity of 10.  A block report containing tens or 
> hundreds of thousands of longs (3 for each replica) is extremely expensive 
> since the {{ArrayList}} must realloc many times.  Also, decoding repeating 
> fields will box the primitive longs which must then be unboxed.
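
To make the cost described above concrete, here is a minimal Java sketch. It is
not taken from the attached patch; the class name BlockReportSketch and its
methods are hypothetical and exist only for illustration. It contrasts the
boxed path (an ArrayList of Long that starts small and regrows as elements are
added) with one possible alternative, assumed here: packing the longs as
base-128 varints into a byte buffer and decoding them straight into a
primitive long[].

```java
import java.io.ByteArrayOutputStream;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Hypothetical illustration only; not the code from HDFS-7435.patch.
public class BlockReportSketch {

    /** Boxed path: every long is autoboxed and the list regrows as it fills. */
    static List<Long> toBoxedList(long[] values) {
        List<Long> list = new ArrayList<>(); // no capacity hint, so it reallocates repeatedly
        for (long v : values) {
            list.add(v); // autoboxing allocates (or looks up) a Long per element
        }
        return list;
    }

    /** Packs each long as a base-128 varint, treating the value as unsigned. */
    static byte[] encode(long[] values) {
        ByteArrayOutputStream out = new ByteArrayOutputStream(values.length * 4);
        for (long v : values) {
            while ((v & ~0x7FL) != 0) {
                out.write((int) ((v & 0x7F) | 0x80)); // low 7 bits plus continuation bit
                v >>>= 7;
            }
            out.write((int) v); // final byte, continuation bit clear
        }
        return out.toByteArray();
    }

    /** Decodes count varints directly into a long[]; no boxing, one allocation. */
    static long[] decode(byte[] buf, int count) {
        long[] result = new long[count];
        int pos = 0;
        for (int i = 0; i < count; i++) {
            long v = 0;
            int shift = 0;
            byte b;
            do {
                b = buf[pos++];
                v |= (long) (b & 0x7F) << shift;
                shift += 7;
            } while ((b & 0x80) != 0);
            result[i] = v;
        }
        return result;
    }

    public static void main(String[] args) {
        // Two made-up replicas, three longs each (the values are arbitrary).
        long[] report = {1001L, 4096L, 7L, 1002L, 134217728L, 7L};
        byte[] packed = encode(report);
        long[] roundTrip = decode(packed, report.length);
        System.out.println("round trip ok: " + Arrays.equals(report, roundTrip));
        System.out.println("boxed elements: " + toBoxedList(report).size());
    }
}
```

The array-backed decode allocates once and never creates Long objects, which
is the property the description is asking for. A byte buffer like the one
returned by encode can also be written to and read back from disk as-is.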



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
