[ https://issues.apache.org/jira/browse/HDFS-7435?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Daryn Sharp updated HDFS-7435:
------------------------------
    Attachment: HDFS-7435.patch

Test failed because I stubbed the simulated dataset to not return reports... Fixed.

[~jingzhao], please review. We'd like to add this to our internal builds to help alleviate BR processing issues. We also want to leverage this change to speed up rolling upgrades by dumping/reading the encoded BR to disk, which this change makes trivial to do.

> PB encoding of block reports is very inefficient
> ------------------------------------------------
>
>                 Key: HDFS-7435
>                 URL: https://issues.apache.org/jira/browse/HDFS-7435
>             Project: Hadoop HDFS
>          Issue Type: Improvement
>          Components: datanode, namenode
>    Affects Versions: 2.0.0-alpha, 3.0.0
>            Reporter: Daryn Sharp
>            Assignee: Daryn Sharp
>            Priority: Critical
>         Attachments: HDFS-7435.000.patch, HDFS-7435.001.patch, HDFS-7435.002.patch, HDFS-7435.patch, HDFS-7435.patch, HDFS-7435.patch, HDFS-7435.patch, HDFS-7435.patch, HDFS-7435.patch, HDFS-7435.patch
>
>
> Block reports are encoded as a PB repeating long. Repeating fields use an {{ArrayList}} with default capacity of 10. A block report containing tens or hundreds of thousands of longs (3 for each replica) is extremely expensive since the {{ArrayList}} must realloc many times. Also, decoding repeating fields will box the primitive longs, which must then be unboxed.

--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
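For readers who want a feel for the cost described in the issue, here is a minimal sketch. It is not code from the patch. The class name {{BlockReportSketch}}, its methods, and the placeholder values stored for each replica are all hypothetical, and the growth model (start at 10, grow by half when full) is an assumption about the JDK's {{ArrayList}} that may differ between versions. It counts how many times a default-sized list would have to grow to hold a report, and contrasts a boxed {{List<Long>}} with a presized {{long[]}}.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical illustration only; none of these names come from HDFS.
final class BlockReportSketch {
    // The issue states that each replica contributes 3 longs.
    static final int LONGS_PER_REPLICA = 3;

    // Assumed growth policy: capacity starts at 10 and grows by 50% when full.
    // Returns how many times the backing array would be reallocated.
    public static int growthSteps(int elements) {
        int capacity = 10;
        int steps = 0;
        while (capacity < elements) {
            capacity += capacity >> 1;
            steps++;
        }
        return steps;
    }

    // Boxed form: every add() autoboxes a long, and the list grows as it fills.
    public static List<Long> boxedReport(int replicas) {
        List<Long> longs = new ArrayList<>();
        for (long r = 0; r < replicas; r++) {
            longs.add(r);      // placeholder values, not real block fields
            longs.add(r * 2);
            longs.add(r + 1);
        }
        return longs;
    }

    // Primitive form: one allocation of the exact size, no boxing.
    public static long[] primitiveReport(int replicas) {
        long[] longs = new long[replicas * LONGS_PER_REPLICA];
        int i = 0;
        for (long r = 0; r < replicas; r++) {
            longs[i++] = r;
            longs[i++] = r * 2;
            longs[i++] = r + 1;
        }
        return longs;
    }

    // Reading the boxed list unboxes every element.
    public static long sum(List<Long> longs) {
        long total = 0;
        for (long v : longs) {
            total += v;
        }
        return total;
    }

    public static long sum(long[] longs) {
        long total = 0;
        for (long v : longs) {
            total += v;
        }
        return total;
    }

    public static void main(String[] args) {
        int replicas = 100_000;
        int longs = replicas * LONGS_PER_REPLICA;
        System.out.println("longs in report: " + longs);
        System.out.println("modeled list growths: " + growthSteps(longs));
        System.out.println("same contents: "
            + (sum(boxedReport(replicas)) == sum(primitiveReport(replicas))));
    }
}
```

Both forms carry the same values. The difference is that the boxed list pays for repeated array copies plus one {{Long}} object per value, while the primitive array is allocated once.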