[ https://issues.apache.org/jira/browse/HDFS-7836?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14522480#comment-14522480 ]
Colin Patrick McCabe commented on HDFS-7836:
--------------------------------------------
Hi [~xinwei],
The discussion on March 11th focused on our February 24th proposal for
off-heaping and parallelizing the block manager. We spent a lot of time walking
through the proposal and responding to questions about it.
There was widespread agreement that we needed to reduce the garbage collection
impact of the millions of {{BlockInfoContiguous}} structures. There was some
disagreement about how to do that. Daryn argued that using large primitive
arrays was the best way to go. Charles and I argued that using off-heap
storage was better.
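To make the "large primitive arrays" idea concrete, here is a rough sketch (not Daryn's actual patch; the class and field names are made up for illustration) of packing per-block fields into a few big parallel arrays, so the GC sees a handful of long-lived arrays instead of millions of small objects:
{code:java}
// Illustrative only -- not Daryn's patch. Per-block fields live in parallel
// long[] slabs indexed by slot number rather than in one object per block.
class FlatBlockTable {
  private final long[] blockIds;   // block id per slot
  private final long[] genStamps;  // generation stamp per slot
  private final long[] numBytes;   // block length per slot

  FlatBlockTable(int capacity) {
    blockIds = new long[capacity];
    genStamps = new long[capacity];
    numBytes = new long[capacity];
  }

  void set(int slot, long blockId, long genStamp, long len) {
    blockIds[slot] = blockId;
    genStamps[slot] = genStamp;
    numBytes[slot] = len;
  }

  long getBlockId(int slot)  { return blockIds[slot]; }
  long getGenStamp(int slot) { return genStamps[slot]; }
  long getNumBytes(int slot) { return numBytes[slot]; }
}
{code}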
The main advantage of large primitive arrays is that they keep the existing
Java {{-Xmx}} memory settings working as expected. The main advantage of
off-heap is that
it allows the use of things like {{Unsafe#compareAndSwap}}, which can often
lead to more efficient concurrent data structures. Also, when using off-heap
memory, we get to re-use malloc rather than essentially writing our own malloc
for every subsystem.
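To make that concrete, here is a minimal sketch (not the proposed BlockManager code, just {{sun.misc.Unsafe}} as it exists today) of a lock-free compare-and-swap on a 64-bit slot in memory we allocate ourselves:
{code:java}
import java.lang.reflect.Field;
import sun.misc.Unsafe;

// Minimal off-heap CAS demo. A null base object tells Unsafe to treat the
// offset as an absolute native address.
public class OffHeapCasExample {
  public static void main(String[] args) throws Exception {
    Field f = Unsafe.class.getDeclaredField("theUnsafe");
    f.setAccessible(true);
    Unsafe unsafe = (Unsafe) f.get(null);

    long addr = unsafe.allocateMemory(8);  // one 64-bit slot off-heap
    unsafe.putLong(addr, 42L);             // initial value

    // Atomically replace 42 with 43 without taking a lock.
    boolean swapped = unsafe.compareAndSwapLong(null, addr, 42L, 43L);
    System.out.println("swapped=" + swapped + ", value=" + unsafe.getLong(addr));

    unsafe.freeMemory(addr);
  }
}
{code}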
There was some hand-wringing about off-heap memory being slower, but I do not
believe that concern is valid. The Apache Spark developers found that their
off-heap hash table was actually faster than the on-heap one, because it gave
them better control over the memory layout.
https://databricks.com/blog/2015/04/28/project-tungsten-bringing-spark-closer-to-bare-metal.html
The key is to avoid using {{DirectByteBuffer}}, which is rather slow, and use
{{Unsafe}} instead.
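For contrast, here is a sketch of the two access paths (illustrative only; the relative performance is the claim above and should be measured, not inferred from this snippet):
{code:java}
import java.lang.reflect.Field;
import java.nio.ByteBuffer;
import sun.misc.Unsafe;

// Two ways to touch off-heap memory: DirectByteBuffer, which bounds-checks
// every access, and raw Unsafe reads/writes against an address we manage.
public class OffHeapAccessPaths {
  public static void main(String[] args) throws Exception {
    // Path 1: DirectByteBuffer.
    ByteBuffer direct = ByteBuffer.allocateDirect(8);
    direct.putLong(0, 123L);
    long viaBuffer = direct.getLong(0);

    // Path 2: raw Unsafe access to memory we allocate ourselves.
    Field f = Unsafe.class.getDeclaredField("theUnsafe");
    f.setAccessible(true);
    Unsafe unsafe = (Unsafe) f.get(null);
    long addr = unsafe.allocateMemory(8);
    unsafe.putLong(addr, 123L);
    long viaUnsafe = unsafe.getLong(addr);
    unsafe.freeMemory(addr);

    System.out.println(viaBuffer + " " + viaUnsafe);
  }
}
{code}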
However, Daryn has posted some patches using the "large arrays" approach.
Since they are a nice incremental improvement, we are probably going to pick
them up if there are no blockers. We are also looking at incremental
improvements such as implementing backpressure for full block reports, and
speeding up edit log replay (if possible). I would also like to look at
parallelizing full block report processing: if we can do that, we can get a
dramatic improvement in FBR times by using more than one core.
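As a purely hypothetical sketch of the parallel-FBR idea (none of this is actual HDFS code, and the per-block reconciliation body is elided), the report could be split into shards and processed on a thread pool:
{code:java}
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;

// Hypothetical: shard the reported block ids and process each shard on its
// own core. The real BlockManager would need per-shard locking or lock-free
// structures to make this safe.
public class ParallelFbrSketch {
  static void processShard(long[] blockIds, int from, int to) {
    for (int i = from; i < to; i++) {
      // ... reconcile blockIds[i] against namespace state ...
    }
  }

  static void processReport(long[] blockIds, int threads) throws Exception {
    ExecutorService pool = Executors.newFixedThreadPool(threads);
    List<Future<?>> futures = new ArrayList<>();
    int shardSize = (blockIds.length + threads - 1) / threads;
    for (int t = 0; t < threads; t++) {
      final int from = t * shardSize;
      final int to = Math.min(blockIds.length, from + shardSize);
      if (from < to) {
        futures.add(pool.submit(() -> processShard(blockIds, from, to)));
      }
    }
    for (Future<?> fu : futures) {
      fu.get();  // propagate any per-shard failure
    }
    pool.shutdown();
  }
}
{code}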
> BlockManager Scalability Improvements
> -------------------------------------
>
> Key: HDFS-7836
> URL: https://issues.apache.org/jira/browse/HDFS-7836
> Project: Hadoop HDFS
> Issue Type: Improvement
> Reporter: Charles Lamb
> Assignee: Charles Lamb
> Attachments: BlockManagerScalabilityImprovementsDesign.pdf
>
>
> Improvements to BlockManager scalability.