[ 
https://issues.apache.org/jira/browse/HDFS-7836?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14522480#comment-14522480
 ] 

Colin Patrick McCabe commented on HDFS-7836:
--------------------------------------------

Hi [~xinwei],

The discussion on March 11th was focused on our proposal for off-heaping and 
parallelizing the block manager from February 24th.  We spent a lot of time 
going through the proposal and responding to questions on the proposal.

There was widespread agreement that we needed to reduce the garbage collection 
impact of the millions of BlockInfoContiguous structures.  There was some 
disagreement about how to do that.  Daryn argued that using large primitive 
arrays was the best way to go.  Charles and I argued that using off-heap 
storage was better.

The main advantage of large primitive arrays is that it makes the existing Java 
\-Xmx memory settings work as expected.  The main advantage of off-heap is that 
it allows the use of things like {{Unsafe#compareAndSwap}}, which can often 
lead to more efficient concurrent data structures.  Also, when using off-heap 
memory, we get to re-use malloc rather than essentially writing our own malloc 
for every subsystem.

There was some hand-wringing about off-heap memory being slower, but I do not 
believe that this is valid.  Apache Spark has found that their off-heap hash 
table was actually faster than the on-heap one, due to the ability to better 
control the memory layout.  
https://databricks.com/blog/2015/04/28/project-tungsten-bringing-spark-closer-to-bare-metal.html
  The key is to avoid using {{DirectByteBuffer}}, which is rather slow, and use 
{{Unsafe}} instead.

However, Daryn has posted some patches using the "large arrays" approach.  
Since they are a nice incremental improvement, we are probably going to pick 
them up if there are no blockers.  We are also looking at incremental 
improvements such as implementing backpressure for full block reports, and 
speeding up edit log replay (if possible).  I would also like to look at 
parallelizing the full block report... if we can do that, we can get a dramatic 
improvement in FBR times by using more than 1 core.

> BlockManager Scalability Improvements
> -------------------------------------
>
>                 Key: HDFS-7836
>                 URL: https://issues.apache.org/jira/browse/HDFS-7836
>             Project: Hadoop HDFS
>          Issue Type: Improvement
>            Reporter: Charles Lamb
>            Assignee: Charles Lamb
>         Attachments: BlockManagerScalabilityImprovementsDesign.pdf
>
>
> Improvements to BlockManager scalability.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

Reply via email to