[ 
https://issues.apache.org/jira/browse/HDFS-7244?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14179341#comment-14179341
 ] 

Colin Patrick McCabe commented on HDFS-7244:
--------------------------------------------

bq. I wasn't planning on introducing fallback. Instead I wanted to make the 
storage type configurable (in a similar way to your suggestion). If someone 
mis-configures and defines direct byte buffer storage while using a JVM 
which does not support it, they'll get an error and the Namenode should not 
start.... My Slab implementation uses a single long for the address regardless 
of the storage type.

Sounds reasonable.

bq. My intention was to use integer or long ids for every entity in order to 
have a predictable size for a block, given its replication factor.  In 
HDFS-6660 I already added int ids to every DatanodeStorageInfo. 

Should we make HDFS-6660 a subtask of this JIRA?  It looks closely related.

bq. I've already shown that the only need for the linked list we have today 
(implemented by the "triplets") is to mark visited blocks when processing a 
block report and that could be done differently using a BitSet (see HDFS-6661 )

Hmm.  Interesting idea.  I think this might belong as a subtask as well.  It 
would also be nice to see some diagrams of how this would work (in a design 
doc, perhaps?).  It sounds like there might be some linear scans involved, so 
we'd need to verify that we aren't getting O(N) insert performance (or similar).
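
To make the BitSet idea concrete, here is a rough sketch of how I imagine it.  This is only my guess at the shape, not the HDFS-6661 patch: the class name {{BlockReportVisitTracker}} and the assumption that each block has a dense per-storage index are both hypothetical.

```java
import java.util.BitSet;

// Hypothetical sketch: remembers which blocks of one storage were seen in a block report.
final class BlockReportVisitTracker {
    private final BitSet visited;
    private final int blockCount;

    BlockReportVisitTracker(int blockCount) {
        this.blockCount = blockCount;
        this.visited = new BitSet(blockCount);
    }

    // Constant-time mark; blockIndex is an assumed dense per-storage index.
    void markVisited(int blockIndex) {
        if (blockIndex < 0 || blockIndex >= blockCount) {
            throw new IndexOutOfBoundsException("blockIndex=" + blockIndex);
        }
        visited.set(blockIndex);
    }

    // Indexes of blocks that were never reported. This is a linear scan,
    // but it runs once per report rather than once per insert.
    int[] unvisited() {
        int[] out = new int[blockCount - visited.cardinality()];
        int n = 0;
        for (int i = visited.nextClearBit(0); i < blockCount; i = visited.nextClearBit(i + 1)) {
            out[n++] = i;
        }
        return out;
    }
}
```

In this shape, marking is O(1) and the only linear scan is the final sweep for unreported blocks, which is the kind of cost profile we would want to confirm in the real design.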

bq. A BlockInfo size is determined only by its replication factor.  So lookup 
will be - locate slab by replication factor (that could be an index in an array 
of slabs) and call get(id) on that slab.  The downside of this lookup idea is: 
Change to replication factor will require us to copy the data from one slab to 
another. A much higher cost than today.  We believe that such changes are not 
frequent at all and overall implementing it this way will be justified although 
we realise this is a key point the community will need to agree on if we move 
forward with this idea.

Interesting idea.  It neatly gets around the problem of variable numbers of 
replicas.  It seems like this could also implement an implicit linked list as 
we do today, in addition to storing an index into a bitfield as you proposed.
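
For discussion, a minimal sketch of the lookup as I understand it.  Everything here is an assumption for illustration: the names {{FixedWidthSlab}} and {{SlabsByReplication}}, the on-heap {{long[]}} backing store, and the record layout (block id followed by one storage id per replica) are mine, not Slab's API.

```java
import java.util.Arrays;

// Hypothetical fixed-width record store: every record in a slab has the same size.
final class FixedWidthSlab {
    private final int width;   // longs per record
    private long[] data;
    private int count;         // records stored so far

    FixedWidthSlab(int width) {
        this.width = width;
        this.data = new long[width * 16];
    }

    // Appends a record and returns its id (its position in this slab).
    int add(long[] record) {
        if (record.length != width) {
            throw new IllegalArgumentException("record must hold " + width + " longs");
        }
        if ((count + 1) * width > data.length) {
            data = Arrays.copyOf(data, data.length * 2);
        }
        System.arraycopy(record, 0, data, count * width, width);
        return count++;
    }

    long get(int id, int field) {
        if (id < 0 || id >= count || field < 0 || field >= width) {
            throw new IndexOutOfBoundsException("id=" + id + ", field=" + field);
        }
        return data[id * width + field];
    }
}

// One slab per replication factor, so the record size is known from the factor alone.
final class SlabsByReplication {
    private final FixedWidthSlab[] slabs;

    SlabsByReplication(int maxReplication) {
        slabs = new FixedWidthSlab[maxReplication + 1];
        for (int r = 1; r <= maxReplication; r++) {
            slabs[r] = new FixedWidthSlab(1 + r);   // block id + r storage ids
        }
    }

    // The number of storage ids passed in is the replication factor.
    int add(long blockId, int... storageIds) {
        int replication = storageIds.length;
        if (replication < 1 || replication >= slabs.length) {
            throw new IllegalArgumentException("unsupported replication " + replication);
        }
        long[] record = new long[1 + replication];
        record[0] = blockId;
        for (int i = 0; i < replication; i++) {
            record[1 + i] = storageIds[i];
        }
        return slabs[replication].add(record);
    }

    long blockId(int replication, int id) {
        return slabs[replication].get(id, 0);
    }

    int storageId(int replication, int id, int replica) {
        return (int) slabs[replication].get(id, 1 + replica);
    }
}
```

The sketch leaves out deletion and free-slot reuse.  It also shows why a replication change is expensive: the record has to be re-added to a different slab and the block's (replication, id) address changes.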

bq. I still didn't get my head round the use of blockPoolId in this lookup. You 
mention the lookup refers to the blockPoolId but I can't see it in the 
blocksMap impl. on trunk. I assume we can always use your suggestion of a table 
for String bpIds but maybe there's a better idea out there.

No, you're right.  Sorry, that was dumb of me.  {{BlocksMap}} doesn't need to 
deal with {{bpId}} because we have a {{BlocksMap}} per block pool ID.

bq. Slab usage example is here.

I'll take a look.  I've only skimmed it so far, but it looks reasonable.  The 
key point is that we don't want to do "big deserialization up front."  We want 
to have getters that deserialize just what they're asked to get, which it looks 
like your example has.
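
To illustrate what I mean by getters that deserialize only what they are asked for, here is a small sketch.  It is not taken from the Slab example: the class name {{BlockInfoFlyweight}} and the 24-byte record layout (block id, length, generation stamp) are assumptions made up for this comment.

```java
import java.nio.ByteBuffer;

// Hypothetical flyweight: one reusable cursor object over many fixed-size records.
final class BlockInfoFlyweight {
    // Assumed record layout, in bytes from the start of the record.
    static final int BLOCK_ID = 0;
    static final int NUM_BYTES = 8;
    static final int GEN_STAMP = 16;
    static final int RECORD_SIZE = 24;

    private final ByteBuffer storage;  // may be a heap or a direct (off-heap) buffer
    private int base;                  // byte offset of the current record

    BlockInfoFlyweight(ByteBuffer storage) {
        this.storage = storage;
    }

    // Points the flyweight at another record; nothing is read or copied here.
    BlockInfoFlyweight moveTo(int recordIndex) {
        this.base = recordIndex * RECORD_SIZE;
        return this;
    }

    // Each getter reads exactly one field with an absolute-index read.
    long getBlockId()         { return storage.getLong(base + BLOCK_ID); }
    long getNumBytes()        { return storage.getLong(base + NUM_BYTES); }
    long getGenerationStamp() { return storage.getLong(base + GEN_STAMP); }

    void set(long blockId, long numBytes, long generationStamp) {
        storage.putLong(base + BLOCK_ID, blockId);
        storage.putLong(base + NUM_BYTES, numBytes);
        storage.putLong(base + GEN_STAMP, generationStamp);
    }
}
```

With {{ByteBuffer.allocateDirect(n * BlockInfoFlyweight.RECORD_SIZE)}} as the backing store, reading one field of one block touches eight bytes and allocates nothing on the Java heap.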

I'd like to create a branch for this and make you guys branch committers.  It 
seems like we have a few changes that make the most sense in the context of 
off-heaping.  We can easily get those changes in and play around with stuff 
when using a branch.  When dealing with trunk, people can be more cautious.

I think a good goal for this branch would be off-heaping the NameNode's block 
map information.  If we can stay focused on that, we can create something that 
works great and then merge it to trunk.  It will be exciting to see the 
reduction in Java heap, and I think we will achieve memory savings, based on 
some of the discussion here!  Great work, guys.

> Reduce Namenode memory using Flyweight pattern
> ----------------------------------------------
>
>                 Key: HDFS-7244
>                 URL: https://issues.apache.org/jira/browse/HDFS-7244
>             Project: Hadoop HDFS
>          Issue Type: Improvement
>          Components: namenode
>            Reporter: Amir Langer
>
> Using the flyweight pattern can dramatically reduce memory usage in the 
> Namenode. The pattern also abstracts the actual storage type, so the choice 
> of on-heap or off-heap storage and of the serialisation mechanism can be 
> configured per deployment. 
> The idea is to move all BlockInfo data (as a first step) to this storage 
> using the Flyweight pattern. The cost of doing so will be higher latency 
> when accessing/modifying a block. The idea is that this will be offset by a 
> reduction in memory and in the case of off-heap, a dramatic reduction in 
> memory (effectively, memory used for BlockInfo would reduce to a very small 
> constant value).
> This reduction will also have a huge impact on latency, as GC pauses will 
> be reduced considerably, and the change may even end up with better latency 
> results than the original code.
> I wrote a stand-alone project as a proof of concept, to show the pattern, the 
> data structure we can use and what will be the performance costs of this 
> approach.
> see [Slab|https://github.com/langera/slab]
> and [Slab performance 
> results|https://github.com/langera/slab/wiki/Performance-Results].
> Slab abstracts the storage, gives several storage implementations and 
> implements the flyweight pattern for the application (Namenode in our case).
> The stages for incorporating Slab into the Namenode are outlined in the 
> sub-task JIRAs.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
