[
https://issues.apache.org/jira/browse/HDFS-7244?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14179341#comment-14179341
]
Colin Patrick McCabe commented on HDFS-7244:
--------------------------------------------
bq. I wasn't planning on introducing fallback. Instead I wanted to make the
storage type configurable (in a similar way to your suggestion). If someone
mis-configures and defines a direct byte buffer storage while using a JVM
which does not support it, they'll get an error and the Namenode should not
start.... My Slab implementation uses a single long for the address regardless
of the storage type.
Sounds reasonable.
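To make sure we're on the same page, here's a minimal sketch of what I understand the single-long-address idea to be (all names here are illustrative, not from the Slab code): one storage interface addressed by a plain {{long}}, with on-heap and direct implementations behind it, so the choice of backing memory is purely a configuration matter.

```java
import java.nio.ByteBuffer;

/** Illustrative sketch: storage addressed by a plain long,
 *  regardless of whether the backing memory is on-heap or direct. */
interface SlabStorage {
    long readLong(long address);
    void writeLong(long address, long value);
}

/** On-heap variant backed by a long[]. */
class HeapStorage implements SlabStorage {
    private final long[] data;
    HeapStorage(int slots) { data = new long[slots]; }
    public long readLong(long address) { return data[(int) address]; }
    public void writeLong(long address, long value) { data[(int) address] = value; }
}

/** Direct (off-heap) variant; if construction fails on a given JVM,
 *  the error surfaces at startup rather than later. */
class DirectStorage implements SlabStorage {
    private final ByteBuffer buf;
    DirectStorage(int bytes) { buf = ByteBuffer.allocateDirect(bytes); }
    public long readLong(long address) { return buf.getLong((int) address); }
    public void writeLong(long address, long value) { buf.putLong((int) address, value); }
}
```

Callers only ever see the {{long}} address, which is what makes the storage type swappable per deployment.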
bq. My intention was to use integer or long ids for every entity in order to
have a predictable size for a block, given its replication factor. In
HDFS-6660 I already added int ids to every DatanodeStorageInfo.
Should we make HDFS-6660 a subtask of this JIRA? It looks closely related.
bq. I've already shown that the only need for the linked list we have today
(implemented by the "triplets") is to mark visited blocks when processing a
block report and that could be done differently using a BitSet (see HDFS-6661 )
Hmm. Interesting idea. I think this might belong as a subtask as well. It
would also be nice to see some diagrams of how this would work (design doc?).
It sounds like there might be some linear scans involved, so we'd need to
verify that we weren't getting O(N) insert performance (or similar).
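For reference, here's roughly how I picture the BitSet marking working during a block report (a hypothetical sketch, assuming blocks carry int ids as in HDFS-6660; none of these names are from the actual patch):

```java
import java.util.BitSet;

/** Hypothetical sketch of the HDFS-6661 idea: instead of threading a
 *  linked list ("triplets") through every block, mark visited blocks
 *  in a BitSet keyed by the block's int id during a block report. */
class BlockReportScan {
    private final BitSet visited;

    BlockReportScan(int maxBlockId) { visited = new BitSet(maxBlockId); }

    /** Returns true the first time a block id is reported. */
    boolean markVisited(int blockId) {
        boolean first = !visited.get(blockId);
        visited.set(blockId);
        return first;
    }

    /** After the report, clear bits identify blocks the datanode no
     *  longer reported; scanning them is where the linear-scan
     *  concern above comes in. */
    int nextUnvisited(int fromId) { return visited.nextClearBit(fromId); }
}
```

The O(N) question is exactly about that final sweep over clear bits, versus the current list walk.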
bq. A BlockInfo size is determined only by its replication factor. So lookup
will be - locate slab by replication factor (that could be an index in an array
of slabs) and call get(id) on that slab. The downside of this lookup idea is:
Change to replication factor will require us to copy the data from one slab to
another. A much higher cost than today. We believe that such changes are not
frequent at all and overall implementing it this way will be justified although
we realise this is a key point the community will need to agree on if we move
forward with this idea.
Interesting idea. It neatly gets around the problem of variable numbers of
replicas. It seems like this could also implement an implicit linked list as
we do today, in addition to storing an index into a bitfield as you proposed.
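To restate the lookup scheme as I understand it (an illustrative sketch with made-up names, assuming one long per replica slot as the record layout): records within a slab are fixed-width, so the id indexes directly into the slab's backing array, and the replication factor selects the slab.

```java
/** Illustrative fixed-record slab: every record is `recordWidth` longs. */
class Slab {
    private final long[] store;
    private final int recordWidth;
    private int next; // next free record id

    Slab(int recordWidth, int capacity) {
        this.recordWidth = recordWidth;
        this.store = new long[recordWidth * capacity];
    }
    int add(long[] record) { // returns the new record's id
        int id = next++;
        System.arraycopy(record, 0, store, id * recordWidth, recordWidth);
        return id;
    }
    long[] get(int id) {
        long[] out = new long[recordWidth];
        System.arraycopy(store, id * recordWidth, out, 0, recordWidth);
        return out;
    }
}

/** Lookup: locate the slab by replication factor, then get(id).
 *  A replication change means copying the record to another slab,
 *  which is the cost the proposal accepts as rare. */
class SlabsByReplication {
    private final Slab[] byReplication;
    SlabsByReplication(int maxReplication, int capacityPerSlab) {
        byReplication = new Slab[maxReplication + 1];
        for (int r = 1; r <= maxReplication; r++) {
            byReplication[r] = new Slab(r, capacityPerSlab);
        }
    }
    int store(int replication, long[] record) {
        return byReplication[replication].add(record);
    }
    long[] lookup(int replication, int id) {
        return byReplication[replication].get(id);
    }
}
```

If each record also reserved a couple of slots for prev/next ids, the same layout could carry the implicit linked list mentioned above.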
bq. I still didn't get my head round the use of blockPoolId in this lookup. You
mention the lookup refers to the blockPoolId but I can't see it in the
blocksMap impl. on trunk. I assume we can always use your suggestion of a table
for String bpIds but maybe there's a better idea out there.
No, you're right. Sorry, that was dumb of me. {{BlocksMap}} doesn't need to
deal with {{bpId}} because we have a {{BlocksMap}} per block pool ID.
bq. Slab usage example is here.
I'll take a look. I've only skimmed it so far, but it looks reasonable. The
key point is that we don't want to do "big deserialization up front." We want
to have getters that deserialize just what they're asked to get, which it looks
like your example has.
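By per-field getters I mean something like the following (a hypothetical sketch; the field names and offsets are illustrative, not the real BlockInfo layout): each accessor decodes only the field it was asked for, straight out of the backing storage.

```java
import java.nio.ByteBuffer;

/** Hypothetical flyweight view over a serialized block record: each
 *  getter decodes just one field on demand, rather than doing "big
 *  deserialization up front". Offsets below are illustrative. */
class BlockInfoView {
    private static final int BLOCK_ID_OFFSET = 0;   // long
    private static final int NUM_BYTES_OFFSET = 8;  // long
    private static final int GEN_STAMP_OFFSET = 16; // long

    private final ByteBuffer storage;
    private final int base; // start of this record within the storage

    BlockInfoView(ByteBuffer storage, int base) {
        this.storage = storage;
        this.base = base;
    }
    long getBlockId()         { return storage.getLong(base + BLOCK_ID_OFFSET); }
    long getNumBytes()        { return storage.getLong(base + NUM_BYTES_OFFSET); }
    long getGenerationStamp() { return storage.getLong(base + GEN_STAMP_OFFSET); }
}
```

The view object itself can be a short-lived stack allocation (or reused), so the per-block heap footprint stays near zero.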
I'd like to create a branch for this and make you guys branch committers. It
seems like we have a few changes that make the most sense in the context of
off-heaping. We can easily get those changes in and play around with stuff
when using a branch. When dealing with trunk, people can be more cautious.
I think a good goal for this branch would be off-heaping the NameNode's block
map information. If we can stay focused on that, we can create something that
works great and then merge it to trunk. It will be exciting to see the
reduction in Java heap, and I think we will achieve a memory savings, based on
some of the discussion here! Great work, guys.
> Reduce Namenode memory using Flyweight pattern
> ----------------------------------------------
>
> Key: HDFS-7244
> URL: https://issues.apache.org/jira/browse/HDFS-7244
> Project: Hadoop HDFS
> Issue Type: Improvement
> Components: namenode
> Reporter: Amir Langer
>
> Using the flyweight pattern can dramatically reduce memory usage in the
> Namenode. The pattern also abstracts the actual storage type and allows the
> decision of whether it is off-heap or not and what is the serialisation
> mechanism to be configured per deployment.
> The idea is to move all BlockInfo data (as a first step) to this storage
> using the Flyweight pattern. The cost to doing it will be in higher latency
> when accessing/modifying a block. The idea is that this will be offset with a
> reduction in memory and in the case of off-heap, a dramatic reduction in
> memory (effectively, memory used for BlockInfo would reduce to a very small
> constant value).
> This reduction will also have a huge impact on latency, as GC pauses will
> be reduced considerably and may even end up with better latency results than
> the original code.
> I wrote a stand-alone project as a proof of concept to show the pattern, the
> data structure we can use, and the performance costs of this approach.
> see [Slab|https://github.com/langera/slab]
> and [Slab performance
> results|https://github.com/langera/slab/wiki/Performance-Results].
> Slab abstracts the storage, gives several storage implementations and
> implements the flyweight pattern for the application (Namenode in our case).
> The stages to incorporate Slab into the Namenode are outlined in the sub-task
> JIRAs.
--
This message was sent by Atlassian JIRA
(v6.3.4#6332)