[
https://issues.apache.org/jira/browse/HDFS-6709?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14073529#comment-14073529
]
Colin Patrick McCabe commented on HDFS-6709:
--------------------------------------------
bq. I thought RTTI is per class, not instance? If yes, the savings are
immaterial?
RTTI has to be per-instance. Every object's header carries a reference to its
class; that is why you can pass around Object references and cast them back to
their real type. Java has to store this information somewhere (think about it).
If Java didn't store it, it would have no way to know whether a cast should
succeed or fail. You would be in the same situation as in C, where you can cast
a pointer to an unrelated type and reinterpret whatever garbage bits happen to
be there.
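To make that concrete, here is a small standalone demo (not HDFS code) of why
the type has to travel with the instance: the static type below is only Object,
so the checked cast can only be verified against type information stored in the
object itself.
{code:java}
// Demo only: the compiler knows nothing beyond "Object" here, so the
// instanceof test and the checked cast must consult the class reference
// recorded in the instance's own header at runtime.
public class RttiDemo {
  public static void main(String[] args) {
    Object o = "a string hidden behind an Object reference";
    if (o instanceof String) {   // reads the instance's type information
      String s = (String) o;     // checked cast: succeeds or throws ClassCastException
      System.out.println(s.length());
    }
  }
}
{code}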
bq. Using misaligned access may result in processor incompatibility, impact
performance, introduces atomicity and CAS problems, concurrent access to
adjacent misaligned memory in the cache line may be completely unsafe.
I know about alignment restrictions. There are easy ways around that problem--
instead of one getLong you issue two getInt calls (or four getShort calls),
depending on the minimum alignment you can rely on. I don't see how CAS or
atomicity are relevant, since we're not discussing atomic data structures. The
performance benefit of storing less data can often cancel out the performance
cost of unaligned access; it depends on the scenario.
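A minimal sketch of that split-access workaround, assuming sun.misc.Unsafe and
a native address that is only guaranteed to be 4-byte aligned (the helper name
and the little-endian layout are made up for illustration):
{code:java}
import java.lang.reflect.Field;
import sun.misc.Unsafe;

// Illustration only: read a little-endian 64-bit value from an address that is
// only 4-byte aligned by doing two aligned 32-bit reads and stitching the halves.
public final class UnalignedReads {
  private static final Unsafe UNSAFE = loadUnsafe();

  private static Unsafe loadUnsafe() {
    try {
      Field f = Unsafe.class.getDeclaredField("theUnsafe");
      f.setAccessible(true);
      return (Unsafe) f.get(null);
    } catch (ReflectiveOperationException e) {
      throw new RuntimeException(e);
    }
  }

  /** Read a little-endian long from a 4-byte-aligned off-heap address. */
  static long getLongVia4ByteReads(long addr) {
    long lo = UNSAFE.getInt(addr)     & 0xFFFFFFFFL;
    long hi = UNSAFE.getInt(addr + 4) & 0xFFFFFFFFL;
    return (hi << 32) | lo;
  }
}
{code}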
bq. No references, only primitives can be stored off-heap, so how do value
types (non-boxed primitives, correct?) apply? Wouldn't the instance managing
the slab have methods that return the correct primitive?
The point is that with control over the layout, you can do better than the
JVM's default per-object layout. A more concrete example might help explain
this.
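Here is a rough sketch of what that could look like (the record layout, field
names, and class name are assumptions for illustration, not the attached
patch): pack fixed-width records into one direct buffer, so each record costs
exactly its field bytes, with no per-object header, no padding, and no
references.
{code:java}
// Hypothetical packed layout: 14 bytes per record, laid out by hand.
//   bytes 0-7  : block id      (long)
//   bytes 8-11 : storage index (int)
//   bytes 12-13: replica state (short)
final class PackedReplicaSlab {
  private static final int RECORD_SIZE = 14;
  private final java.nio.ByteBuffer slab;

  PackedReplicaSlab(int maxRecords) {
    this.slab = java.nio.ByteBuffer.allocateDirect(maxRecords * RECORD_SIZE);
  }

  long getBlockId(int i)       { return slab.getLong(i * RECORD_SIZE); }
  int getStorageIndex(int i)   { return slab.getInt(i * RECORD_SIZE + 8); }
  short getReplicaState(int i) { return slab.getShort(i * RECORD_SIZE + 12); }

  void putBlockId(int i, long blockId) { slab.putLong(i * RECORD_SIZE, blockId); }
}
{code}
The same three fields stored as separate Java objects would pay an object
header plus padding per instance; here the layout is exactly what we declare
it to be.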
bq. OO encapsulation and polymorphism are lost?
Take a look at {{BlockInfo#triplets}}. How much OO encapsulation do you see in
an Object[] with a comment above it explaining how to interpret each group of
three entries? Most of the places where we'd like to use off-heap storage are
already full of hacks that abuse the Java type system to save a few extra
bytes. Arrays of primitives and arrays of objects with special interpretation
conventions are routine.
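For reference, a paraphrase of the triplets convention (not the literal HDFS
source, just the shape of it):
{code:java}
// Paraphrased sketch: each replica of a block owns three consecutive slots in
// one Object[], and only a comment records what type each slot holds.
class TripletsSketch {
  // triplets[3*i]     : the storage holding replica i
  // triplets[3*i + 1] : previous block in that storage's block list
  // triplets[3*i + 2] : next block in that storage's block list
  private Object[] triplets;

  Object getStorage(int replicaIdx) {
    return triplets[3 * replicaIdx];  // caller casts to whatever the comment promises
  }
}
{code}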
bq. Does FooManager instantiate new Foo instances every time FooManager.get(id)
is called? If yes, it generates a tremendous amount of garbage that defeats the
GC benefit of going off heap.
It does create short-lived instances, but that doesn't defeat the benefit,
because every modern GC uses "generational collection": short-lived instances
are cleaned up quickly and cheaply in the young generation, without the long
pauses a full collection causes.
The rest of the questions seem to be variants on this one. All the code we
have in FSNamesystem follows the pattern: look up an inode, do something to
it, done with it. We can create temporary INode objects and they'll never be
promoted to the old generation, since they don't stick around between RPC
calls. Even if some somehow were (how?), a full GC over a dramatically smaller
heap would no longer be scary. And we'd get other performance benefits, like
the compressed-oops optimization. Anyway, the temporary inode objects would
probably just be thin objects containing an off-heap memory reference and a
bunch of getters/setters, to avoid doing a lot of unnecessary serde.
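A minimal sketch of that thin-wrapper pattern, with assumed names and an
assumed record layout (nothing here is the patch's actual API):
{code:java}
// Hypothetical: the manager hands back a short-lived wrapper that holds only a
// reference into the shared off-heap slab; getters decode fields on demand, so
// nothing is fully deserialized and the wrapper dies young.
final class OffHeapInode {
  private final java.nio.ByteBuffer slab;  // shared off-heap region
  private final int base;                  // this inode's byte offset in the slab

  OffHeapInode(java.nio.ByteBuffer slab, int base) {
    this.slab = slab;
    this.base = base;
  }

  long getId()                         { return slab.getLong(base); }      // assumed layout
  long getModificationTime()           { return slab.getLong(base + 8); }  // assumed layout
  void setModificationTime(long mtime) { slab.putLong(base + 8, mtime); }
}

final class InodeManager {
  private static final int RECORD_SIZE = 64;  // assumption for illustration
  private final java.nio.ByteBuffer slab;

  InodeManager(int maxInodes) {
    this.slab = java.nio.ByteBuffer.allocateDirect(maxInodes * RECORD_SIZE);
  }

  // Each call allocates one tiny wrapper object; the young-generation collector
  // reclaims it cheaply once the RPC handler drops the reference.
  OffHeapInode get(int inodeIdx) {
    return new OffHeapInode(slab, inodeIdx * RECORD_SIZE);
  }
}
{code}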
> Implement off-heap data structures for NameNode and other HDFS memory
> optimization
> ----------------------------------------------------------------------------------
>
> Key: HDFS-6709
> URL: https://issues.apache.org/jira/browse/HDFS-6709
> Project: Hadoop HDFS
> Issue Type: Improvement
> Components: namenode
> Reporter: Colin Patrick McCabe
> Assignee: Colin Patrick McCabe
> Attachments: HDFS-6709.001.patch
>
>
> We should investigate implementing off-heap data structures for NameNode and
> other HDFS memory optimization. These data structures could reduce latency
> by avoiding the long GC times that occur with large Java heaps. We could
> also avoid per-object memory overheads and control memory layout a little bit
> better. This also would allow us to use the JVM's "compressed oops"
> optimization even with really large namespaces, if we could get the Java heap
> below 32 GB for those cases. This would provide another performance and
> memory efficiency boost.
--
This message was sent by Atlassian JIRA
(v6.2#6252)