[
https://issues.apache.org/jira/browse/HDFS-6709?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14073529#comment-14073529
]
Colin Patrick McCabe commented on HDFS-6709:
--------------------------------------------
bq. I thought RTTI is per class, not instance? If yes, the savings are
immaterial?
RTTI has to be per-instance. Every object's header carries a reference to its
class; that is why you can pass around Object references and cast them back to
their real type. Java has to store this information somewhere (think about it).
If Java didn't store it, it would have no way to know whether a cast should
succeed or fail. You would be in the same situation as in C, where you can cast
a pointer to an unrelated type and reinterpret whatever garbage bits happen to
be there.
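To make that concrete, here is a small standalone demo (not HDFS code) of why
the type has to travel with the instance: the static type below is only Object,
so the checked cast can only be verified against type information stored in the
object itself.
{code:java}
// Demo only: the compiler knows nothing beyond "Object" here, so the
// instanceof test and the checked cast must consult the class reference
// recorded in the instance's own header at runtime.
public class RttiDemo {
  public static void main(String[] args) {
    Object o = "a string hidden behind an Object reference";
    if (o instanceof String) {   // reads the instance's type information
      String s = (String) o;     // checked cast: succeeds or throws ClassCastException
      System.out.println(s.length());
    }
  }
}
{code}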
bq. Using misaligned access may result in processor incompatibility, impact
performance, introduces atomicity and CAS problems, concurrent access to
adjacent misaligned memory in the cache line may be completely unsafe.
I know about alignment restrictions. There are easy ways around that problem--
instead of one getLong you issue two getInt calls (or four getShort calls),
depending on the minimum alignment you can rely on. I don't see how CAS or
atomicity are relevant, since we're not discussing atomic data structures. The
performance benefit of storing less data can often cancel out the performance
cost of unaligned access; it depends on the scenario.
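A minimal sketch of that split-access workaround, assuming sun.misc.Unsafe and
a native address that is only guaranteed to be 4-byte aligned (the helper name
and the little-endian layout are made up for illustration):
{code:java}
import java.lang.reflect.Field;
import sun.misc.Unsafe;

// Illustration only: read a little-endian 64-bit value from an address that is
// only 4-byte aligned by doing two aligned 32-bit reads and stitching the halves.
public final class UnalignedReads {
  private static final Unsafe UNSAFE = loadUnsafe();

  private static Unsafe loadUnsafe() {
    try {
      Field f = Unsafe.class.getDeclaredField("theUnsafe");
      f.setAccessible(true);
      return (Unsafe) f.get(null);
    } catch (ReflectiveOperationException e) {
      throw new RuntimeException(e);
    }
  }

  /** Read a little-endian long from a 4-byte-aligned off-heap address. */
  static long getLongVia4ByteReads(long addr) {
    long lo = UNSAFE.getInt(addr)     & 0xFFFFFFFFL;
    long hi = UNSAFE.getInt(addr + 4) & 0xFFFFFFFFL;
    return (hi << 32) | lo;
  }
}
{code}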
bq. No references, only primitives can be stored off-heap, so how do value
types (non-boxed primitives, correct?) apply? Wouldn't the instance managing
the slab have methods that return the correct primitive?
The point is that with control over the layout, you can do better than the
JVM's default per-object layout. A more concrete example might help explain
this.
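Here is a rough sketch of what that could look like (the record layout, field
names, and class name are assumptions for illustration, not the attached
patch): pack fixed-width records into one direct buffer, so each record costs
exactly its field bytes, with no per-object header, no padding, and no
references.
{code:java}
// Hypothetical packed layout: 14 bytes per record, laid out by hand.
//   bytes 0-7  : block id      (long)
//   bytes 8-11 : storage index (int)
//   bytes 12-13: replica state (short)
final class PackedReplicaSlab {
  private static final int RECORD_SIZE = 14;
  private final java.nio.ByteBuffer slab;

  PackedReplicaSlab(int maxRecords) {
    this.slab = java.nio.ByteBuffer.allocateDirect(maxRecords * RECORD_SIZE);
  }

  long getBlockId(int i)       { return slab.getLong(i * RECORD_SIZE); }
  int getStorageIndex(int i)   { return slab.getInt(i * RECORD_SIZE + 8); }
  short getReplicaState(int i) { return slab.getShort(i * RECORD_SIZE + 12); }

  void putBlockId(int i, long blockId) { slab.putLong(i * RECORD_SIZE, blockId); }
}
{code}
The same three fields stored as separate Java objects would pay an object
header plus padding per instance; here the layout is exactly what we declare
it to be.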
bq. OO encapsulation and polymorphism are lost?
Take a look at {{BlockInfo#triplets}}. How much OO encapsulation do you see in
an Object[] with a comment above it explaining how to interpret each group of
three entries? Most of the places where we'd like to use off-heap storage are
already full of hacks that abuse the Java type system to save a few extra
bytes. Arrays of primitives and arrays of objects with special interpretation
conventions are routine.
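For reference, a paraphrase of the triplets convention (not the literal HDFS
source, just the shape of it):
{code:java}
// Paraphrased sketch: each replica of a block owns three consecutive slots in
// one Object[], and only a comment records what type each slot holds.
class TripletsSketch {
  // triplets[3*i]     : the storage holding replica i
  // triplets[3*i + 1] : previous block in that storage's block list
  // triplets[3*i + 2] : next block in that storage's block list
  private Object[] triplets;

  Object getStorage(int replicaIdx) {
    return triplets[3 * replicaIdx];  // caller casts to whatever the comment promises
  }
}
{code}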
bq. Does FooManager instantiate new Foo instances every time FooManager.get(id)
is called? If yes, it generates a tremendous amount of garbage that defeats the
GC benefit of going off heap.
It does create short-lived instances, but that doesn't defeat the benefit,
because every modern GC uses "generational collection": short-lived instances
are cleaned up quickly and cheaply in the young generation, without the long
pauses a full collection causes.
The rest of the questions seem to be variants on this one. All the code we
have in FSNamesystem follows the pattern: look up an inode, do something to
it, done with it. We can create temporary INode objects and they'll never be
promoted to the old generation, since they don't stick around between RPC
calls. Even if some somehow were (how?), a full GC over a dramatically smaller
heap would no longer be scary. And we'd get other performance benefits, like
the compressed-oops optimization. Anyway, the temporary inode objects would
probably just be thin objects containing an off-heap memory reference and a
bunch of getters/setters, to avoid doing a lot of unnecessary serde.
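A minimal sketch of that thin-wrapper pattern, with assumed names and an
assumed record layout (nothing here is the patch's actual API):
{code:java}
// Hypothetical: the manager hands back a short-lived wrapper that holds only a
// reference into the shared off-heap slab; getters decode fields on demand, so
// nothing is fully deserialized and the wrapper dies young.
final class OffHeapInode {
  private final java.nio.ByteBuffer slab;  // shared off-heap region
  private final int base;                  // this inode's byte offset in the slab

  OffHeapInode(java.nio.ByteBuffer slab, int base) {
    this.slab = slab;
    this.base = base;
  }

  long getId()                         { return slab.getLong(base); }      // assumed layout
  long getModificationTime()           { return slab.getLong(base + 8); }  // assumed layout
  void setModificationTime(long mtime) { slab.putLong(base + 8, mtime); }
}

final class InodeManager {
  private static final int RECORD_SIZE = 64;  // assumption for illustration
  private final java.nio.ByteBuffer slab;

  InodeManager(int maxInodes) {
    this.slab = java.nio.ByteBuffer.allocateDirect(maxInodes * RECORD_SIZE);
  }

  // Each call allocates one tiny wrapper object; the young-generation collector
  // reclaims it cheaply once the RPC handler drops the reference.
  OffHeapInode get(int inodeIdx) {
    return new OffHeapInode(slab, inodeIdx * RECORD_SIZE);
  }
}
{code}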
> Implement off-heap data structures for NameNode and other HDFS memory
> optimization
> ----------------------------------------------------------------------------------
>
> Key: HDFS-6709
> URL: https://issues.apache.org/jira/browse/HDFS-6709
> Project: Hadoop HDFS
> Issue Type: Improvement
> Components: namenode
> Reporter: Colin Patrick McCabe
> Assignee: Colin Patrick McCabe
> Attachments: HDFS-6709.001.patch
>
>
> We should investigate implementing off-heap data structures for NameNode and
> other HDFS memory optimization. These data structures could reduce latency
> by avoiding the long GC times that occur with large Java heaps. We could
> also avoid per-object memory overheads and control memory layout a little bit
> better. This also would allow us to use the JVM's "compressed oops"
> optimization even with really large namespaces, if we could get the Java heap
> below 32 GB for those cases. This would provide another performance and
> memory efficiency boost.
--
This message was sent by Atlassian JIRA
(v6.2#6252)