On 8/20/09 3:40 AM, "Steve Loughran" <[email protected]> wrote:

>> does anyone have any up to date data on the memory consumption per
>> block/file on the NN on a 64-bit JVM with compressed pointers?
>
> The best documentation on consumption is
> http://issues.apache.org/jira/browse/HADOOP-1687 -I'm just wondering if
> anyone has looked at the memory footprint on the latest Hadoop releases,
> on those latest JVMs? -and which JVM the numbers from HADOOP-1687 came
> from?
>
> Those compressed pointers (which BEA JRockit had for a while) save RAM
> when the pointer references are within a couple of GB of the other refs,
> and which are discussed in some papers
> http://rappist.elis.ugent.be/~leeckhou/papers/cgo06.pdf
> http://www.elis.ugent.be/~kvenster/papers/VenstermansKris_ORA.pdf
>
> Sun's commentary is up here
> http://wikis.sun.com/display/HotSpotInternals/CompressedOops
>
> I'm just not sure what it means for the NameNode, and as there is no
> sizeof() operator in Java, it is something that will take a bit of effort
> to work out. From what I read of the Sun wiki, when you go compressed,
> while your heap is <3-4GB, there is no decompress operation; once you go
> above that there is a shift and an add, which is probably faster than
> fetching another 32 bits from L2 cache or main RAM. The result could be
> -could be- that your NN takes up much less space on 64-bit JVMs than it
> did before, but is no slower.

The implementation in JRE 6u14 uses a shift for all heap sizes; the
optimization that removes it for heaps under 4GB is not in the HotSpot
version shipped there (but will be in a later one). The size advantage is
there either way. I have not tested an app myself that was not faster
using -XX:+UseCompressedOops on a 64-bit JVM. The extra bit shifting is
overshadowed by how much faster and less frequent GC is with a smaller
dataset.

> Has anyone worked out the numbers yet?
>
> -steve

Every Java reference is 4 bytes instead of 8, and for several types --
arrays in particular -- the object itself is also 4 bytes smaller.
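Since there is no sizeof() in Java, shallow sizes can be estimated by hand from the usual 64-bit HotSpot layout rules (header width, reference width, 8-byte alignment). A minimal sketch; the header figures are the commonly cited HotSpot ones, and the example field counts are hypothetical, not taken from the real NameNode classes:

```java
public class ShallowSize {
    // Estimate shallow object size on a 64-bit HotSpot JVM.
    // Assumed layout: 16-byte header (8 mark word + 8 klass pointer),
    // or 12 bytes with compressed oops (8 + 4); objects align to 8 bytes.
    static long shallowSize(int refs, int primitiveBytes, boolean compressedOops) {
        int header = compressedOops ? 12 : 16;
        int refSize = compressedOops ? 4 : 8;
        long raw = header + (long) refs * refSize + primitiveBytes;
        return (raw + 7) & ~7L;  // round up to 8-byte alignment
    }

    public static void main(String[] args) {
        // A hypothetical inode-like object: 5 references, two longs of primitives.
        System.out.println(shallowSize(5, 16, false)); // 72
        System.out.println(shallowSize(5, 16, true));  // 48
    }
}
```

For this made-up object the compressed layout is a third smaller, which is in the same ballpark as the savings being discussed; real numbers would have to come from a heap dump or instrumentation.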
Given that the NN data structures have plenty of references, a 30% reduction in memory used would not be a surprise. Collection classes in particular are nearly half the size.
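The "nearly half" figure for collections follows from arithmetic alone: an Object[]'s payload is one reference per slot, so halving the reference width nearly halves the whole array. A sketch under the commonly cited 64-bit HotSpot header figures (not measured here):

```java
public class ArrayFootprint {
    // Estimate the size of an Object[] on a 64-bit HotSpot JVM.
    // Assumed array header: 24 bytes plain (8 mark + 8 klass + 4 length + 4 pad),
    // 16 bytes with compressed oops (8 + 4 + 4); 8-byte object alignment.
    static long objectArrayBytes(int length, boolean compressedOops) {
        int header = compressedOops ? 16 : 24;
        int refSize = compressedOops ? 4 : 8;
        long raw = header + (long) length * refSize;
        return (raw + 7) & ~7L;
    }

    public static void main(String[] args) {
        // A million-slot reference array, roughly what backs a large HashMap.
        System.out.println(objectArrayBytes(1_000_000, false)); // ~8 MB
        System.out.println(objectArrayBytes(1_000_000, true));  // ~4 MB
    }
}
```

The header shrinks by only 8 bytes, but the per-slot saving dominates at any realistic length, which is why collection internals approach a 50% reduction even when the objects they point at shrink less.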
