The hprof dump reader spends a lot of time reading the whole file, for various
reasons. The indices it holds in memory are constructed during an initial read,
and this is also the source of the memory usage. In addition, there is some
correlation to be done, which takes further time and induces yet more reading.
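
To illustrate (this is only a sketch, not the actual Kato code), the initial
pass essentially has to build something like an object-ID-to-file-offset index,
which is why memory use grows with the number of records in the dump:

import java.util.HashMap;
import java.util.Map;

// Illustrative only -- the kind of index an initial pass has to build.
class ObjectIndex {
    // object ID -> offset of the corresponding record in the hprof file
    private final Map<Long, Long> offsets = new HashMap<Long, Long>();

    void record(long objectId, long fileOffset) {
        offsets.put(objectId, fileOffset);
    }

    Long offsetOf(long objectId) {
        return offsets.get(objectId);
    }
}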

I'm sure some work could be done to improve the performance further, but we'll
have to look at the tradeoff between disk space and memory usage. The hprof
file format itself is what it is, however, and we have no influence over that.
The CJVMTI agent has lots of room for improvement, but I suspect it is unlikely
to do much better than the existing hprof implementations. The built-in JVM
hprof dumper will probably be a hard act to follow.

The HProf implementation is not thread-safe. Realistically, I think thread
safety is something that ought to be considered once things are more mature.
There will be algorithms that can deal with the JVM structure sensibly.

And thanks Lukasz, it's great to have your input.

Regards,
    Stuart


Steve Poole wrote:
Hi Lukasz - thanks for posting.

On Fri, Jan 8, 2010 at 7:11 PM, Lukasz <flo...@intercel.com.pl> wrote:

Hello,

In my work I have faced a problem where I have to process a 60GB heap dump.
It probably wouldn't be anything scary if I had proper hardware to process
such a file.
I've noticed that most of the tools for dump processing require:
a) an amount of memory at least equal to the dump size
b) an amount of free disk space at least equal to the dump size (to create
indexes)
Unfortunately, I don't have access to a machine where both requirements are
met at once.


Yes I agree -  for a) above I'd say that it's common to need 1.5 times the
size of the original heap.


The processing I would like to perform is backtracking the references to
instances of one class (which causes the out of memory). Assuming that hard
disk reads will be my performance bottleneck, I should be able to backtrack a
few levels during the night.
I have only a rough overview of the algorithm in my head, but it seems that
something like the "visitor pattern" would be enough for me.
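
As a rough illustration of that kind of pass (a hypothetical sketch only --
the interfaces below are made up for the example and are not part of the Kato
API), one visit over the instance records per level of backtracking would look
something like:

import java.util.HashSet;
import java.util.Set;

// Minimal shape of an instance record, as assumed for this sketch.
// classId() would be used to find the instances of the problem class
// when seeding the first pass.
interface InstanceRecord {
    long objectId();
    long classId();
    long[] referenceIds();   // object IDs this instance points at
}

interface HeapVisitor {
    void visit(InstanceRecord record);
}

// Collects the IDs of objects that reference any of the current targets.
// Seed the targets with the instances of the problem class, then run one
// pass over the dump per level of backtracking, feeding the referrers of
// one pass in as the targets of the next.
class BacktrackVisitor implements HeapVisitor {
    private final Set<Long> targets;
    private final Set<Long> referrers = new HashSet<Long>();

    BacktrackVisitor(Set<Long> targets) {
        this.targets = targets;
    }

    public void visit(InstanceRecord record) {
        for (long ref : record.referenceIds()) {
            if (targets.contains(ref)) {
                referrers.add(record.objectId());
                break;
            }
        }
    }

    Set<Long> referrers() {
        return referrers;
    }
}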

Since I had read about Kato some time ago, I wanted to give it a try.

For dev purposes I have prepared a ~370MB heap dump with around 10 000 000
simple objects added to a collection (which probably multiplies the number of
objects on the heap).
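
(For illustration only -- this is not the code actually used -- a dump of
roughly that shape can be produced with something along these lines and then
captured with jmap -dump:format=b,file=heap.hprof <pid>, or by running with
-XX:+HeapDumpOnOutOfMemoryError and a small enough -Xmx.)

import java.util.ArrayList;
import java.util.List;

// Fills a collection with ~10,000,000 small objects and then waits so that
// a heap dump can be taken externally.
public class DumpData {
    static final class Item {
        final int value;
        Item(int value) { this.value = value; }
    }

    public static void main(String[] args) throws Exception {
        List<Item> items = new ArrayList<Item>();
        for (int i = 0; i < 10000000; i++) {
            items.add(new Item(i));
        }
        System.out.println("Populated " + items.size() + " items; take the dump now.");
        Thread.sleep(Long.MAX_VALUE);   // keep the heap alive while the dump is taken
    }
}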

Can you share the code you used to generate the data in the dump?

1) Default approach:
Image image = FactoryRegistry.getDefaultRegistry().getImage(dump);

I waited a few minutes, but it didn't finish processing, so it looks like
there will be no chance to process a 60GB dump.

I suspect that the main reason why this is taking so long is that the HPROF
reader has to read all of the dump first, since it doesn't know what questions
you need answering. That's generally true of any dump reader, unfortunately.

2) HProfFile
A quick look at the HProfView class gave me some idea of how I can visit all
objects (records) on the heap.

I wrote a simple app which only iterates through all records, but it also
turned out to be quite slow and memory consuming. Here are some metrics:
---------------------------------
MemoryPool: PS Old Gen
Hello World!
heapDump:
org.apache.kato.hprof.datalayer.HProfFile$HeapDumpHProfRec...@f0eed6
HeapSubRecord: 100000 (946ms, 4199kB)
HeapSubRecord: 200000 (2064ms, 7955kB)
HeapSubRecord: 300000 (3123ms, 11759kB)
HeapSubRecord: 400000 (3933ms, 14811kB)
HeapSubRecord: 500000 (3908ms, 17927kB)
HeapSubRecord: 600000 (7269ms, 21039kB)
HeapSubRecord: 700000 (7736ms, 24139kB)
HeapSubRecord: 800000 (7866ms, 27147kB)
HeapSubRecord: 900000 (7753ms, 30263kB)
HeapSubRecord: 1000000 (7684ms, 33299kB)
HeapSubRecord: 1100000 (13515ms, 36487kB)
HeapSubRecord: 1200000 (15525ms, 39623kB)
HeapSubRecord: 1300000 (15405ms, 42723kB)
HeapSubRecord: 1400000 (15567ms, 39115kB)
HeapSubRecord: 1500000 (15459ms, 42203kB)
HeapSubRecord: 1600000 (15692ms, 43838kB)
HeapSubRecord: 1700000 (15424ms, 45926kB)
HeapSubRecord: 1800000 (15327ms, 49026kB)
HeapSubRecord: 1900000 (15416ms, 48505kB)
HeapSubRecord: 2000000 (15352ms, 51629kB)
-------------------------------
This means that iterating over the first 100 000 records took 946ms and
4199kB of OldGen was consumed.
Iterating over the next 100 000 records took 2064ms and 7955kB of OldGen was
consumed.
And so on; 100 000 records is the interval for printing the stats.
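
The loop itself was essentially of the following shape (a sketch with stand-in
names -- the accessors below are not the exact
org.apache.kato.hprof.datalayer API):

import java.lang.management.ManagementFactory;
import java.lang.management.MemoryPoolMXBean;

// Stand-in for the heap dump record being iterated; method names are assumed.
interface HeapDumpRecord {
    int getSubRecordCount();
    Object getSubRecord(int index);
}

class RecordWalk {
    static MemoryPoolMXBean oldGen() {
        for (MemoryPoolMXBean pool : ManagementFactory.getMemoryPoolMXBeans()) {
            if (pool.getName().contains("Old Gen")) {
                return pool;
            }
        }
        throw new IllegalStateException("no old generation pool found");
    }

    static void walk(HeapDumpRecord heapDump) {
        MemoryPoolMXBean old = oldGen();
        long start = System.currentTimeMillis();
        int count = heapDump.getSubRecordCount();
        for (int i = 1; i <= count; i++) {
            heapDump.getSubRecord(i - 1);   // just touch the record
            if (i % 100000 == 0) {
                System.out.println("HeapSubRecord: " + i + " ("
                        + (System.currentTimeMillis() - start) + "ms, "
                        + (old.getUsage().getUsed() / 1024) + "kB)");
                start = System.currentTimeMillis();
            }
        }
    }
}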

One core of the CPU was saturated. It also looks like the required amount of
memory will be equal to the dump size.
I could start 4 threads to make better use of the CPU, but since it looks like
an HProfFile instance is not thread safe I would have to create 4 instances of
HProfFile, which means the required amount of memory would be more like
4 x dumpSize.

That's all I have done so far. I didn't track what in HProfFile consumes CPU
and memory; my blind guess is that CachedRandomAccesDataProvider is involved.

Thanks for this Lukasz - you are probably the first person to use this code
other than the developers and it's great to get some feedback. Can you share
the code you used to create the dump and to visit the HPROF records? Stuart
has made some performance adjustments to the hprof code and we'll see if we
can do better.

On the spec list we're discussing the basics of a "snapshot" dump concept
where only what you need gets dumped.    I wonder if the same idea could be
applied to opening a dump.   It would be great to know when reading a dump
that certain information is not required -  that should improve
performance.
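
Purely as a sketch of what such read-time hints might look like (hypothetical
-- nothing like this exists in the API today):

import java.util.EnumSet;
import java.util.Set;

// The aspects of a dump a caller might (hypothetically) declare an interest in.
enum DumpAspect { CLASSES, OBJECTS, THREADS, MONITORS, PRIMITIVE_FIELDS }

interface ReaderHints {
    Set<DumpAspect> required();
}

// A caller that only needs the object graph could declare that up front,
// letting the reader skip indexing threads, monitors and primitive fields.
class ObjectGraphOnly implements ReaderHints {
    public Set<DumpAspect> required() {
        return EnumSet.of(DumpAspect.CLASSES, DumpAspect.OBJECTS);
    }
}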



Regards
Lukasz




--
Stuart Monteith
http://blog.stoo.me.uk/
