Hi,

I've changed the code base to do as you described. Having longs as parameters is, of course, necessary with large dumps. I increased the default values, but I found that the defaults you suggested won't run on a relatively small machine. In the absence of a clever heuristic, I've added a couple of properties to adjust the array sizes:
org.apache.kato.hprof.maparraysize - the BitMaskMappingArray size for the whole file. The default is 10000.
org.apache.kato.hprof.heap.maparraysize - the heap record's BitMaskMappingArray size. The default is 100000.
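For illustration, the properties could be picked up with Integer.getInteger, falling back to those defaults. This is only a sketch of the idea; the helper class below is an assumption, not the actual wiring inside HProfFile:

    // Sketch only: assumed helper for the two tuning properties described above.
    // The real code in HProfFile may read them differently.
    public final class HProfTuning {

        // BitMaskMappingArray size for the whole file (default 10000).
        public static int fileMapArraySize() {
            return Integer.getInteger("org.apache.kato.hprof.maparraysize", 10000);
        }

        // BitMaskMappingArray size for the heap record (default 100000).
        public static int heapMapArraySize() {
            return Integer.getInteger("org.apache.kato.hprof.heap.maparraysize", 100000);
        }
    }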
Thanks,
Stuart

Lukasz wrote:
Hi Steve,

Actually I've made two changes. The first was increasing the size of the array in the classes BitMaskMappingArray and ArrayBitMaskMappingStrategy, which is probably the same change you made. I had a similar idea about using a heuristic/prediction of how many records there can be in a dump. Another (or additional) solution could be to increase cacheSize instead of bucketSize when the cache is exhausted.

My feeling is that the main purpose of those classes is to provide a mapping between the "index of a record" and the "absolute position of the record in the file". If that is correct, the mapping consists of:
1) finding an absolute location in the file, and
2) skipping a number of records (a "seek") until the reader is positioned exactly at the beginning of the required record.

The second step is the problem, I believe, because the bigger the bucket, the more records we have to skip (the "moveRelativeElement" methods are responsible for this). I think this, and the fact that bucketSize is doubled when the cache is exhausted, are the reasons why performance drops over time.

To fix it I changed the following parts of org.apache.kato.hprof.datalayer.HProfFile:

    public HeapDumpHProfRecord(short tag, IDataProvider dataProvider) throws IOException {
        ...
        subRecordArray = new BitMaskMappingArray(50000000, 25,
                new HProfDumpRecordArrayProvider(dataProvider));
    }

and

    private void buildHeader() throws IOException {
        ...
        array = new BitMaskMappingArray(1000000, 25, new HPROFRecordProvider(subset));
    }

The second problem was the incorrect size of the HeapDumpHProfRecord read from the file (the recordSize overflow problem). I didn't find a general solution for this; I just assumed that the HeapDumpHProfRecord ends at the end of the file, which is true in my case. To achieve this I made the following change (also in HProfFile):

    case 0x0c:
        long l = recordProvider.dataRemaining();
        IHeapDumpHProfRecord rec = new HeapDumpHProfRecord(tag,
                new SubsetDataProvider(recordProvider, l));
        recordProvider.moveBy(l);
        return rec;

I also had to change the signature of the "moveBy" method to accept a long instead of an int.

I'll try to send you part of my source code together with the .svn metadata, so you should be able to find exactly what I've changed.

Regards
Lukasz

Steve Poole wrote:
I've run some performance tests on various configs for HProfFile caching. See here:

http://spreadsheets.google.com/pub?key=tes0VC-Wyz-lTe9sKNen2Hw&output=html

Our current settings for cache size and bucket size are 1000 and 25. Bucket size does affect the speed of analysis, but it's the cache size that really does it! We need to configure the cache size based on some heuristic about the size of the dump file.

On Thu, Jan 14, 2010 at 5:21 PM, Lukasz <flo...@intercel.com.pl> wrote:
Hi Steve, Stuart,

In the last two days I have been trying to move further. I tried to prepare a workaround for the incorrect length. The first idea was to exploit the fact that an integer overflow probably took place, so I searched for 0x2c only at the overflow points (probably quite a brittle approach). But it turned out that in my heap dumps (both the real one and the one prepared for development purposes) there is no 0x2c byte after/at the end of the dump.

I've noticed, however, that for both dumps the heap record ends at the end of the file. That is probably not a general rule, but for my purposes it should be enough, so I changed the Kato sources slightly to make use of this and decided to move further.

So far I have some code which extracts static data from the dump (strings, classes) and searches for instances of a given class. I have run it against the real dump.
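As an aside on the two-step lookup described earlier in this thread (a cached bucket offset, then a relative skip to the wanted record), here is a simplified sketch of the idea. The class, fields and record layout below are illustrative assumptions only, not the actual BitMaskMappingArray/ArrayBitMaskMappingStrategy code:

    import java.io.IOException;
    import java.io.RandomAccessFile;

    // Simplified illustration: an in-memory array holds the file offset of every
    // bucketSize-th record; reaching record i means seeking to the start of its
    // bucket and then skipping i % bucketSize records sequentially.
    class BucketedRecordIndex {

        private final long[] bucketOffsets; // offset of record (b * bucketSize) for each bucket b
        private final int bucketSize;
        private final RandomAccessFile file;

        BucketedRecordIndex(RandomAccessFile file, long[] bucketOffsets, int bucketSize) {
            this.file = file;
            this.bucketOffsets = bucketOffsets;
            this.bucketSize = bucketSize;
        }

        // Positions the file at the start of record 'index'.
        void seekToRecord(int index) throws IOException {
            // Step 1: jump straight to the start of the bucket containing the record.
            file.seek(bucketOffsets[index / bucketSize]);
            // Step 2: skip the remaining records one by one; this is the part that
            // gets slower as bucketSize grows (bucketSize / 2 skips on average).
            for (int toSkip = index % bucketSize; toSkip > 0; toSkip--) {
                skipOneRecord();
            }
        }

        // Reads a record header and skips its body; the header layout here is simplified.
        private void skipOneRecord() throws IOException {
            file.skipBytes(1 + 4);                      // tag and timestamp
            long length = file.readInt() & 0xFFFFFFFFL; // 4-byte length field
            file.seek(file.getFilePointer() + length);
        }
    }

With this layout the average lookup skips bucketSize / 2 records sequentially, so each doubling of bucketSize roughly doubles that cost, which matches the slowdown described above.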
Against the real dump it took around 40 minutes (to get the static data) and 50 minutes (to find the instances), which is not bad, I think. I haven't done any careful performance analysis, but CPU (not I/O) looked to be the bottleneck. Performance also looked quite stable across the whole dump.

The 60GB dump was extracted from a core file using jmap (I don't remember exactly which version of Java this jmap came from, but it was certainly Java 1.6). The following JVM version was dumped to the core file:

java version "1.6.0_10"
Java(TM) SE Runtime Environment (build 1.6.0_10-b33)
Java HotSpot(TM) 64-Bit Server VM (build 11.0-b15, mixed mode)

I also suspected that heap dump segments were introduced to handle heaps larger than 4GB, but it looks like that is not always the case. Anyway, it is nice that I was able to go through the whole dump without any error.

Regards
Lukasz

Steve Poole wrote:
Hi Lukasz,

You say your hprof format is 1.0.1, so, according to the information I have about the hprof format, that means yours won't have any heap dump segments in your file. Apparently they are in the 1.0.2 format, which is a Java 6 addition and is only used when the heap is too big for the old format to cope with. What JVM and OS are you using to produce this dump?

On Tue, Jan 12, 2010 at 6:17 PM, Stuart Monteith <stuk...@stoo.me.uk> wrote:
Hi,

You are quite right, 4 bytes for the record length. If we had the file format wrong for 64-bit hprof dumps we'd not be able to read any dump, so it must be an overflow of the record length. In that case, the only possible solution I can think of would be for us to change the code to search for the HEAP DUMP END record (tagged 0x2c) to detect the end of the HEAP DUMP. Of course, there was some puzzling over the difference between HEAP DUMP SEGMENT (0x1c) and HEAP DUMP (0x0c) before. I'd expect we'd get:

    HEAP DUMP
      heap dump records ...
    HEAP DUMP END

or

    HEAP DUMP SEGMENT
      heap dump records
    HEAP DUMP SEGMENT
      heap dump records
      :
      :
    HEAP DUMP END

If we ignore the record size for the HEAP DUMP and calculate it ourselves, we might have more luck.

Regards,
Stuart

Lukasz wrote:
Hello,

Since performance doesn't look bad after changing the array size, today I tried to process my 60GB heap dump. Unfortunately, I very quickly ended up with an exception:

    Exception in thread "main" java.lang.IllegalArgumentException: range -1293366370 is less than 0
        at org.apache.kato.common.SubsetDataProvider.<init>(SubsetDataProvider.java:29)
        at org.apache.kato.hprof.datalayer.HProfFile$HPROFRecordProvider.getCurrentElement(HProfFile.java:487)
        at org.apache.kato.hprof.datalayer.HProfFile$HPROFRecordProvider.getCurrentElement(HProfFile.java:437)
        at org.apache.kato.common.BitMaskMappingArray.get(BitMaskMappingArray.java:56)
        at org.apache.kato.hprof.datalayer.HProfFile.getRecord(HProfFile.java:784)
        ... (my code) ...

It looks like the range of the HeapDumpHProfRecord had the value -1293366370. The following is the part of the dump which I believe is responsible for that (B2 E8 CB 9E):

    0055DD84  25 00 00 2A │ B7 AE 22 06 │ 80 00 00 00 │ 01 00 00 2A │ B7 AE 20 0B │ 88 02 00 00 │ 00 00 00 00 │ 00 18 00 00 │ 23 26 00 00
    0055DDA8  2A B7 B0 A1 │ 64 28 00 00 │ 00 01 00 00 │ 2A B7 B0 33 │ EA F8 0C 00 │ 00 00 00 B2 │ E8 CB 9E 20 │ 00 00 2A B7 │ B0 DE AF C8
    0055DDCC  00 00 00 01 │ 00 00 2A B7 │ AE 22 0F 20 │ 00 00 2A AA │ BE 21 3D 28 │ 00 00 00 00 │ 00 00 00 00 │ 00 00 2A AA │ BE 25 B7 D8

What is interesting is that the hprof file format reserves only 4 bytes for the length of a record, which allows it to express a record whose size is at most 4GB (assuming an unsigned 4 bytes).
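To illustrate the overflow: when that 4-byte length field is read into a signed Java int, any value of 2GB or more comes out negative, much like the negative range in the exception above. A minimal sketch (not the actual Kato reading code) of the two ways of reading it:

    import java.io.DataInputStream;
    import java.io.IOException;

    // Sketch only: how a 4-byte record length behaves when it exceeds Integer.MAX_VALUE.
    class RecordLengthSketch {

        // Read into a signed int: values >= 2^31 wrap negative,
        // e.g. the bytes B2 E8 CB 9E come out as a negative number.
        static int readLengthSigned(DataInputStream in) throws IOException {
            return in.readInt();
        }

        // Widen to a long and mask: this keeps the full unsigned 32-bit range (up to ~4GB),
        // but a record much bigger than 4GB still cannot be represented in 4 bytes at all.
        static long readLengthUnsigned(DataInputStream in) throws IOException {
            return in.readInt() & 0xFFFFFFFFL;
        }
    }

Either way, for a heap dump larger than 4GB the field simply cannot hold the true length, which is why treating the HEAP DUMP record as running to the end of the file, or scanning for the HEAP DUMP END (0x2c) tag, is needed.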
My dump's version is: JAVA PROFILE 1.0.1

Regards
Lukasz

Lukasz wrote:
Hi Stuart, Steve,

I've taken a deeper look into the code. I still haven't traced the index calculation in the classes BitMaskMappingArray and ArrayBitMaskMappingStrategy carefully, but I managed to improve performance by increasing the array size in those classes (which is set in the HProfFile class).

If I understand the code correctly, when the capacity of the BitMaskMappingArray is exhausted, bucketSize is doubled, which in turn means that more reads (even cached ones) are required to set the position of the IDataProvider/IArrayEntryProvider.

Below are loading time results for the default array size (1000) and an increased one (1000000). The test ran against a generated dump file (5000000 instances of Data).

Default (1000):
HeapSubRecord: 100000 (866ms, 4215kB)
HeapSubRecord: 200000 (1716ms, 7879kB)
HeapSubRecord: 300000 (2833ms, 11263kB)
HeapSubRecord: 400000 (3889ms, 14283kB)
HeapSubRecord: 500000 (3893ms, 17319kB)
HeapSubRecord: 600000 (7248ms, 20479kB) (here bucketSize was probably doubled)
HeapSubRecord: 700000 (7721ms, 23531kB)
HeapSubRecord: 800000 (7729ms, 26567kB)
HeapSubRecord: 900000 (7731ms, 29671kB)
HeapSubRecord: 1000000 (7704ms, 32731kB)
... (I didn't wait until the end)

Increased (1000000):
HeapSubRecord: 100000 (622ms, 17809kB)
HeapSubRecord: 200000 (309ms, 20345kB)
HeapSubRecord: 300000 (283ms, 23861kB)
HeapSubRecord: 400000 (274ms, 27921kB)
HeapSubRecord: 500000 (269ms, 29957kB)
HeapSubRecord: 600000 (264ms, 31993kB)
HeapSubRecord: 700000 (272ms, 36097kB)
HeapSubRecord: 800000 (288ms, 37739kB)
HeapSubRecord: 900000 (263ms, 39835kB)
HeapSubRecord: 1000000 (259ms, 41931kB)
HeapSubRecord: 1100000 (300ms, 44773kB)
HeapSubRecord: 1200000 (283ms, 46901kB)
HeapSubRecord: 1300000 (291ms, 49029kB)
HeapSubRecord: 1400000 (328ms, 53801kB)
HeapSubRecord: 1500000 (259ms, 53801kB)
HeapSubRecord: 1600000 (272ms, 58125kB)
HeapSubRecord: 1700000 (264ms, 60293kB)
HeapSubRecord: 1800000 (264ms, 62473kB)
HeapSubRecord: 1900000 (361ms, 61373kB)
HeapSubRecord: 2000000 (274ms, 63105kB)
...
HeapSubRecord: 9000000 (284ms, 231969kB)
HeapSubRecord: 9100000 (272ms, 233597kB)
HeapSubRecord: 9200000 (281ms, 236357kB)
HeapSubRecord: 9300000 (274ms, 240469kB)
HeapSubRecord: 9400000 (279ms, 244541kB)
HeapSubRecord: 9500000 (269ms, 246549kB)
HeapSubRecord: 9600000 (279ms, 250565kB)
HeapSubRecord: 9700000 (265ms, 252573kB)
HeapSubRecord: 9800000 (279ms, 256629kB)
HeapSubRecord: 9900000 (265ms, 258669kB)
HeapSubRecord: 10000000 (463ms, 263997kB) (end)

By comparison, my 60GB dump file contains more than 1 100 000 000 objects (if I remember correctly).

Regards
Lukasz

Stuart Monteith wrote:
The hprof dump reader spends a lot of time reading the whole file, for various reasons. The indices it has in memory are constructed through an initial read, and this is also the source of the memory usage. In addition, there is some correlation to be done, which also takes up time and induces yet more reading.

I'm sure some work could be done to improve the performance further, but we'll have to look at the trade-off between disk space and memory usage. The hprof file format itself is what it is, however, and we have no influence over that. The CJVMTI agent has lots of room for improvement, but I suspect its potential is unlikely to be much better than existing hprof implementations. The built-in JVM hprof dumper will probably be a hard act to follow.

The HProf implementation is not thread-safe. Realistically, I think it is something that ought to be considered once things are more mature.
There will be algorithms that can deal with the JVM structure sensibly.

And thanks Lukasz, it's great to have your input.

Regards,
Stuart

--
Stuart Monteith
http://blog.stoo.me.uk/
--
Stuart Monteith
http://blog.stoo.me.uk/
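As a closing aside on the caching discussion above (Lukasz's suggestion of growing cacheSize instead of doubling bucketSize when the cache is exhausted, and Steve's observation that cache size is what really matters), a minimal sketch of that alternative growth strategy might look like the following. The names here are illustrative assumptions, not the actual BitMaskMappingArray code:

    import java.util.Arrays;

    // Illustrative only: an index that grows its offset cache instead of doubling the
    // bucket size when it runs out of slots, so the per-lookup skip cost stays constant.
    class GrowingRecordIndex {

        private long[] offsets = new long[1000]; // file offset of every bucketSize-th record
        private int used = 0;
        private final int bucketSize;

        GrowingRecordIndex(int bucketSize) {
            this.bucketSize = bucketSize;
        }

        // Called as records are discovered during the initial scan of the file.
        void noteRecordStart(long recordIndex, long fileOffset) {
            if (recordIndex % bucketSize != 0) {
                return; // only every bucketSize-th record is cached
            }
            if (used == offsets.length) {
                // Grow the cache rather than the bucket size: lookups stay cheap,
                // at the cost of more memory for the offset array.
                offsets = Arrays.copyOf(offsets, offsets.length * 2);
            }
            offsets[used++] = fileOffset;
        }

        // Offset of the bucket containing 'recordIndex'; the caller still has to skip
        // recordIndex % bucketSize records from there.
        long bucketOffsetFor(long recordIndex) {
            return offsets[(int) (recordIndex / bucketSize)];
        }
    }

The trade-off is the one Stuart describes: the offset cache lives in memory, so for a 60GB dump with over a billion objects even one cached offset per bucket adds up.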