Hello,
Since performance looked reasonable after changing the array size, today I
tried to process my 60GB heap dump. Unfortunately, it failed very quickly
with the following exception:
Exception in thread "main" java.lang.IllegalArgumentException: range -1293366370 is less than 0
    at org.apache.kato.common.SubsetDataProvider.<init>(SubsetDataProvider.java:29)
    at org.apache.kato.hprof.datalayer.HProfFile$HPROFRecordProvider.getCurrentElement(HProfFile.java:487)
    at org.apache.kato.hprof.datalayer.HProfFile$HPROFRecordProvider.getCurrentElement(HProfFile.java:437)
    at org.apache.kato.common.BitMaskMappingArray.get(BitMaskMappingArray.java:56)
    at org.apache.kato.hprof.datalayer.HProfFile.getRecord(HProfFile.java:784)
... (my code) ...
It looks like the range of the HeapDumpHProfRecord had the value -1293366370.
Below is the part of the dump which I believe is responsible for that
(B2 E8 CB 9E):
0055DD84 25 00 00 2A │ B7 AE 22 06 │ 80 00 00 00 │ 01 00 00 2A │ B7 AE 20 0B │ 88 02 00 00 │ 00 00 00 00 │ 00 18 00 00 │ 23 26 00 00
0055DDA8 2A B7 B0 A1 │ 64 28 00 00 │ 00 01 00 00 │ 2A B7 B0 33 │ EA F8 0C 00 │ 00 00 00 B2 │ E8 CB 9E 20 │ 00 00 2A B7 │ B0 DE AF C8
0055DDCC 00 00 00 01 │ 00 00 2A B7 │ AE 22 0F 20 │ 00 00 2A AA │ BE 21 3D 28 │ 00 00 00 00 │ 00 00 00 00 │ 00 00 2A AA │ BE 25 B7 D8
What is interesting is that the hprof file format reserves only 4 bytes for
the length of a record, which means it can only express records whose size
is at most 4GB (assuming the 4 bytes are unsigned).
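For what it's worth, those four bytes decode to exactly the value from the
exception when read as a signed 32-bit int, and to roughly 3GB when treated
as unsigned, so the length itself still fits in 4 bytes; it is the signed
read that produces the negative range. A small standalone check (my own
illustration, not Kato code):

import java.nio.ByteBuffer;

public class RecordLengthCheck {
    public static void main(String[] args) {
        // The four length bytes quoted above.
        byte[] lengthBytes = { (byte) 0xB2, (byte) 0xE8, (byte) 0xCB, (byte) 0x9E };

        // Read as a signed 32-bit int (what a plain readInt()/getInt() gives).
        int signed = ByteBuffer.wrap(lengthBytes).getInt();   // -1293366370

        // Masked to an unsigned value, which is presumably what the format means.
        long unsigned = signed & 0xFFFFFFFFL;                 // 3001600926 (~2.8GiB)

        System.out.println("signed   = " + signed);
        System.out.println("unsigned = " + unsigned);
    }
}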
My dump's version is: JAVA PROFILE 1.0.1
Regards
Lukasz
Lukasz wrote:
Hi Stuart, Steve,
I've taken a deeper look into the code. I haven't yet carefully traced the
index calculation in the BitMaskMappingArray and
ArrayBitMaskMappingStrategy classes, but I managed to improve performance
by increasing the array size used by those classes (which is set in the
HProfFile class).
If I understand the code correctly, when the capacity of the
BitMaskMappingArray is exhausted, its bucketSize is doubled, which in turn
means that more reads (even cached ones) are required to set the position
of the IDataProvider/IArrayEntryProvider.
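If that reading is right, the behaviour can be sketched roughly like this
(hypothetical code to illustrate my understanding, not the actual Kato
classes): a fixed-size array of record offsets where each slot covers
bucketSize records, and every time the array fills up, bucketSize doubles,
so a lookup has to scan further and further forward from the nearest
indexed offset.

class SparseRecordIndex {
    private final long[] slotOffsets;   // file offset of the first record in each bucket
    private int bucketSize = 1;         // records per slot; doubles when the array is full
    private int recordCount = 0;

    SparseRecordIndex(int slots) {
        this.slotOffsets = new long[slots];
    }

    // Called once per record, in file order, while scanning the dump.
    void add(long fileOffset) {
        if (recordCount / bucketSize >= slotOffsets.length) {
            bucketSize *= 2;                                  // capacity exhausted: widen the buckets
            for (int i = 0; i < slotOffsets.length / 2; i++) {
                slotOffsets[i] = slotOffsets[i * 2];          // keep every second offset
            }
        }
        if (recordCount % bucketSize == 0) {
            slotOffsets[recordCount / bucketSize] = fileOffset;
        }
        recordCount++;
    }

    // A lookup yields the nearest indexed offset plus the number of records
    // that still have to be read and skipped sequentially from there --
    // on average bucketSize / 2, so every doubling makes lookups slower.
    long nearestOffset(int recordIndex) {
        return slotOffsets[recordIndex / bucketSize];
    }

    int recordsToSkip(int recordIndex) {
        return recordIndex % bucketSize;
    }
}

With only 1000 slots, bucketSize has to double many times for a dump with
millions of records, which would explain why the per-record cost keeps
growing in the default results below and stays flat with the larger array.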
Following are the loading time results for the default array size (1000)
and an increased size (1000000). The test ran against a generated dump
file (5000000 instances of Data).
Default (1000):
HeapSubRecord: 100000 (866ms, 4215kB)
HeapSubRecord: 200000 (1716ms, 7879kB)
HeapSubRecord: 300000 (2833ms, 11263kB)
HeapSubRecord: 400000 (3889ms, 14283kB)
HeapSubRecord: 500000 (3893ms, 17319kB)
HeapSubRecord: 600000 (7248ms, 20479kB) (here the bucketSize was probably doubled)
HeapSubRecord: 700000 (7721ms, 23531kB)
HeapSubRecord: 800000 (7729ms, 26567kB)
HeapSubRecord: 900000 (7731ms, 29671kB)
HeapSubRecord: 1000000 (7704ms, 32731kB)
... (I didn't wait until the end)
Increased(1000000):
HeapSubRecord: 100000 (622ms, 17809kB)
HeapSubRecord: 200000 (309ms, 20345kB)
HeapSubRecord: 300000 (283ms, 23861kB)
HeapSubRecord: 400000 (274ms, 27921kB)
HeapSubRecord: 500000 (269ms, 29957kB)
HeapSubRecord: 600000 (264ms, 31993kB)
HeapSubRecord: 700000 (272ms, 36097kB)
HeapSubRecord: 800000 (288ms, 37739kB)
HeapSubRecord: 900000 (263ms, 39835kB)
HeapSubRecord: 1000000 (259ms, 41931kB)
HeapSubRecord: 1100000 (300ms, 44773kB)
HeapSubRecord: 1200000 (283ms, 46901kB)
HeapSubRecord: 1300000 (291ms, 49029kB)
HeapSubRecord: 1400000 (328ms, 53801kB)
HeapSubRecord: 1500000 (259ms, 53801kB)
HeapSubRecord: 1600000 (272ms, 58125kB)
HeapSubRecord: 1700000 (264ms, 60293kB)
HeapSubRecord: 1800000 (264ms, 62473kB)
HeapSubRecord: 1900000 (361ms, 61373kB)
HeapSubRecord: 2000000 (274ms, 63105kB)
...
HeapSubRecord: 9000000 (284ms, 231969kB)
HeapSubRecord: 9100000 (272ms, 233597kB)
HeapSubRecord: 9200000 (281ms, 236357kB)
HeapSubRecord: 9300000 (274ms, 240469kB)
HeapSubRecord: 9400000 (279ms, 244541kB)
HeapSubRecord: 9500000 (269ms, 246549kB)
HeapSubRecord: 9600000 (279ms, 250565kB)
HeapSubRecord: 9700000 (265ms, 252573kB)
HeapSubRecord: 9800000 (279ms, 256629kB)
HeapSubRecord: 9900000 (265ms, 258669kB)
HeapSubRecord: 10000000 (463ms, 263997kB)
(end)
i.e. my 60GB dump file contains more than 1 100 000 000 objects (if I
remember correctly).
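(In case it helps anyone reproduce the test: a dump file with a few million
small instances can be generated, for example, through the JVM's
HotSpotDiagnosticMXBean; a rough sketch, with Data standing in for whatever
class you like:)

import java.lang.management.ManagementFactory;
import com.sun.management.HotSpotDiagnosticMXBean;

public class GenerateTestDump {
    // A stand-in for the Data class used in the test.
    static class Data {
        long id;
        Data(long id) { this.id = id; }
    }

    public static void main(String[] args) throws Exception {
        int count = 5000000;
        Data[] instances = new Data[count];
        for (int i = 0; i < count; i++) {
            instances[i] = new Data(i);
        }

        // Ask the JVM to write an hprof heap dump of the live objects.
        HotSpotDiagnosticMXBean bean = ManagementFactory.newPlatformMXBeanProxy(
                ManagementFactory.getPlatformMBeanServer(),
                "com.sun.management:type=HotSpotDiagnostic",
                HotSpotDiagnosticMXBean.class);
        bean.dumpHeap("test-5M.hprof", true);

        System.out.println("Dumped a heap containing " + instances.length + " Data instances");
    }
}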
Regards
Lukasz
Stuart Monteith wrote:
The hprof dump reader spends a lot of time reading the whole file, for
various reasons. The indices it has in memory are constructed through an
initial read, and this is also the source of the memory usage. In
addition, there is some correlation to be done, which also takes up time
and induces yet more reading.
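To make that concrete, the initial read essentially has to visit every
record header in the file before anything can be served from an index. A
stripped-down sketch of such a pass (illustrative only, not the actual
reader) looks something like this:

import java.io.IOException;
import java.io.RandomAccessFile;
import java.util.ArrayList;
import java.util.List;

public class HprofIndexPass {
    public static List<Long> indexRecords(String path) throws IOException {
        List<Long> offsets = new ArrayList<Long>();
        try (RandomAccessFile file = new RandomAccessFile(path, "r")) {
            // Header: null-terminated version string ("JAVA PROFILE 1.0.x"),
            // identifier size (u4), timestamp (u8).
            while (file.readByte() != 0) { /* skip version string */ }
            file.readInt();   // identifier size
            file.readLong();  // timestamp
            // Each record: tag (u1), time (u4), length (u4), then 'length' bytes of body.
            while (file.getFilePointer() < file.length()) {
                offsets.add(file.getFilePointer());
                file.readByte();                               // tag
                file.readInt();                                // time delta
                long length = file.readInt() & 0xFFFFFFFFL;    // length, read as unsigned
                file.seek(file.getFilePointer() + length);     // skip the record body
            }
        }
        return offsets;
    }
}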
I'm sure some work could be done to improve the performance further, but
we'll have to look at the tradeoff between disk space and memory usage.
The hprof file format itself is what it is, however, and we have no
influence over that. The CJVMTI agent has lots of room for improvement,
but I suspect its potential is unlikely to be much better than existing
hprof implementations. The built-in JVM hprof dumper will probably be a
hard act to follow.
The HProf implementation is not thread-safe. Realistically, I think it is
something that ought to be considered once things are more mature. There
will be algorithms that can deal with the JVM structure sensibly.
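Until then, callers that share a single reader instance across threads
would need to serialise access themselves, e.g. by confining the reader to
one thread or guarding every call with one lock; an illustrative (non-Kato)
sketch:

// Illustrative only, not Kato API: funnel every call to the shared,
// non-thread-safe reader through a single lock.
public class GuardedReader<T> {
    private final T reader;               // the non-thread-safe reader instance
    private final Object lock = new Object();

    public GuardedReader(T reader) {
        this.reader = reader;
    }

    // All interaction with the reader goes through this method.
    public void access(ReaderOperation<T> op) {
        synchronized (lock) {             // one thread inside the reader at a time
            op.run(reader);
        }
    }

    public interface ReaderOperation<T> {
        void run(T reader);
    }
}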
And thanks Lukasz, it's great to have your input.
Regards,
Stuart