Hi,

I've changed the code base to do as you described. Having longs as parameters is, of course, necessary with large dumps. I increased the default values, but I found that the defaults you suggested won't run on a relatively small machine. In the absence of a clever heuristic, I've added a couple of properties to adjust the array sizes:
org.apache.kato.hprof.maparraysize - the BitMaskMappingArray size for the whole file. The default is 10000.
org.apache.kato.hprof.heap.maparraysize - the heap record's BitMaskMappingArray size. The default is 100000.
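For illustration, the properties could be picked up with Integer.getInteger, falling back to those defaults. This is only a sketch of the idea; the helper class below is an assumption, not the actual wiring inside HProfFile:

    // Sketch only: assumed helper for the two tuning properties described above.
    // The real code in HProfFile may read them differently.
    public final class HProfTuning {

        // BitMaskMappingArray size for the whole file (default 10000).
        public static int fileMapArraySize() {
            return Integer.getInteger("org.apache.kato.hprof.maparraysize", 10000);
        }

        // BitMaskMappingArray size for the heap record (default 100000).
        public static int heapMapArraySize() {
            return Integer.getInteger("org.apache.kato.hprof.heap.maparraysize", 100000);
        }
    }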
Thanks,
Stuart

Lukasz wrote:
Hi Steve,

Actually I've made two changes. The first was increasing the size of the array in the classes BitMaskMappingArray and ArrayBitMaskMappingStrategy, which is probably the same change you made. I had a similar idea about using a heuristic/prediction of how many records there can be in a dump. Another (or additional) solution could be to increase cacheSize instead of bucketSize when the cache is exhausted.

My feeling is that the main purpose of those classes is to provide a mapping between the "index of a record" and the "absolute position of the record in the file". If that is correct, the mapping consists of:
1) finding an absolute location in the file, and
2) skipping a number of records (a "seek") until the reader is positioned exactly at the beginning of the required record.

The second step is the problem, I believe, because the bigger the bucket, the more records we have to skip (the "moveRelativeElement" methods are responsible for this). I think this, and the fact that bucketSize is doubled when the cache is exhausted, are the reasons why performance drops over time.

To fix it I changed the following parts of org.apache.kato.hprof.datalayer.HProfFile:

    public HeapDumpHProfRecord(short tag, IDataProvider dataProvider) throws IOException {
        ...
        subRecordArray = new BitMaskMappingArray(50000000, 25,
                new HProfDumpRecordArrayProvider(dataProvider));
    }

and

    private void buildHeader() throws IOException {
        ...
        array = new BitMaskMappingArray(1000000, 25, new HPROFRecordProvider(subset));
    }

The second problem was the incorrect size of the HeapDumpHProfRecord read from the file (the recordSize overflow problem). I didn't find a general solution for this; I just assumed that the HeapDumpHProfRecord ends at the end of the file, which is true in my case. To achieve this I made the following change (also in HProfFile):

    case 0x0c:
        long l = recordProvider.dataRemaining();
        IHeapDumpHProfRecord rec = new HeapDumpHProfRecord(tag,
                new SubsetDataProvider(recordProvider, l));
        recordProvider.moveBy(l);
        return rec;

I also had to change the signature of the "moveBy" method to accept a long instead of an int.

I'll try to send you part of my source code together with the .svn metadata, so you should be able to find exactly what I've changed.

Regards
Lukasz

Steve Poole wrote:
I've run some performance tests on various configs for HProfFile caching. See here:

http://spreadsheets.google.com/pub?key=tes0VC-Wyz-lTe9sKNen2Hw&output=html

Our current settings for cache size and bucket size are 1000 and 25. Bucket size does affect the speed of analysis, but it's the cache size that really does it! We need to configure the cache size based on some heuristic about the size of the dump file.

On Thu, Jan 14, 2010 at 5:21 PM, Lukasz <flo...@intercel.com.pl> wrote:
Hi Steve, Stuart,

In the last two days I have been trying to move further. I tried to prepare a workaround for the incorrect length. The first idea was to exploit the fact that an integer overflow probably took place, so I searched for 0x2c only at the overflow points (probably quite a brittle approach). But it turned out that in my heap dumps (both the real one and the one prepared for development purposes) there is no 0x2c byte after/at the end of the dump.

I've noticed, however, that for both dumps the heap record ends at the end of the file. That is probably not a general rule, but for my purposes it should be enough, so I changed the Kato sources slightly to make use of this and decided to move further.

So far I have some code which extracts static data from the dump (strings, classes) and searches for instances of a given class. I have run it against the real dump.
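As an aside on the two-step lookup described earlier in this thread (a cached bucket offset, then a relative skip to the wanted record), here is a simplified sketch of the idea. The class, fields and record layout below are illustrative assumptions only, not the actual BitMaskMappingArray/ArrayBitMaskMappingStrategy code:

    import java.io.IOException;
    import java.io.RandomAccessFile;

    // Simplified illustration: an in-memory array holds the file offset of every
    // bucketSize-th record; reaching record i means seeking to the start of its
    // bucket and then skipping i % bucketSize records sequentially.
    class BucketedRecordIndex {

        private final long[] bucketOffsets; // offset of record (b * bucketSize) for each bucket b
        private final int bucketSize;
        private final RandomAccessFile file;

        BucketedRecordIndex(RandomAccessFile file, long[] bucketOffsets, int bucketSize) {
            this.file = file;
            this.bucketOffsets = bucketOffsets;
            this.bucketSize = bucketSize;
        }

        // Positions the file at the start of record 'index'.
        void seekToRecord(int index) throws IOException {
            // Step 1: jump straight to the start of the bucket containing the record.
            file.seek(bucketOffsets[index / bucketSize]);
            // Step 2: skip the remaining records one by one; this is the part that
            // gets slower as bucketSize grows (bucketSize / 2 skips on average).
            for (int toSkip = index % bucketSize; toSkip > 0; toSkip--) {
                skipOneRecord();
            }
        }

        // Reads a record header and skips its body; the header layout here is simplified.
        private void skipOneRecord() throws IOException {
            file.skipBytes(1 + 4);                      // tag and timestamp
            long length = file.readInt() & 0xFFFFFFFFL; // 4-byte length field
            file.seek(file.getFilePointer() + length);
        }
    }

With this layout the average lookup skips bucketSize / 2 records sequentially, so each doubling of bucketSize roughly doubles that cost, which matches the slowdown described above.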
Against the real dump it took around 40 minutes (to get the static data) and 50 minutes (to find the instances), which is not bad, I think. I haven't done any careful performance analysis, but CPU (not I/O) looked to be the bottleneck. Performance also looked quite stable across the whole dump.

The 60GB dump was extracted from a core file using jmap (I don't remember exactly which version of Java this jmap came from, but it was certainly Java 1.6). The following JVM version was dumped to the core file:

java version "1.6.0_10"
Java(TM) SE Runtime Environment (build 1.6.0_10-b33)
Java HotSpot(TM) 64-Bit Server VM (build 11.0-b15, mixed mode)

I also suspected that heap dump segments were introduced to handle heaps larger than 4GB, but it looks like that is not always the case. Anyway, it is nice that I was able to go through the whole dump without any error.

Regards
Lukasz

Steve Poole wrote:
Hi Lukasz,

You say your hprof format is 1.0.1, so, according to the information I have about the hprof format, that means yours won't have any heap dump segments in your file. Apparently they are in the 1.0.2 format, which is a Java 6 addition and is only used when the heap is too big for the old format to cope with. What JVM and OS are you using to produce this dump?

On Tue, Jan 12, 2010 at 6:17 PM, Stuart Monteith <stuk...@stoo.me.uk> wrote:
Hi,

You are quite right, 4 bytes for the record length. If we had the file format wrong for 64-bit hprof dumps we'd not be able to read any dump, so it must be an overflow of the record length. In that case, the only possible solution I can think of would be for us to change the code to search for the HEAP DUMP END record (tagged 0x2c) to detect the end of the HEAP DUMP. Of course, there was some puzzling over the difference between HEAP DUMP SEGMENT (0x1c) and HEAP DUMP (0x0c) before. I'd expect we'd get:

    HEAP DUMP
      heap dump records ...
    HEAP DUMP END

or

    HEAP DUMP SEGMENT
      heap dump records
    HEAP DUMP SEGMENT
      heap dump records
      :
      :
    HEAP DUMP END

If we ignore the record size for the HEAP DUMP and calculate it ourselves, we might have more luck.

Regards,
Stuart

Lukasz wrote:
Hello,

Since performance doesn't look bad after changing the array size, today I tried to process my 60GB heap dump. Unfortunately, I very quickly ended up with an exception:

    Exception in thread "main" java.lang.IllegalArgumentException: range -1293366370 is less than 0
        at org.apache.kato.common.SubsetDataProvider.<init>(SubsetDataProvider.java:29)
        at org.apache.kato.hprof.datalayer.HProfFile$HPROFRecordProvider.getCurrentElement(HProfFile.java:487)
        at org.apache.kato.hprof.datalayer.HProfFile$HPROFRecordProvider.getCurrentElement(HProfFile.java:437)
        at org.apache.kato.common.BitMaskMappingArray.get(BitMaskMappingArray.java:56)
        at org.apache.kato.hprof.datalayer.HProfFile.getRecord(HProfFile.java:784)
        ... (my code) ...

It looks like the range of the HeapDumpHProfRecord had the value -1293366370. The following is the part of the dump which I believe is responsible for that (B2 E8 CB 9E):

    0055DD84  25 00 00 2A │ B7 AE 22 06 │ 80 00 00 00 │ 01 00 00 2A │ B7 AE 20 0B │ 88 02 00 00 │ 00 00 00 00 │ 00 18 00 00 │ 23 26 00 00
    0055DDA8  2A B7 B0 A1 │ 64 28 00 00 │ 00 01 00 00 │ 2A B7 B0 33 │ EA F8 0C 00 │ 00 00 00 B2 │ E8 CB 9E 20 │ 00 00 2A B7 │ B0 DE AF C8
    0055DDCC  00 00 00 01 │ 00 00 2A B7 │ AE 22 0F 20 │ 00 00 2A AA │ BE 21 3D 28 │ 00 00 00 00 │ 00 00 00 00 │ 00 00 2A AA │ BE 25 B7 D8

What is interesting is that the hprof file format reserves only 4 bytes for the length of a record, which allows it to express a record whose size is at most 4GB (assuming an unsigned 4 bytes).
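To illustrate the overflow: when that 4-byte length field is read into a signed Java int, any value of 2GB or more comes out negative, much like the negative range in the exception above. A minimal sketch (not the actual Kato reading code) of the two ways of reading it:

    import java.io.DataInputStream;
    import java.io.IOException;

    // Sketch only: how a 4-byte record length behaves when it exceeds Integer.MAX_VALUE.
    class RecordLengthSketch {

        // Read into a signed int: values >= 2^31 wrap negative,
        // e.g. the bytes B2 E8 CB 9E come out as a negative number.
        static int readLengthSigned(DataInputStream in) throws IOException {
            return in.readInt();
        }

        // Widen to a long and mask: this keeps the full unsigned 32-bit range (up to ~4GB),
        // but a record much bigger than 4GB still cannot be represented in 4 bytes at all.
        static long readLengthUnsigned(DataInputStream in) throws IOException {
            return in.readInt() & 0xFFFFFFFFL;
        }
    }

Either way, for a heap dump larger than 4GB the field simply cannot hold the true length, which is why treating the HEAP DUMP record as running to the end of the file, or scanning for the HEAP DUMP END (0x2c) tag, is needed.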
My dump's version is: JAVA PROFILE 1.0.1

Regards
Lukasz

Lukasz wrote:
Hi Stuart, Steve,

I've taken a deeper look into the code. I still haven't traced the index calculation in the classes BitMaskMappingArray and ArrayBitMaskMappingStrategy carefully, but I managed to improve performance by increasing the array size in those classes (which is set in the HProfFile class).

If I understand the code correctly, when the capacity of the BitMaskMappingArray is exhausted, bucketSize is doubled, which in turn means that more reads (even cached ones) are required to set the position of the IDataProvider/IArrayEntryProvider.

Below are loading time results for the default array size (1000) and an increased one (1000000). The test ran against a generated dump file (5000000 instances of Data).

Default (1000):
HeapSubRecord: 100000 (866ms, 4215kB)
HeapSubRecord: 200000 (1716ms, 7879kB)
HeapSubRecord: 300000 (2833ms, 11263kB)
HeapSubRecord: 400000 (3889ms, 14283kB)
HeapSubRecord: 500000 (3893ms, 17319kB)
HeapSubRecord: 600000 (7248ms, 20479kB) (here bucketSize was probably doubled)
HeapSubRecord: 700000 (7721ms, 23531kB)
HeapSubRecord: 800000 (7729ms, 26567kB)
HeapSubRecord: 900000 (7731ms, 29671kB)
HeapSubRecord: 1000000 (7704ms, 32731kB)
... (I didn't wait until the end)

Increased (1000000):
HeapSubRecord: 100000 (622ms, 17809kB)
HeapSubRecord: 200000 (309ms, 20345kB)
HeapSubRecord: 300000 (283ms, 23861kB)
HeapSubRecord: 400000 (274ms, 27921kB)
HeapSubRecord: 500000 (269ms, 29957kB)
HeapSubRecord: 600000 (264ms, 31993kB)
HeapSubRecord: 700000 (272ms, 36097kB)
HeapSubRecord: 800000 (288ms, 37739kB)
HeapSubRecord: 900000 (263ms, 39835kB)
HeapSubRecord: 1000000 (259ms, 41931kB)
HeapSubRecord: 1100000 (300ms, 44773kB)
HeapSubRecord: 1200000 (283ms, 46901kB)
HeapSubRecord: 1300000 (291ms, 49029kB)
HeapSubRecord: 1400000 (328ms, 53801kB)
HeapSubRecord: 1500000 (259ms, 53801kB)
HeapSubRecord: 1600000 (272ms, 58125kB)
HeapSubRecord: 1700000 (264ms, 60293kB)
HeapSubRecord: 1800000 (264ms, 62473kB)
HeapSubRecord: 1900000 (361ms, 61373kB)
HeapSubRecord: 2000000 (274ms, 63105kB)
...
HeapSubRecord: 9000000 (284ms, 231969kB)
HeapSubRecord: 9100000 (272ms, 233597kB)
HeapSubRecord: 9200000 (281ms, 236357kB)
HeapSubRecord: 9300000 (274ms, 240469kB)
HeapSubRecord: 9400000 (279ms, 244541kB)
HeapSubRecord: 9500000 (269ms, 246549kB)
HeapSubRecord: 9600000 (279ms, 250565kB)
HeapSubRecord: 9700000 (265ms, 252573kB)
HeapSubRecord: 9800000 (279ms, 256629kB)
HeapSubRecord: 9900000 (265ms, 258669kB)
HeapSubRecord: 10000000 (463ms, 263997kB) (end)

By comparison, my 60GB dump file contains more than 1 100 000 000 objects (if I remember correctly).

Regards
Lukasz

Stuart Monteith wrote:
The hprof dump reader spends a lot of time reading the whole file, for various reasons. The indices it has in memory are constructed through an initial read, and this is also the source of the memory usage. In addition, there is some correlation to be done, which also takes up time and induces yet more reading.

I'm sure some work could be done to improve the performance further, but we'll have to look at the trade-off between disk space and memory usage. The hprof file format itself is what it is, however, and we have no influence over that. The CJVMTI agent has lots of room for improvement, but I suspect its potential is unlikely to be much better than existing hprof implementations. The built-in JVM hprof dumper will probably be a hard act to follow.

The HProf implementation is not thread-safe. Realistically, I think it is something that ought to be considered once things are more mature.
There will be algorithms that can deal with the JVM structure sensibly.

And thanks Lukasz, it's great to have your input.

Regards,
Stuart

--
Stuart Monteith
http://blog.stoo.me.uk/
--
Stuart Monteith
http://blog.stoo.me.uk/
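As a closing aside on the caching discussion above (Lukasz's suggestion of growing cacheSize instead of doubling bucketSize when the cache is exhausted, and Steve's observation that cache size is what really matters), a minimal sketch of that alternative growth strategy might look like the following. The names here are illustrative assumptions, not the actual BitMaskMappingArray code:

    import java.util.Arrays;

    // Illustrative only: an index that grows its offset cache instead of doubling the
    // bucket size when it runs out of slots, so the per-lookup skip cost stays constant.
    class GrowingRecordIndex {

        private long[] offsets = new long[1000]; // file offset of every bucketSize-th record
        private int used = 0;
        private final int bucketSize;

        GrowingRecordIndex(int bucketSize) {
            this.bucketSize = bucketSize;
        }

        // Called as records are discovered during the initial scan of the file.
        void noteRecordStart(long recordIndex, long fileOffset) {
            if (recordIndex % bucketSize != 0) {
                return; // only every bucketSize-th record is cached
            }
            if (used == offsets.length) {
                // Grow the cache rather than the bucket size: lookups stay cheap,
                // at the cost of more memory for the offset array.
                offsets = Arrays.copyOf(offsets, offsets.length * 2);
            }
            offsets[used++] = fileOffset;
        }

        // Offset of the bucket containing 'recordIndex'; the caller still has to skip
        // recordIndex % bucketSize records from there.
        long bucketOffsetFor(long recordIndex) {
            return offsets[(int) (recordIndex / bucketSize)];
        }
    }

The trade-off is the one Stuart describes: the offset cache lives in memory, so for a 60GB dump with over a billion objects even one cached offset per bucket adds up.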