Sian, Andrew,
Here is an update. I have updated the profiler [1] and run it on the JDT
unpacking scenario several times. This is what we have today (times are
in msecs, indentation reflects the call hierarchy):
Unpack: 220529
  segment read: 28853
    cpBands: 13791
    adBands: 214
    icBands: 343
    cbBands: 9158
    bcBands: 4694
    fbBands: 118
    fBits: 407
  segment parse: 176929
    header: 0
    cpBands: 0
    adBands: 0
    icBands: 1
    cbBands: 0
    bcBands: 77245
      exceptn: 656
      newCA: 71145
        getBC: 16372
        extOpnd: 26776
        fixup: 1885
      methAttr: 591
      curAttr: 1808
    fbBands: 0
    buildCF: 82057
      ccp.addN: 36182
      ccp.addNW: 433
      ccp.resv: 27452
      ic.getIC: 11717
    cfWrite: 17379
  segment write: 14349
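For anyone who has not looked at the patch: a profile like the one above
can be collected with a handful of enter/exit probes that build a tree of
timers. The following is only a minimal sketch of that idea, not the code
from [1]; NestedProfiler, enter() and exit() are names made up for
illustration.

```java
import java.util.ArrayList;
import java.util.List;

/** Hypothetical nested-probe profiler; names are illustrative, not taken from HARMONY-5905. */
final class NestedProfiler {
    /** One probe: a label, its accumulated time, and the probes nested inside it. */
    static final class Node {
        final String label;
        final Node parent;
        final List<Node> children = new ArrayList<Node>();
        long totalNanos;
        long startNanos;

        Node(String label, Node parent) {
            this.label = label;
            this.parent = parent;
        }

        /** Finds the child with the given label, creating it on first use. */
        Node child(String childLabel) {
            for (Node c : children) {
                if (c.label.equals(childLabel)) {
                    return c;
                }
            }
            Node c = new Node(childLabel, this);
            children.add(c);
            return c;
        }
    }

    private final Node root = new Node("root", null);
    private Node current = root;

    /** Opens a probe nested under the currently open one. */
    void enter(String label) {
        current = current.child(label);
        current.startNanos = System.nanoTime();
    }

    /** Closes the innermost open probe and accumulates its elapsed time. */
    void exit() {
        if (current == root) {
            throw new IllegalStateException("exit() without matching enter()");
        }
        current.totalNanos += System.nanoTime() - current.startNanos;
        current = current.parent;
    }

    /** Renders the tree with two spaces of indentation per call level; times in msecs. */
    String report() {
        StringBuilder sb = new StringBuilder();
        for (Node n : root.children) {
            append(sb, n, 0);
        }
        return sb.toString();
    }

    private static void append(StringBuilder sb, Node n, int depth) {
        for (int i = 0; i < depth; i++) {
            sb.append("  ");
        }
        sb.append(n.label).append(": ").append(n.totalNanos / 1000000L).append('\n');
        for (Node c : n.children) {
            append(sb, c, depth + 1);
        }
    }

    public static void main(String[] args) {
        NestedProfiler p = new NestedProfiler();
        p.enter("Unpack");
        p.enter("segment read");
        p.exit();
        p.enter("segment parse");
        p.enter("bcBands");
        p.exit();
        p.exit();
        p.exit();
        System.out.print(p.report());
    }
}
```

Every probe costs two System.nanoTime() calls plus a short child lookup,
so probes placed inside hot loops inflate the absolute numbers.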
As you can see from the commits, I have filed several JIRAs with a
bunch of pack200 optimizations [2,3,4,5]. Here is what I get with all
of them applied:
Unpack: 193165 (-12% in total)
  segment read: 28334
    cpBands: 13034
    adBands: 249
    icBands: 341
    cbBands: 9299
    bcBands: 4645
    fbBands: 169
    fBits: 459
  segment parse: 150032
    header: 1
    cpBands: 0
    adBands: 0
    icBands: 82
    cbBands: 0
    bcBands: 73936
      exceptn: 633
      newCA: 67871
        getBC: 13874 <--- (-15% due to [2])
        extOpnd: 26583
        fixup: 1808
      methAttr: 615
      curAttr: 1663
    fbBands: 0
    buildCF: 58319
      ccp.addN: 26199 <--- (-28% due to [5])
      ccp.addNW: 424
      ccp.resv: 23642 <--- (-14% due to [5])
      ic.getIC: 2245 <--- (-81% due to [3,4])
    cfWrite: 17463
  segment write: 14413
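A note on reading the percentages: the figure depends on which run is
taken as the denominator, so it is worth stating the convention. A small
sketch, assuming the change is measured against the baseline run, with the
speedup factor shown next to it (the class and method names are mine):

```java
/** Hypothetical helper for comparing two timings of the same probe. */
final class RelativeChange {
    /** Change of newValue relative to the baseline, in percent (negative means faster). */
    static double percentChange(long baseline, long newValue) {
        return 100.0 * (newValue - baseline) / baseline;
    }

    /** Speedup factor: how many times faster the new run is than the baseline. */
    static double speedup(long baseline, long newValue) {
        return (double) baseline / newValue;
    }

    public static void main(String[] args) {
        // Timings in msecs, copied from the two profiles above.
        System.out.printf("Unpack:   %.1f%% (x%.2f)%n",
                percentChange(220529, 193165), speedup(220529, 193165));
        System.out.printf("ic.getIC: %.1f%% (x%.2f)%n",
                percentChange(11717, 2245), speedup(11717, 2245));
    }
}
```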
Of course, the gains are partly masked by the overhead of the profiling
itself. Still, this profile gives pretty good insight into what's
going on. CodeAttribute ("newCA" is the "new CodeAttribute(...)" call) is
the next candidate for optimization, I guess.
Sian, Andrew, can you please review the patches? I'm particularly
interested in [5], because it's a proof of concept and somewhat
controversial.
Thanks,
Aleksey.
[1] [classlib][pack200] Internal profiler for pack200
https://issues.apache.org/jira/browse/HARMONY-5905
[2] [classlib][pack200][performance] java.util.HashMap usage optimization
https://issues.apache.org/jira/browse/HARMONY-5928
[3] [classlib][pack200][performance] Segment.computeIcStored rewrite
https://issues.apache.org/jira/browse/HARMONY-5929
[4] [classlib][pack200][performance] IcBands.getRelevantIcTuples rewrite
https://issues.apache.org/jira/browse/HARMONY-5930
[5] [classlib][pack200][performance] Some ClassConstantPool content
may not be needed
https://issues.apache.org/jira/browse/HARMONY-5931
On Thu, Jul 10, 2008 at 7:59 PM, Aleksey Shipilev
<[EMAIL PROTECTED]> wrote:
> I have quickly drafted an internal Java profiler for pack200 at [1].
> Here are the profiling results for the 50 MB Eclipse JDT jar; times are
> in msecs, and indentation reflects the call tree. Some of the labels are
> not self-explanatory, but you can look up the probe positions in the patch.
>
> Unpack: 38311
>   segment unpack: 38217
>     parse segment: 11575
>       parse header: 0
>       parse ADB: 78
>       parse bcbands: 5342
>         parse1: 453
>         parse2: 93
>         select: 252
>         attrlayout: 0
>         methods: 3997
>       parse cbands: 3358
>         classattr: 1002
>         code: 1636
>         fields: 173
>         methods: 515
>       parse cpbands: 2483
>       parse fbands: 63
>       parse icbands: 16
>     write jar: 26642
>       build classf: 21111
>         sfattrs: 47
>         cfattrs: 0
>         fields: 218
>         interfaces: 0
>         methods: 362
>         addNested: 8146
>         inner: 3051
>         final: 8774
>       write classf: 1934
>         constpool: 1015
>         interfaces: 0
>         attributes: 31
>         methods: 827
>         fields: 31
>       write primit: 486
>
> That's the point where one can take the method and optimize it locally :)
>
> Thanks,
> Aleksey.
>
> [1] https://issues.apache.org/jira/browse/HARMONY-5905
>
> On Wed, Jul 9, 2008 at 9:20 PM, Aleksey Shipilev
> <[EMAIL PROTECTED]> wrote:
>> I have disabled compression in my test to eliminate the ZIP overhead
>> and focus on pack200 performance only, so these numbers are not
>> comparable with the previous measurements. The data were collected with
>> HARMONY-5900 applied.
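One way to take deflate out of the picture with java.util.zip is to set
the output stream's level to Deflater.NO_COMPRESSION. The sketch below
shows that approach on an in-memory jar; it is an assumption about how
such a setup could look, not necessarily how the test above was
configured, and UncompressedJarDemo and the entry name are made up.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.util.jar.JarEntry;
import java.util.jar.JarOutputStream;
import java.util.zip.Deflater;
import java.util.zip.ZipInputStream;

/** Hypothetical demo: writing a jar with and without deflate compression. */
final class UncompressedJarDemo {
    /** Writes a single-entry jar into memory; 'level' is a java.util.zip.Deflater level. */
    static byte[] writeJar(String entryName, byte[] payload, int level) throws IOException {
        ByteArrayOutputStream bytes = new ByteArrayOutputStream();
        JarOutputStream jar = new JarOutputStream(bytes);
        try {
            // Level 0 keeps the DEFLATED method but emits stored (uncompressed) blocks.
            jar.setLevel(level);
            jar.putNextEntry(new JarEntry(entryName));
            jar.write(payload);
            jar.closeEntry();
        } finally {
            jar.close();
        }
        return bytes.toByteArray();
    }

    /** Reads the first entry back, to check that the content survives the round trip. */
    static byte[] readFirstEntry(byte[] jarBytes) throws IOException {
        ZipInputStream in = new ZipInputStream(new ByteArrayInputStream(jarBytes));
        try {
            if (in.getNextEntry() == null) {
                throw new IOException("empty archive");
            }
            ByteArrayOutputStream out = new ByteArrayOutputStream();
            byte[] buf = new byte[4096];
            int n;
            while ((n = in.read(buf)) != -1) {
                out.write(buf, 0, n);
            }
            return out.toByteArray();
        } finally {
            in.close();
        }
    }

    public static void main(String[] args) throws IOException {
        byte[] payload = new byte[64 * 1024]; // all zeros, so highly compressible
        byte[] stored = writeJar("demo.bin", payload, Deflater.NO_COMPRESSION);
        byte[] deflated = writeJar("demo.bin", payload, Deflater.DEFAULT_COMPRESSION);
        System.out.println("payload bytes:       " + payload.length);
        System.out.println("jar bytes, level 0:  " + stored.length);
        System.out.println("jar bytes, default:  " + deflated.length);
    }
}
```

With level 0 the archive is slightly larger than the payload, and the
time spent in java.util.zip drops to little more than copying and CRC.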
>>
>> Harmony's pack200: 43 secs (3.5 MB/sec)
>> Sun's pack200: 9 secs (16.6 MB/sec)
>>
>> Profile:
>>
>> 22.0% java.util.HashMap.*
>> 11.4% java.io.FileInputStream.readBytes()
>> 7.5% o.a.h.unpack200.bytecode.ClassConstantPool.addNested()
>> 6.9% java.util.zip.*
>> 5.6% o.a.h.pack200.BHSDCodec.decode()
>> 4.8% java.lang.*
>> 4.4% o.a.h.unpack200.IcBands.getRelevantIcTuples()
>> 3.9% o.a.h.unpack200.bytecode.forms.NoArgumentForm.setByteCodeOperands()
>> 3.2% o.a.h.unpack200.bytecode.ClassConstantPool.* (other)
>> 3.0% o.a.h.unpack200.bytecode.CodeAttribute.*
>> 2.8% java.io.FileOutputStream.writeBytes()
>> 2.8% o.a.h.unpack200.bytecode.ByteCode.*
>> 2.75% java.util.TreeMap.*
>>
>> Note that ArrayList is gone!
>> It seems that BHSDCodec.decode(), IcBands.getRelevantIcTuples() and
>> NoArgumentForm.setByteCodeOperands() are the next candidates for tuning.
>> After that, further improvement is not possible without deeper
>> changes, such as overall algorithmic improvements. Those should really
>> come first anyway, but I'm not familiar with the code yet. This can't
>> stop us though ;)
>>
>> Thanks,
>> Aleksey.
>>
>> On Wed, Jul 9, 2008 at 7:27 PM, Sian January <[EMAIL PROTECTED]> wrote:
>>> Thanks for doing that Aleksey. In fact I think Sun's was 20 or 30 times
>>> faster before we started doing any performance optimizations, but it looks
>>> like there's still some ground that we could make up!
>>>
>>>
>>>
>>> On 08/07/2008, Aleksey Shipilev <[EMAIL PROTECTED]> wrote:
>>>>
>>>> Hi,
>>>>
>>>> I took the liberty of profiling the pack200 implementation on an
>>>> unpacking scenario. The source data was obtained from the Eclipse JDT
>>>> jars, repacked into a single 60 MB jar file, then packed with pack200
>>>> from Sun's JDK (-E9 used), resulting in a 20 MB pack200-compressed file.
>>>> Sun JDK 1.6.0_05 (Windows, -server) was then used together with hprof
>>>> (cpu=times) to obtain the profile. My patch from HARMONY-5900 is
>>>> applied. The head of the profile looks like this:
>>>>
>>>> 4.76% org.apache.harmony.unpack200.bytecode.ClassConstantPool.addNested
>>>> 4.22% java.util.HashMap.getEntry
>>>> 2.99% java.util.AbstractList$Itr.next
>>>> 2.92% java.util.AbstractList$Itr.hasNext
>>>> 2.84% java.util.ArrayList.get
>>>> 2.43% java.util.AbstractList$Itr.next
>>>> 2.41% java.util.HashMap.containsKey
>>>> 2.15% org.apache.harmony.unpack200.IcBands.getRelevantIcTuples
>>>> 2.00% java.util.HashSet.contains
>>>> 1.57% java.io.DataOutputStream.writeUTF
>>>>
>>>> Composite occupancy:
>>>>
>>>> 18.4% java.util.AbstractList
>>>> 18.0% java.util.HashMap
>>>> 15.8% java.util.ArrayList
>>>> 10.5% o.a.h.unpack200.bytecode.ClassConstantPool.*
>>>> 5.3% o.a.h.unpack200.bytecode.CPUTF8.* (hashcode mostly)
>>>> 4.5% java.io.*
>>>> 4.5% java.lang.String.*
>>>> 4.4% o.a.h.unpack200.bytecode.ByteCode.*
>>>> 3.9% o.a.h.unpack200.bytecode.Ic{Tuple|Bands}.*
>>>> 14.7% other
>>>>
>>>> So the main concern is Collections usage. ClassConstantPool uses Lists
>>>> heavily, so I suspect a significant amount of time is spent
>>>> there.
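To illustrate the cost pattern (this is not the actual ClassConstantPool
code): finding an entry in a List is a linear scan, whereas keeping a
HashMap from entry to index next to the list makes the lookup constant
time on average. IndexedPool is a hypothetical name, and the sketch
assumes the entries have consistent and cheap equals()/hashCode().

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

/** Hypothetical pool that keeps insertion order and answers index lookups through a map. */
final class IndexedPool<T> {
    private final List<T> entries = new ArrayList<T>();
    private final Map<T, Integer> indices = new HashMap<T, Integer>();

    /** Adds the entry if it is absent; returns its index either way. */
    int add(T entry) {
        Integer existing = indices.get(entry);
        if (existing != null) {
            return existing.intValue();
        }
        int index = entries.size();
        entries.add(entry);
        indices.put(entry, Integer.valueOf(index));
        return index;
    }

    /** Hash lookup; List.indexOf() would scan the entries one by one instead. */
    int indexOf(T entry) {
        Integer index = indices.get(entry);
        return index == null ? -1 : index.intValue();
    }

    T get(int index) {
        return entries.get(index);
    }

    int size() {
        return entries.size();
    }

    public static void main(String[] args) {
        IndexedPool<String> pool = new IndexedPool<String>();
        pool.add("java/lang/Object");
        pool.add("java/lang/String");
        pool.add("java/lang/Object"); // duplicate, not added again
        System.out.println(pool.size() + " entries, String at index "
                + pool.indexOf("java/lang/String"));
    }
}
```

The trade-off is extra memory for the map and a hashCode() call per
lookup, so an expensive hashCode() on the entries would show up in the
profile instead.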
>>>>
>>>> NB:
>>>> Timings for the scenario (lower is better):
>>>> Harmony's pack200: 67 secs
>>>> Sun's pack200: 6 secs
>>>>
>>>> Yep, about 10 times faster.
>>>>
>>>> Thanks,
>>>> Aleksey.
>>>>
>>>
>>>
>>>
>>>
>>
>