I have quickly drafted an internal Java profiler for pack200 at [1].
Here are the profiling results for the 50 Mb Eclipse JDT jar. Times are
in microseconds, and indentation emulates the call tree. Some of the labels
are not self-explanatory, but you can look up the probe positions in the patch.
Unpack: 38311
  segment unpack: 38217
    parse segment: 11575
      parse header: 0
      parse ADB: 78
      parse bcbands: 5342
        parse1: 453
        parse2: 93
        select: 252
        attrlayout: 0
        methods: 3997
      parse cbands: 3358
        classattr: 1002
        code: 1636
        fields: 173
        methods: 515
      parse cpbands: 2483
      parse fbands: 63
      parse icbands: 16
    write jar: 26642
      build classf: 21111
        sfattrs: 47
        cfattrs: 0
        fields: 218
        interfaces: 0
        methods: 362
        addNested: 8146
        inner: 3051
        final: 8774
      write classf: 1934
        constpool: 1015
        interfaces: 0
        attributes: 31
        methods: 827
        fields: 31
      write primit: 486
That's the point where one can take the method and optimize it locally :)
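
For readers who have not opened the patch: a probe-based profiler of this kind can be sketched roughly as below. This is only a minimal illustration, not the code from HARMONY-5905. The class name ProbeProfiler and the methods enter(), exit() and report() are hypothetical names chosen for the example. It keeps a stack of open probes, accumulates elapsed time per label, and prints the result as an indented call tree with times in microseconds.

```java
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.Deque;
import java.util.List;

/** Hypothetical probe-based profiler; NOT the code from the HARMONY-5905 patch. */
final class ProbeProfiler {
    /** One node of the call tree: a label, accumulated time, and child probes. */
    private static final class Node {
        final String label;
        final List<Node> children = new ArrayList<>();
        long totalNanos;
        long startNanos;

        Node(String label) { this.label = label; }

        /** Returns the child with this label, creating it on first use. */
        Node child(String childLabel) {
            for (Node c : children) {
                if (c.label.equals(childLabel)) return c;
            }
            Node c = new Node(childLabel);
            children.add(c);
            return c;
        }
    }

    private final Node root = new Node("root");
    private final Deque<Node> open = new ArrayDeque<>();

    ProbeProfiler() { open.push(root); }

    /** Opens a probe nested under the currently open one. */
    void enter(String label) {
        Node n = open.peek().child(label);
        n.startNanos = System.nanoTime();
        open.push(n);
    }

    /** Closes the most recently opened probe and accumulates its elapsed time. */
    void exit() {
        if (open.size() <= 1) throw new IllegalStateException("exit() without enter()");
        Node n = open.pop();
        n.totalNanos += System.nanoTime() - n.startNanos;
    }

    /** Renders the tree, two spaces per nesting level, times in microseconds. */
    String report() {
        StringBuilder sb = new StringBuilder();
        for (Node c : root.children) append(sb, c, 0);
        return sb.toString();
    }

    private static void append(StringBuilder sb, Node n, int depth) {
        for (int i = 0; i < depth; i++) sb.append("  ");
        sb.append(n.label).append(": ").append(n.totalNanos / 1000).append('\n');
        for (Node c : n.children) append(sb, c, depth + 1);
    }

    public static void main(String[] args) {
        ProbeProfiler p = new ProbeProfiler();
        p.enter("Unpack");
        p.enter("parse segment");
        p.exit();
        p.enter("write jar");
        p.exit();
        p.exit();
        System.out.print(p.report());
    }
}
```

Each reported time is inclusive of its children, so a parent entry should be at least the sum of the entries nested under it.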
Thanks,
Aleksey.
[1] https://issues.apache.org/jira/browse/HARMONY-5905
On Wed, Jul 9, 2008 at 9:20 PM, Aleksey Shipilev
<[EMAIL PROTECTED]> wrote:
> I disabled compression in my test to eliminate ZIP overhead
> and focus on pack200 performance only, so these numbers are
> not comparable to the previous measurements. The data were taken with
> HARMONY-5900 applied.
>
> Harmony's pack200: 43 secs (3.5 Mb/secs)
> Sun's pack200: 9 secs (16.6 Mb/secs)
>
> Profile:
>
> 22.0% java.util.HashMap.*
> 11.4% java.io.FileInputStream.readBytes()
> 7.5% o.a.h.unpack200.bytecode.ClassConstantPool.addNested()
> 6.9% java.util.zip.*
> 5.6% o.a.h.pack200.BHSDCodec.decode()
> 4.8% java.lang.*
> 4.4% o.a.h.unpack200.IcBands.getRelevantIcTuples()
> 3.9% o.a.h.unpack200.bytecode.forms.NoArgumentForm.setByteCodeOperands()
> 3.2% o.a.h.unpack200.bytecode.ClassConstantPool.* (other)
> 3.0% o.a.h.unpack200.bytecode.CodeAttribute.*
> 2.8% java.io.FileOutputStream.writeBytes()
> 2.8% o.a.h.unpack200.bytecode.ByteCode.*
> 2.75% java.util.TreeMap.*
>
> Note that ArrayList is gone!
> It seems that BHSDCodec.decode(), IcBands.getRelevantIcTuples() and
> NoArgumentForm.setByteCodeOperands() are the next candidates for tuning.
> After that, further performance improvement is not possible without deep
> changes such as overall algorithmic improvements. Really, those should
> come first, but I'm not familiar with the code yet. That won't stop us
> though ;)
>
> Thanks,
> Aleksey.
>
> On Wed, Jul 9, 2008 at 7:27 PM, Sian January <[EMAIL PROTECTED]> wrote:
>> Thanks for doing that Aleksey. In fact I think Sun's was 20 or 30 times
>> faster before we started doing any performance optimizations, but it looks
>> like there's still some ground that we could make up!
>>
>>
>>
>> On 08/07/2008, Aleksey Shipilev <[EMAIL PROTECTED]> wrote:
>>>
>>> Hi,
>>>
>>> I took the liberty of profiling the pack200 implementation in an unpacking
>>> scenario. The source data was obtained from Eclipse JDT jars, repacked into
>>> a single 60 Mb jar file, then packed with pack200 from Sun's JDK (-E9
>>> used), resulting in a 20 Mb pack200-compressed file. Then Sun JDK
>>> 1.6.0_05 (Windows, -server) was used together with hprof (cpu=time) to
>>> obtain the profile. My patch from HARMONY-5900 is applied. The head of
>>> the profile looks like this:
>>>
>>> 4.76% org.apache.harmony.unpack200.bytecode.ClassConstantPool.addNested
>>> 4.22% java.util.HashMap.getEntry
>>> 2.99% java.util.AbstractList$Itr.next
>>> 2.92% java.util.AbstractList$Itr.hasNext
>>> 2.84% java.util.ArrayList.get
>>> 2.43% java.util.AbstractList$Itr.next
>>> 2.41% java.util.HashMap.containsKey
>>> 2.15% org.apache.harmony.unpack200.IcBands.getRelevantIcTuples
>>> 2.00% java.util.HashSet.contains
>>> 1.57% java.io.DataOutputStream.writeUTF
>>>
>>> Composite occupancy:
>>>
>>> 18.4% java.util.AbstractList
>>> 18.0% java.util.HashMap
>>> 15.8% java.util.ArrayList
>>> 10.5% o.a.h.unpack200.bytecode.ClassConstantPool.*
>>> 5.3% o.a.h.unpack200.bytecode.CPUTF8.* (hashcode mostly)
>>> 4.5% java.io.*
>>> 4.5% java.lang.String.*
>>> 4.4% o.a.h.unpack200.bytecode.ByteCode.*
>>> 3.9% o.a.h.unpack200.bytecode.Ic{Tuple|Bands}.*
>>> 14.7% other
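
A composite figure like the ones above can be obtained by folding per-method percentages into their owning classes. The sketch below is only an assumption about how such a roll-up might be done, not the procedure actually used for the table. The class CompositeOccupancy and its methods are hypothetical. The sample rows are the ten head-of-profile entries quoted above, so the sums cover only those rows and will not match the full-profile table.

```java
import java.util.LinkedHashMap;
import java.util.Map;

/** Hypothetical roll-up of per-method profile percentages into per-class totals. */
final class CompositeOccupancy {
    /** Maps "pkg.Outer$Inner.method" to "pkg.Outer" (inner classes are folded in). */
    static String ownerClass(String method) {
        int dot = method.lastIndexOf('.');
        String cls = dot < 0 ? method : method.substring(0, dot);
        int dollar = cls.indexOf('$');
        return dollar < 0 ? cls : cls.substring(0, dollar);
    }

    /** Sums percents[i] per owning class, preserving first-seen order. */
    static Map<String, Double> byClass(String[] methods, double[] percents) {
        if (methods.length != percents.length) {
            throw new IllegalArgumentException("methods and percents differ in length");
        }
        Map<String, Double> totals = new LinkedHashMap<>();
        for (int i = 0; i < methods.length; i++) {
            totals.merge(ownerClass(methods[i]), percents[i], Double::sum);
        }
        return totals;
    }

    public static void main(String[] args) {
        // The ten head-of-profile rows quoted in this thread.
        String[] methods = {
            "org.apache.harmony.unpack200.bytecode.ClassConstantPool.addNested",
            "java.util.HashMap.getEntry",
            "java.util.AbstractList$Itr.next",
            "java.util.AbstractList$Itr.hasNext",
            "java.util.ArrayList.get",
            "java.util.AbstractList$Itr.next",
            "java.util.HashMap.containsKey",
            "org.apache.harmony.unpack200.IcBands.getRelevantIcTuples",
            "java.util.HashSet.contains",
            "java.io.DataOutputStream.writeUTF",
        };
        double[] percents = {4.76, 4.22, 2.99, 2.92, 2.84, 2.43, 2.41, 2.15, 2.00, 1.57};
        for (Map.Entry<String, Double> e : byClass(methods, percents).entrySet()) {
            System.out.printf("%5.2f%% %s%n", e.getValue(), e.getKey());
        }
    }
}
```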
>>>
>>> So the main concern is Collections usage. ClassConstantPool uses Lists
>>> heavily, so I suspect a significant amount of time is spent
>>> there.
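
To illustrate the kind of change this suggests: a pool that deduplicates entries by scanning a List pays a linear search on every add, while keeping a hash index beside the list makes the lookup roughly constant-time. The sketch below is a generic illustration under that assumption, not the actual ClassConstantPool code. The class IndexedPool and its methods are hypothetical.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

/** Hypothetical deduplicating pool: ordered storage plus a hash index for lookups. */
final class IndexedPool<T> {
    private final List<T> entries = new ArrayList<>();
    private final Map<T, Integer> positions = new HashMap<>();

    /** Adds the entry if absent and returns its index; no list scan is needed. */
    int add(T entry) {
        Integer existing = positions.get(entry);
        if (existing != null) return existing;
        int index = entries.size();
        entries.add(entry);
        positions.put(entry, index);
        return index;
    }

    /** Returns the index of the entry, or -1 if it is not in the pool. */
    int indexOf(T entry) {
        Integer i = positions.get(entry);
        return i == null ? -1 : i;
    }

    T get(int index) { return entries.get(index); }

    int size() { return entries.size(); }

    public static void main(String[] args) {
        IndexedPool<String> pool = new IndexedPool<>();
        System.out.println(pool.add("java/lang/Object"));
        System.out.println(pool.add("java/lang/String"));
        System.out.println(pool.add("java/lang/Object")); // same index as the first add
    }
}
```

This only pays off if hashCode() and equals() on the entries are cheap. The profile above already shows CPUTF8 hashcode taking a visible share, so caching the hash in the entry would be a natural companion change.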
>>>
>>> NB:
>>> Timings for the scenario (lower is better):
>>> Harmony's pack200: 67 secs
>>> Sun's pack200: 6 secs
>>>
>>> Yep, 10 times faster.
>>>
>>> Thanks,
>>> Aleksey.
>>>