Re: [classlib][pack200][performance] Profiling unpacking scenario

Aleksey Shipilev Thu, 10 Jul 2008 09:00:42 -0700

I have quickly drafted an internal Java profiler for pack200 at [1].
Here are the results of profiling a 50 Mb Eclipse JDT jar; times are in
milliseconds, and indentation emulates the call tree. Some of the labels are
ambiguous, but you can look up the probe positions in the patch.


Unpack: 38311
 segment unpack: 38217
   parse segment: 11575
     parse header:  0
     parse ADB:     78
     parse bcbands: 5342
       parse1:      453
       parse2:      93
       select:      252
       attrlayout:  0
       methods:     3997
     parse cbands:  3358
       classattr:   1002
       code:        1636
       fields:      173
       methods:     515
     parse cpbands: 2483
     parse fbands:  63
     parse icbands: 16
   write jar: 26642
     build classf:  21111
       sfattrs:     47
       cfattrs:     0
       fields:      218
       interfaces:  0
       methods:     362
       addNested:   8146
       inner:       3051
       final:       8774
     write classf:  1934
       constpool:   1015
       interfaces:  0
       attributes:  31
       methods:     827
       fields:      31
     write primit:  486

From here, one can pick a method and optimize it locally :)
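The patch in [1] is the authoritative version. As a hypothetical sketch of how a probe-based profiler can produce an indented call tree like the one above: each enter/exit pair accumulates System.nanoTime() deltas into a tree node keyed by label under the currently open probe, and dump() prints the tree. The names ProbeProfiler, enter, exit and dump are illustrative, not taken from the patch, and the code assumes Java 8 or later and a single thread.

```java
import java.util.ArrayDeque;
import java.util.Deque;
import java.util.LinkedHashMap;
import java.util.Map;

/**
 * Hypothetical probe profiler (not the HARMONY-5905 patch): enter/exit pairs
 * accumulate elapsed wall-clock time into a call tree keyed by label.
 * Not thread-safe; every enter() must be matched by exactly one exit().
 */
public final class ProbeProfiler {

    /** One call-tree node: accumulated time plus children in first-seen order. */
    static final class Node {
        final String label;
        long totalNanos;
        final Map<String, Node> children = new LinkedHashMap<>();

        Node(String label) {
            this.label = label;
        }
    }

    private final Node root = new Node("<root>");
    private final Deque<Node> stack = new ArrayDeque<>();
    private final Deque<Long> starts = new ArrayDeque<>();

    public ProbeProfiler() {
        stack.push(root);
    }

    /** Opens a probe named label under the innermost open probe. */
    public void enter(String label) {
        Node child = stack.peek().children.computeIfAbsent(label, Node::new);
        stack.push(child);
        starts.push(System.nanoTime());
    }

    /** Closes the innermost probe and adds its elapsed time to its node. */
    public void exit() {
        long elapsed = System.nanoTime() - starts.pop();
        stack.pop().totalNanos += elapsed;
    }

    /** Renders the tree, two spaces of indentation per level, times in ms. */
    public String dump() {
        StringBuilder sb = new StringBuilder();
        for (Node child : root.children.values()) {
            dump(child, 0, sb);
        }
        return sb.toString();
    }

    private static void dump(Node node, int depth, StringBuilder sb) {
        for (int i = 0; i < depth; i++) {
            sb.append("  ");
        }
        sb.append(node.label).append(": ").append(node.totalNanos / 1_000_000).append('\n');
        for (Node child : node.children.values()) {
            dump(child, depth + 1, sb);
        }
    }
}
```

Wrapping, say, segment parsing in enter("parse segment") / exit() (ideally with exit() in a finally block) yields a line such as "parse segment: 11575". Because probes are aggregated by label per parent, repeated calls are summed, which is also why a label like "methods" can show up under several parents.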

Thanks,
Aleksey.

[1] https://issues.apache.org/jira/browse/HARMONY-5905

On Wed, Jul 9, 2008 at 9:20 PM, Aleksey Shipilev
<[EMAIL PROTECTED]> wrote:
> I have disabled compression in my test to eliminate the ZIP overhead
> and focus on pack200 performance only. Thus these numbers are not
> comparable with the previous measurements. The data assume that
> HARMONY-5900 is incorporated.
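The message does not say how compression was disabled. One hypothetical way, if the unpacked jar is written through a caller-supplied JarOutputStream, is ZipOutputStream.setLevel(Deflater.NO_COMPRESSION): entries stay DEFLATED but are emitted as stored blocks, which skips the expensive match search and Huffman coding (CRC-32 and stream bookkeeping still remain). StoredJarWriter and writeJar are illustrative names.

```java
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.util.jar.JarEntry;
import java.util.jar.JarOutputStream;
import java.util.zip.Deflater;

/** Hypothetical helper: writes one entry into an in-memory jar at a given deflate level. */
public final class StoredJarWriter {

    public static byte[] writeJar(byte[] payload, int level) throws IOException {
        ByteArrayOutputStream bytes = new ByteArrayOutputStream();
        try (JarOutputStream jar = new JarOutputStream(bytes)) {
            // Deflater.NO_COMPRESSION (0) emits stored blocks; DEFAULT_COMPRESSION compresses.
            jar.setLevel(level);
            jar.putNextEntry(new JarEntry("Example.class"));
            jar.write(payload);
            jar.closeEntry();
        }
        return bytes.toByteArray();
    }
}
```

With level 0 the output is slightly larger than the payload itself, so this trades disk size for CPU time, which is the point when measuring unpacking cost in isolation.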
>
> Harmony's pack200: 43 secs (3.5 Mb/sec)
> Sun's pack200: 9 secs (16.6 Mb/sec)
>
> Profile:
>
> 22.0% java.util.HashMap.*
> 11.4% java.io.FileInputStream.readBytes()
>  7.5% o.a.h.unpack200.bytecode.ClassConstantPool.addNested()
>  6.9% java.util.zip.*
>  5.6% o.a.h.pack200.BHSDCodec.decode()
>  4.8% java.lang.*
>  4.4% o.a.h.unpack200.IcBands.getRelevantIcTuples()
>  3.9% o.a.h.unpack200.bytecode.forms.NoArgumentForm.setByteCodeOperands()
>  3.2% o.a.h.unpack200.bytecode.ClassConstantPool.* (other)
>  3.0% o.a.h.unpack200.bytecode.CodeAttribute.*
>  2.8% java.io.FileOutputStream.writeBytes()
>  2.8% o.a.h.unpack200.bytecode.ByteCode.*
>  2.75% java.util.TreeMap.*
>
> Note that ArrayList is gone from the profile!
> It seems that BHSDCodec.decode(), IcBands.getRelevantIcTuples() and
> NoArgumentForm.setByteCodeOperands() are the next candidates for tuning.
> Beyond that, further improvement is not possible without deeper
> changes, such as overall algorithmic improvements. Those should arguably
> come first, but I'm not familiar with the code yet. That won't stop us,
> though ;)
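BHSDCodec implements Pack200's (B,H,S,D) variable-length integer coding. As a rough, hypothetical sketch of the per-value work decode() has to do, following the coding as described in the Pack200 specification rather than Harmony's actual code: read up to B bytes, accumulating them in radix H and stopping at the first byte below L = 256 - H; then undo the sign folding (S low bits) and the delta coding (D). BhsdSketch is an illustrative name, and the sketch ignores the spec's wrap-around of delta sums to the codec's range.

```java
/** Hypothetical sketch of Pack200 BHSD decoding; not Harmony's BHSDCodec. */
public final class BhsdSketch {

    private final int b; // maximum number of bytes per value
    private final int h; // radix of the continuation bytes
    private final int s; // number of sign bits (0 = unsigned)
    private final int d; // 1 if values are delta-coded

    public BhsdSketch(int b, int h, int s, int d) {
        this.b = b;
        this.h = h;
        this.s = s;
        this.d = d;
    }

    /** Decodes count values from the start of data. */
    public long[] decode(byte[] data, int count) {
        int l = 256 - h; // bytes >= l mean "more bytes follow"
        long[] out = new long[count];
        int pos = 0;
        long last = 0;
        for (int n = 0; n < count; n++) {
            long z = 0;
            long weight = 1;
            for (int i = 0; i < b; i++) {
                int x = data[pos++] & 0xFF;
                z += x * weight;
                weight *= h;
                if (x < l) {
                    break; // terminal byte; the B-th byte always ends the value
                }
            }
            long value = z;
            if (s > 0) {
                long mask = (1L << s) - 1;
                // All S low bits set encodes a negative number.
                value = ((z & mask) == mask) ? ~(z >>> s) : z - (z >>> s);
            }
            if (d != 0) {
                value += last; // delta: add the previous decoded value
                last = value;
            }
            out[n] = value;
        }
        return out;
    }
}
```

For example, UNSIGNED5 is (5,64,0,0), so L = 192, and the bytes 200, 1 decode to 200 + 1 * 64 = 264. The inner byte loop and the branchy sign/delta steps run once per value, which is why this method shows up so prominently in the profile.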
>
> Thanks,
> Aleksey.
>
> On Wed, Jul 9, 2008 at 7:27 PM, Sian January <[EMAIL PROTECTED]> wrote:
>> Thanks for doing that Aleksey.  In fact I think Sun's was 20 or 30 times
>> faster before we started doing any performance optimizations, but it looks
>> like there's still some ground that we could make up!
>>
>>
>>
>> On 08/07/2008, Aleksey Shipilev <[EMAIL PROTECTED]> wrote:
>>>
>>> Hi,
>>>
>>> I took the liberty of profiling the pack200 implementation in an unpacking
>>> scenario. The source data was obtained from Eclipse JDT jars, repacked into a
>>> single 60 Mb jar file, then packed with pack200 from Sun's JDK (-E9
>>> used), resulting in a 20 Mb pack200-compressed file. Then Sun JDK
>>> 1.6.0_05 (Windows, -server) was used together with hprof (cpu=time) to
>>> obtain the profile. My patch from HARMONY-5900 is applied. The head of
>>> the profile looks like this:
>>>
>>> 4.76% org.apache.harmony.unpack200.bytecode.ClassConstantPool.addNested
>>> 4.22% java.util.HashMap.getEntry
>>> 2.99% java.util.AbstractList$Itr.next
>>> 2.92% java.util.AbstractList$Itr.hasNext
>>> 2.84% java.util.ArrayList.get
>>> 2.43% java.util.AbstractList$Itr.next
>>> 2.41% java.util.HashMap.containsKey
>>> 2.15% org.apache.harmony.unpack200.IcBands.getRelevantIcTuples
>>> 2.00% java.util.HashSet.contains
>>> 1.57% java.io.DataOutputStream.writeUTF
>>>
>>> Composite occupancy:
>>>
>>> 18.4% java.util.AbstractList
>>> 18.0% java.util.HashMap
>>> 15.8% java.util.ArrayList
>>> 10.5% o.a.h.unpack200.bytecode.ClassConstantPool.*
>>>  5.3% o.a.h.unpack200.bytecode.CPUTF8.* (hashcode mostly)
>>>  4.5% java.io.*
>>>  4.5% java.lang.String.*
>>>  4.4% o.a.h.unpack200.bytecode.ByteCode.*
>>>  3.9% o.a.h.unpack200.bytecode.Ic{Tuple|Bands}.*
>>> 14.7% other
>>>
>>> So the main concern is Collections usage. ClassConstantPool uses Lists
>>> excessively, so I suspect a significant amount of time is spent
>>> there.
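ClassConstantPool's internals are not shown in this thread. As a hypothetical sketch of the usual remedy for List-heavy lookups (not Harmony's actual code), an ordered pool can pair an ArrayList, which preserves index order, with a HashMap from entry to index, so that add-if-absent and index lookup take expected O(1) time instead of linear List.contains()/indexOf() scans. IndexedPool and its methods are illustrative names.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

/**
 * Hypothetical ordered pool: the list keeps insertion order (the index an
 * entry would get), the map answers "where is this entry?" without scanning.
 * Entries must implement equals() and hashCode() consistently.
 */
public final class IndexedPool<E> {

    private final List<E> entries = new ArrayList<>();
    private final Map<E, Integer> positions = new HashMap<>();

    /** Adds e if absent and returns its index either way. */
    public int add(E e) {
        Integer existing = positions.get(e);
        if (existing != null) {
            return existing;
        }
        int index = entries.size();
        entries.add(e);
        positions.put(e, index);
        return index;
    }

    /** Returns the index of e, or -1 if it is not in the pool. */
    public int indexOf(E e) {
        Integer index = positions.get(e);
        return index == null ? -1 : index;
    }

    public E get(int index) {
        return entries.get(index);
    }

    public int size() {
        return entries.size();
    }
}
```

The trade-off is that every lookup now pays for hashCode() and equals() on the entries, so their cost matters (CPUTF8's hashcode already appears in the composite profile); immutable entries can cache their hash.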
>>>
>>> NB:
>>> Timings for the scenario (lower is better):
>>> Harmony's pack200: 67 secs
>>> Sun's pack200: 6 secs
>>>
>>> Yep, Sun's is roughly 10 times faster.
>>>
>>> Thanks,
>>> Aleksey.
>>>
>>
>>
>>
>>
>
