Re: [jira] Updated: (HARMONY-5965) [drlvm][jit]generate Mnemonic_LEA LIR for Op_Shladd HIR in IA32

Xiao-Feng Li Wed, 17 Sep 2008 01:14:16 -0700

Xiaoming, Thanks for the explanation.

Thanks,
xiaofeng


On Wed, Sep 17, 2008 at 3:35 PM, xiaoming gu <[EMAIL PROTECTED]> wrote:
>  The 7.9% improvement comes from LEA combining two operations (shift left +
> add) in one instruction and from its quick execution (1 cycle) with special
> hardware optimizations. In IA32, LEA was originally designed for address
> computation, but it is not limited to that purpose. So we may use an LEA LIR
> for shladd HIR in common arithmetic calculations.
>
> Also, in the existing MUL strength reduction (multiplybyconstant.cpp), part
> of the code implies that an LEA LIR will be used for shladd HIR. But in the
> later HIR2LIR pass, shladd HIR is transformed into SAL and ADD LIRs, so MUL
> strength reduction never yields an improvement.
>
> Thanks. -Xiaoming
>
> On Wed, Sep 17, 2008 at 11:16 AM, Xiao-Feng Li <[EMAIL PROTECTED]>wrote:
>
>> On Wed, Sep 17, 2008 at 10:29 AM, Xiaoming Gu (JIRA) <[EMAIL PROTECTED]>
>> wrote:
>> >
>> >     [ https://issues.apache.org/jira/browse/HARMONY-5965?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
>> >
>> > Xiaoming Gu updated HARMONY-5965:
>> > ---------------------------------
>> >
>> >    Attachment: H5965-V1.patch
>> >
>> > With this patch, shladd HIR can generate an LEA LIR when the data type
>> > is I4 and the shift-left amount is 1, 2, or 3.
>> >
>> > Note: A new MemOpndKind, "MemOpndKind_LEA", is created because the memory
>> > operand in the LEA LIR is used only for common arithmetic calculation,
>> > not for real memory address computation. If we still used
>> > MemOpndKind_Heap, some verifications would fail in the debug version.
>> >
>> > Then I turned on MUL strength reduction and got the following
>> > improvement with a synthetic example.
>> >
>> > hotspot of source code:
>> >    for(int i=0;i<times;i++) // times=2,000,000,000
>> >        result = result*multiplier; // multiplier=10; x*10 is transformed
>> >                                    // to (((x<<2)+x)<<1)+0
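
The decomposition quoted above can be checked with a minimal sketch. The class and method names here are hypothetical and are not taken from the patch or from multiplybyconstant.cpp. Only the formula (((x<<2)+x)<<1)+0 comes from the message. Because Java int arithmetic wraps modulo 2^32, the two forms should agree even when the product overflows.

```java
// Hypothetical illustration; names are not from DRLVM or the patch.
final class MulByTenCheck {
    // Plain multiply, corresponding to the IMUL form in CASE 1.
    static int mulByTen(int x) {
        return x * 10;
    }

    // Shift-and-add form quoted in the message: (((x << 2) + x) << 1) + 0.
    static int shiftAddByTen(int x) {
        return (((x << 2) + x) << 1) + 0;
    }

    public static void main(String[] args) {
        // Includes boundary values so that overflow wraparound is exercised.
        int[] samples = {0, 1, -1, 7, 123456789, Integer.MAX_VALUE, Integer.MIN_VALUE};
        for (int x : samples) {
            if (mulByTen(x) != shiftAddByTen(x)) {
                throw new AssertionError("mismatch for x = " + x);
            }
        }
        System.out.println("shift-and-add form matches x * 10 for all samples");
    }
}
```
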
>> >
>> > Following is the binary code generated for "result = result*multiplier;".
>> >
>> > CASE 1: No MUL strength reduction - using IMUL
>> > I868: MOV s47(EDI):I_32,v426(ESI):I_32
>> > I867: MOV t351(EBP):I_32,t46(10):I_32
>> > I22: (ID:s16(EFLGS):U_32) =IMUL s47(EDI):I_32,t351(EBP):I_32  bcOff: 42
>> > I866: MOV v527[v513(ESP)+t526(-28)]:I_32,s47(EDI):I_32  bcOff: 43
>> > I865: MOV v529[v513(ESP)+t528(-32)]:I_32,t48(1):I_32  bcOff: 45
>> > I25: EmptyPseudoInst  bcOff: 48
>> >
>> > CASE 2: MUL strength reduction - using SAL and ADD
>> > I884: MOV s47(EBP):I_32,v438(ESI):I_32
>> > I23: (ID:s16(EFLGS):U_32) =SAL s47(EBP):I_32,t46(2):U_8  bcOff: 42
>> > I883: MOV s54(EDI):I_32,v438(ESI):I_32
>> > I24: (ID:s16(EFLGS):U_32) =ADD s54(EDI):I_32,s47(EBP):I_32  bcOff: 42
>> > I116: (AD:s54(EDI):I_32) =CopyPseudoInst/MOV (AU:s54(EDI):I_32)
>> > I26: (ID:s16(EFLGS):U_32) =SAL s54(EDI):I_32,t51(1):U_8  bcOff: 42
>> > I117: (AD:s54(EDI):I_32) =CopyPseudoInst/MOV (AU:s54(EDI):I_32)
>> > I27: (ID:s16(EFLGS):U_32) =ADD s54(EDI):I_32,t50(0):I_32  bcOff: 42
>> > I882: MOV v539[v525(ESP)+t538(-28)]:I_32,s54(EDI):I_32  bcOff: 43
>> > I881: MOV v541[v525(ESP)+t540(-32)]:I_32,t55(1):I_32  bcOff: 45
>> > I30: EmptyPseudoInst  bcOff: 48
>> >
>> > CASE 3: MUL strength reduction - using LEA
>> > I22: LEA t48(EBP):I_32,t47[v436(ESI)+v436(ESI)*t46(4)]:I_32  bcOff: 42
>> > I868: (ID:s16(EFLGS):U_32) =XOR t361(EDI):I_32,t361(EDI):I_32
>> > I23: LEA t52(EDI):I_32,t51[t361(EDI)+t48(EBP)*t50(2)]:I_32  bcOff: 42
>> > I867: MOV v537[v523(ESP)+t536(-28)]:I_32,t52(EDI):I_32  bcOff: 43
>> > I866: MOV v539[v523(ESP)+t538(-32)]:I_32,t53(1):I_32  bcOff: 45
>> > I26: EmptyPseudoInst  bcOff: 48
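
One way to read the CASE 3 listing is as two base + index*scale computations with a zeroed register in between. The sketch below models that reading in Java. The helper lea and the class name are hypothetical and only mirror the operand shapes shown in I22, I868 and I23; this is not the code selector itself.

```java
// Hypothetical model of the CASE 3 sequence; not DRLVM code.
final class LeaSequenceSketch {
    // Models LEA dst, [base + index*scale] on 32-bit values.
    // IA32 addressing only allows scale factors of 1, 2, 4 or 8.
    static int lea(int base, int index, int scale) {
        if (scale != 1 && scale != 2 && scale != 4 && scale != 8) {
            throw new IllegalArgumentException("unsupported scale: " + scale);
        }
        return base + index * scale;
    }

    static int timesTenViaLea(int x) {
        int t48 = lea(x, x, 4);      // I22:  x + x*4  = 5x
        int zero = 0;                // I868: XOR reg,reg clears the register
        return lea(zero, t48, 2);    // I23:  0 + 5x*2 = 10x
    }

    public static void main(String[] args) {
        for (int x : new int[] {0, 3, -8, 1000, Integer.MAX_VALUE}) {
            if (timesTenViaLea(x) != x * 10) {
                throw new AssertionError("mismatch for x = " + x);
            }
        }
        System.out.println("two-LEA sequence matches x * 10 for all samples");
    }
}
```

Under this reading the multiply needs two LEA instructions plus one XOR, compared with two SAL, two ADD and several copies in CASE 2.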
>> >
>> >                CASE 1    CASE 2    CASE 3
>> > Time (msec)    6234      7688      5734
>>
>> Good job!  The improvement looks good. It is about 7.9%. Thanks.
>>
>> Thanks,
>> xiaofeng
>>
>> > I'm going to spend more time on H5901 to adjust MUL strength reduction.
>> >
>> >> [drlvm][jit]generate Mnemonic_LEA LIR for Op_Shladd HIR in IA32
>> >> ---------------------------------------------------------------
>> >>
>> >>                 Key: HARMONY-5965
>> >>                 URL: https://issues.apache.org/jira/browse/HARMONY-5965
>> >>             Project: Harmony
>> >>          Issue Type: Improvement
>> >>          Components: DRLVM
>> >>            Reporter: Xiaoming Gu
>> >>         Attachments: H5965-V1.patch
>> >>
>> >>
>> >> In IA32 there is a quick (1 cycle) LEA instruction for loading an
>> >> effective address. LEA combines a shift-left and an addition. For
>> >> example, LEA dst, src, 2, 4 computes dst = (src << 2) + 4. It is usually
>> >> used for, but not limited to, array element address calculation.
>> >> In the current Ia32InstCodeSelector.cpp, the function that translates
>> >> Op_Shladd HIR generates shl and add. Since LEA has the same semantics,
>> >> we could use it to improve performance.
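
The mapping described in the issue can be sketched as follows. This is a minimal illustration with hypothetical names, not the code in Ia32InstCodeSelector.cpp. It assumes shladd means (value << shift) + addend, as the description states, and that only shifts of 1, 2 or 3 fit an LEA, since those correspond to the scale factors 2, 4 and 8.

```java
// Hypothetical sketch of the shladd-to-LEA mapping; not the actual selector code.
final class ShladdToLeaSketch {
    // Op_Shladd semantics as described in the issue: (value << shift) + addend.
    static int shladd(int value, int shift, int addend) {
        return (value << shift) + addend;
    }

    // A shift of 1, 2 or 3 corresponds to an LEA scale factor of 2, 4 or 8.
    static boolean fitsLea(int shift) {
        return shift >= 1 && shift <= 3;
    }

    // Same result computed in the LEA shape: addend + value * scale.
    static int viaLea(int value, int shift, int addend) {
        if (!fitsLea(shift)) {
            throw new IllegalArgumentException("shift not encodable as LEA scale: " + shift);
        }
        int scale = 1 << shift;
        return addend + value * scale;
    }

    public static void main(String[] args) {
        for (int shift = 1; shift <= 3; shift++) {
            for (int value : new int[] {0, 5, -9, Integer.MAX_VALUE}) {
                if (shladd(value, shift, 4) != viaLea(value, shift, 4)) {
                    throw new AssertionError("mismatch: value=" + value + " shift=" + shift);
                }
            }
        }
        System.out.println("LEA form matches shladd for shifts 1..3");
    }
}
```

Shifts outside 1..3 would still need the shl and add sequence in this sketch, which matches the restriction stated for the patch.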
>> >
>> > --
>> > This message is automatically generated by JIRA.
>> > -
>> > You can reply to this email to add a comment to the issue online.
>> >
>> >
>>
>>
>>
>> --
>> http://xiao-feng.blogspot.com
>>
>



-- 
http://xiao-feng.blogspot.com
