Thanks for making steady progress, Xiaoming! Do you have composite scores for Stefan Krause's benchmark set with your patch included?
Thanks,
Aleksey.

On Fri, Sep 19, 2008 at 10:45 AM, xiaoming gu <[EMAIL PROTECTED]> wrote:
> Hi, all. I did something more for shladd=>LEA today. With the available MUL
> strength reduction, X*10 is reduced to (((X<<2)+X)<<1)+0, and the 0 is
> generated by a self-XOR instruction (CASE 3). Actually this XOR is not
> necessary and could be eliminated in the HIR2LIR pass. Following are the
> better instructions generated with the improved patch. Comparing with the
> previous CASE 3, you may find the XOR gone.
>
> CASE 4: MUL strength reduction - using LEA and taking care of 0
>
> I22: LEA t48(EDI):I_32,t47[v434(EBP)+v434(EBP)*t46(4)]:I_32 bcOff: 42 \l\
> I23: LEA t52(EDI):I_32,t51[t48(EDI)*t50(2)+t49(0)]:I_32 bcOff: 42 \l\
> I861: MOV v533[v521(ESP)+t532(-24)]:I_32,t52(EDI):I_32 bcOff: 43 \l\
> I860: MOV v535[v521(ESP)+t534(-28)]:I_32,t53(1):I_32 bcOff: 45 \l\
> I26: EmptyPseudoInst bcOff: 48 \l\
>
>              CASE1   CASE2   CASE3   CASE4
> Time (msec)   6234    7688    5734    5704
> Normalized       1   1.233   0.920   0.915
>
> I'm going to submit the patch though it only brings a small performance
> improvement (0.5%). Any comment is welcome. Thanks.
>
> Xiaoming
>
> On Wed, Sep 17, 2008 at 4:13 PM, Xiao-Feng Li <[EMAIL PROTECTED]> wrote:
>
>> Xiaoming, Thanks for the explanation.
>>
>> Thanks,
>> xiaofeng
>>
>> On Wed, Sep 17, 2008 at 3:35 PM, xiaoming gu <[EMAIL PROTECTED]> wrote:
>> > The 7.9% improvement comes from the complex function (shift left+add) and
>> > quick execution (1 cycle) of LEA with special hardware optimizations. In
>> > IA32, LEA is designed for computing address originally but not limited to
>> > that purpose. So we may use LEA LIR for shladd HIR for common arithmetic
>> > calculations.
>> >
>> > And in the available MUL strength reduction (multiplybyconstant.cpp), there
>> > is some part of code implying to use LEA LIR for shladd HIR.
>> > But in later HIR2LIR pass, shladd HIR is transformed to SAL and ADD LIRs
>> > which makes MUL strength reduction always with no improvement.
>> >
>> > Thanks. -Xiaoming
>> >
>> > On Wed, Sep 17, 2008 at 11:16 AM, Xiao-Feng Li <[EMAIL PROTECTED]> wrote:
>> >
>> >> On Wed, Sep 17, 2008 at 10:29 AM, Xiaoming Gu (JIRA) <[EMAIL PROTECTED]> wrote:
>> >> >
>> >> > [ https://issues.apache.org/jira/browse/HARMONY-5965?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
>> >> >
>> >> > Xiaoming Gu updated HARMONY-5965:
>> >> > ---------------------------------
>> >> >
>> >> > Attachment: H5965-V1.patch
>> >> >
>> >> > With this patch, shladd HIR could generate LEA LIR when the data is I4
>> >> > type and shift-left bit is 1/2/3.
>> >> >
>> >> > Note: A new MemOpndKind "MemOpndKind_LEA" is created because we just use
>> >> > the memory operand in LEA LIR to do common arithmetic calculation not for
>> >> > real memory address computation. If we still use MemOpndKind_Heap, there
>> >> > are some failed verifications in debug version.
>> >> >
>> >> > Then I turned on MUL strength reduction and get the following improvement
>> >> > with a synthetic example.
>> >> >
>> >> > hotspot of source code:
>> >> > for(int i=0;i<times;i++) // times=2,000,000,000
>> >> >     result = result*multiplier; //multiplier=10, which is transformed from x*10 to (((x<<2)+x)<<1)+0
>> >> >
>> >> > Following is the binary code generated for "result = result*multiplier;".
>> >> >
>> >> > CASE 1: No MUL strength reduction - using IMUL
>> >> > I868: MOV s47(EDI):I_32,v426(ESI):I_32 \l\
>> >> > I867: MOV t351(EBP):I_32,t46(10):I_32 \l\
>> >> > I22: (ID:s16(EFLGS):U_32) =IMUL s47(EDI):I_32,t351(EBP):I_32 bcOff: 42 \l\
>> >> > I866: MOV v527[v513(ESP)+t526(-28)]:I_32,s47(EDI):I_32 bcOff: 43 \l\
>> >> > I865: MOV v529[v513(ESP)+t528(-32)]:I_32,t48(1):I_32 bcOff: 45 \l\
>> >> > I25: EmptyPseudoInst bcOff: 48 \l\
>> >> >
>> >> > CASE 2: MUL strength reduction - using SAL and ADD
>> >> > I884: MOV s47(EBP):I_32,v438(ESI):I_32 \l\
>> >> > I23: (ID:s16(EFLGS):U_32) =SAL s47(EBP):I_32,t46(2):U_8 bcOff: 42 \l\
>> >> > I883: MOV s54(EDI):I_32,v438(ESI):I_32 \l\
>> >> > I24: (ID:s16(EFLGS):U_32) =ADD s54(EDI):I_32,s47(EBP):I_32 bcOff: 42 \l\
>> >> > I116: (AD:s54(EDI):I_32) =CopyPseudoInst/MOV (AU:s54(EDI):I_32) \l\
>> >> > I26: (ID:s16(EFLGS):U_32) =SAL s54(EDI):I_32,t51(1):U_8 bcOff: 42 \l\
>> >> > I117: (AD:s54(EDI):I_32) =CopyPseudoInst/MOV (AU:s54(EDI):I_32) \l\
>> >> > I27: (ID:s16(EFLGS):U_32) =ADD s54(EDI):I_32,t50(0):I_32 bcOff: 42 \l\
>> >> > I882: MOV v539[v525(ESP)+t538(-28)]:I_32,s54(EDI):I_32 bcOff: 43 \l\
>> >> > I881: MOV v541[v525(ESP)+t540(-32)]:I_32,t55(1):I_32 bcOff: 45 \l\
>> >> > I30: EmptyPseudoInst bcOff: 48 \l\
>> >> >
>> >> > CASE 3: MUL strength reduction - using LEA
>> >> > I22: LEA t48(EBP):I_32,t47[v436(ESI)+v436(ESI)*t46(4)]:I_32 bcOff: 42 \l\
>> >> > I868: (ID:s16(EFLGS):U_32) =XOR t361(EDI):I_32,t361(EDI):I_32 \l\
>> >> > I23: LEA t52(EDI):I_32,t51[t361(EDI)+t48(EBP)*t50(2)]:I_32 bcOff: 42 \l\
>> >> > I867: MOV v537[v523(ESP)+t536(-28)]:I_32,t52(EDI):I_32 bcOff: 43 \l\
>> >> > I866: MOV v539[v523(ESP)+t538(-32)]:I_32,t53(1):I_32 bcOff: 45 \l\
>> >> > I26: EmptyPseudoInst bcOff: 48 \l\
>> >> >
>> >> >              CASE1   CASE2   CASE3
>> >> > Time (msec)   6234    7688    5734
>> >>
>> >> Good job! The improvement looks good. It is about 7.9%. Thanks.
>> >>
>> >> Thanks,
>> >> xiaofeng
>> >>
>> >> > I'm going to spend more time for H5901 to adjust MUL strength reduction.
>> >> >
>> >> >> [drlvm][jit]generate Mnemonic_LEA LIR for Op_Shladd HIR in IA32
>> >> >> ---------------------------------------------------------------
>> >> >>
>> >> >>                 Key: HARMONY-5965
>> >> >>                 URL: https://issues.apache.org/jira/browse/HARMONY-5965
>> >> >>             Project: Harmony
>> >> >>          Issue Type: Improvement
>> >> >>          Components: DRLVM
>> >> >>            Reporter: Xiaoming Gu
>> >> >>         Attachments: H5965-V1.patch
>> >> >>
>> >> >>
>> >> >> In IA32 there is a quick (1 cycle) LEA instruction for loading effective
>> >> >> address. The function of LEA is a combination of shift-left and addition.
>> >> >> For example LEA dst, src, 2, 4 does dst=src<<2+4. It's usually used but not
>> >> >> limited in element address calculation for array.
>> >> >> In current Ia32InstCodeSelector.cpp, the function for translating
>> >> >> Op_Shladd HIR generates shl and add. Since LEA has the same semantic, we
>> >> >> could deploy it to improve performance.
>> >> >
>> >> > --
>> >> > This message is automatically generated by JIRA.
>> >> > -
>> >> > You can reply to this email to add a comment to the issue online.
>> >> >
>> >>
>> >> --
>> >> http://xiao-feng.blogspot.com
>> >>
>>
>> --
>> http://xiao-feng.blogspot.com
>
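
For readers following the arithmetic in the thread, here is a minimal Java sketch of why x*10 can be rewritten as (((x<<2)+x)<<1)+0 and how two LEA-style operations cover it. It is only an illustration of the identity, not the DRLVM code: the class name MulBy10Sketch and the helper lea(base, index, scale, disp) are hypothetical names that model the base + index*scale + disp address form in plain int arithmetic. Java int arithmetic wraps on overflow, so the rewritten forms agree with x*10 for every int value.

```java
// Hypothetical illustration only; none of these names come from Harmony/DRLVM.
final class MulBy10Sketch {
    private MulBy10Sketch() {}

    // Models the value an LEA-style operation computes: base + index*scale + disp.
    // On IA32 the scale may only be 1, 2, 4 or 8 (a left shift by 0..3).
    static int lea(int base, int index, int scale, int disp) {
        return base + index * scale + disp;
    }

    // Shift-and-add form quoted in the thread: (((x<<2)+x)<<1)+0.
    static int mulBy10ShiftAdd(int x) {
        return (((x << 2) + x) << 1) + 0;
    }

    // Two LEA-style steps, as in CASE 4: no zeroed register is needed,
    // because the trailing +0 is folded into the displacement.
    static int mulBy10Lea(int x) {
        int t = lea(x, x, 4, 0);   // x + x*4 = x*5
        return lea(0, t, 2, 0);    // t*2 + 0  = x*10 (base modeled as absent)
    }

    public static void main(String[] args) {
        int[] samples = {0, 1, -1, 7, 123456789, Integer.MAX_VALUE, Integer.MIN_VALUE};
        for (int x : samples) {
            int expected = x * 10;  // wraps modulo 2^32, like the other forms
            if (mulBy10ShiftAdd(x) != expected || mulBy10Lea(x) != expected) {
                throw new AssertionError("mismatch for x=" + x);
            }
        }
        System.out.println("all samples match");
    }
}
```

The sketch only checks that the rewritten forms are arithmetically equivalent; it says nothing about timing, which depends on the generated instructions shown in CASE 1 through CASE 4.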
