Hi Ramana/Maxim,

On 18 June 2014 16:05, Venkataramanan Kumar
<venkataramanan.ku...@linaro.org> wrote:
> Hi Ramana,
>
> On 18 June 2014 15:29, Ramana Radhakrishnan <ramana....@googlemail.com> wrote:
>> On Mon, Jun 16, 2014 at 1:53 PM, Venkataramanan Kumar
>> <venkataramanan.ku...@linaro.org> wrote:
>>> Hi Maintainers,
>>>
>>> This patch fixes the PR 60617 that occurs when we turn on reload pass
>>> in thumb2 mode.
>>>
>>> It occurs for the pattern "*ior_scc_scc" that gets generated for the 3
>>> argument of the below function call.
>>>
>>> JIT:emitStoreInt32(dst,regT0m, (op1 == dst || op2 == dst)));
>>>
>>>
>>> (----snip---)
>>> (insn 634 633 635 27 (parallel [
>>>             (set (reg:SI 3 r3)
>>>                 (ior:SI (eq:SI (reg/v:SI 110 [ dst ]) <== this operand
>>> gets register r5 assigned
>>>                         (reg/v:SI 112 [ op2 ]))
>>>                     (eq:SI (reg/v:SI 110 [ dst ]) <== This operand
>>>                         (reg/v:SI 111 [ op1 ]))))
>>>             (clobber (reg:CC 100 cc))
>>>         ]) ../Source/JavaScriptCore/jit/JITArithmetic32_64.cpp:179 300
>>> {*ior_scc_scc
>>> (----snip---)
>>>
>>> The issue here is that the above pattern demands 5 registers (LO_REGS).
>>>
>>> But when we are in reload, register r0 is used for the pointer to the
>>> class, r1 and r2 for the first and second arguments, and r7 is used for
>>> the stack pointer.
>>>
>>> So we are left with r3, r4, r5 and r6, but the above pattern needs five
>>> LO_REGS. Hence we get a spill failure when processing the last register
>>> operand in that pattern.
>>>
>>> In the ARM port, TARGET_LIKELY_SPILLED_CLASS is defined for Thumb-1,
>>> and for Thumb-2 mode the comment below mentions the use of LO_REGS:
>>>
>>> "Care should be taken to avoid adding thumb-2 patterns that require
>>> many low registers"
>>>
>>> So the conservative fix is not to allow this pattern in Thumb-2 mode.
>>
>> I don't have an additional solution off the top of my head and
>> probably need to go do some digging.
>>
>> It sounds like the conservative fix, but what's the impact of doing so?
>> Have you measured that in terms of performance or code size on a range
>> of benchmarks?
>>
>>>
>
> I haven't done any benchmark testing. I will try and run some
> benchmarks with my patch.
>
>
>>> I allowed these patterns for Thumb-2 when we have constant operands for
>>> the comparison. That makes the target tests arm/thumb2-cond-cmp-1.c to
>>> thumb2-cond-cmp-4.c pass.
>>
>> That sounds fine and fair - no trouble there.
>>
>> My concern is with removing the register alternatives and losing the
>> ability to trigger conditional compares on 4.9 and trunk for Thumb1
>> until the "new" conditional compare work makes it in.
>>
>> Ramana

I tested this conservative fix with CoreMark (ran it on a Chromebook) and
SPEC CPU2006 (cross-compiled and compared assembly differences).

With CoreMark there are no performance issues; in fact there are no
assembly differences with the CPU flags for Cortex-A15 and Cortex-A9.

For SPEC CPU2006 I cross-compiled and compared the assembly differences
with and without the patch (-O3 -fno-common). I have not run these
benchmarks.

There are major code differences, and they are due to conditional compare
instructions not being generated, as you expected. This also results in
different physical register numbers being assigned in the generated code,
and there are code scheduling differences when comparing with the base.


Below is a simple test case that shows the conditional compare difference
I am seeing in the SPEC CPU2006 benchmarks.

char a,b;
int i;
int f( int j)
{
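  /* The second comparison (j <= a) only needs to run when i >= a holds,
     which is what the conditional compare (cmp / it / cmple) in the
     GCC 4.9 code below expresses.  */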
  if ( (i >= a) ? (j <= a) : 1)
    return 1;
  else
    return 0;
}

GCC FSF 4.9
-----------

        movw    r2, #:lower16:a
        movw    r3, #:lower16:i
        movt    r2, #:upper16:a
        movt    r3, #:upper16:i
        ldrb    r2, [r2]        @ zero_extendqisi2
        ldr     r3, [r3]
        cmp     r2, r3
        it      le
        cmple   r2, r0          <== conditional compare instruction generated.
        ite     ge
        movge   r0, #1
        movlt   r0, #0
        bx      lr


With patch
---------

        movw    r2, #:lower16:a
        movw    r3, #:lower16:i
        movt    r2, #:upper16:a
        movt    r3, #:upper16:i
        ldr     r3, [r3]
        ldrb    r2, [r2]        @ zero_extendqisi2
        cmp     r2, r3
        ite     le
        movle   r3, #0 <== Suboptimal moves generated.
        movgt   r3, #1 <==
        cmp     r2, r0
        ite     lt
        movlt   r0, r3
        orrge   r0, r3, #1 <==
        bx      lr

The following benchmarks have the largest number of conditional compare
pattern differences, as well as code scheduling changes and different
physical register numbers in the generated code.
416.gamess
434.zeusmp
400.perlbench
403.gcc
445.gobmk
483.xalancbmk
401.bzip2
433.milc
435.gromacs

Assembly file differences are seen in the benchmarks below, but they are
not very large.

436.cactusADM
447.dealII
454.calculix
456.hmmer
453.povray
465.tonto
471.omnetpp
459.GemsFDTD
437.leslie3d
464.h264ref


Based on these results, I am of the opinion that this patch would cause
some performance hit if we upstream it to trunk or GCC 4.9.

Another thing to note is that this bug occurs only when we compile with
-fno-tree-dce and -mno-lra on GCC 4.9.

Also, with current FSF trunk and GCC 4.9, we have LRA as the default for
ARM, and it does not have this bug.

My opinion is that the reporter can use this fix to build the package
that triggered the bug, if it has to be built with -fno-tree-dce.

I am not sure whether we can do anything at the machine description level
to prevent this bug, and I am not sure it is easy to fix in the reload
pass either.
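
For reference, the direction my patch takes at the machine description
level is roughly the one sketched below. This is only a simplified
illustration of the idea, not the actual arm.md hunk from the patch: the
operand predicates and constraints are abbreviated, the output template
and split are elided, and the exact condition in the real patch may
differ.

(----snip---)
;; Simplified sketch only: keep the register alternatives of
;; *ior_scc_scc away from Thumb-2 unless both comparison operands are
;; constants, so the pattern can no longer demand five LO_REGS under the
;; old reload pass.
(define_insn_and_split "*ior_scc_scc"
  [(set (match_operand:SI 0 "s_register_operand" "")
        (ior:SI (match_operator:SI 3 "arm_comparison_operator"
                  [(match_operand:SI 1 "s_register_operand" "")
                   (match_operand:SI 2 "arm_add_operand" "")])
                (match_operator:SI 6 "arm_comparison_operator"
                  [(match_operand:SI 4 "s_register_operand" "")
                   (match_operand:SI 5 "arm_add_operand" "")])))
   (clobber (reg:CC CC_REGNUM))]
  "TARGET_32BIT
   && (TARGET_ARM
       || (CONST_INT_P (operands[2]) && CONST_INT_P (operands[5])))"
(----snip---)

The constant-operand case is what keeps the arm/thumb2-cond-cmp-*.c tests
passing; the register-register case is the one that needed five low
registers in this PR.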

Do you have any suggestions?

regards,
Venkat.

>
> This bug does not occur when LRA is enabled. In 4.9 FSF and trunk, the
> LRA pass is enabled by default now.
> This may be too conservative, but is there a way to enable this pattern
> when we have the LRA pass and disable it when we have the old reload pass?
>
> regards,
> Venkat.
>
>
>
>>
>>>
>>> Regression tested with the GCC 4.9 branch, since in trunk this bug is
>>> masked by revision 209897.
>>>
>>> Please provide your suggestions on this patch.
>>>
>>> regards,
>>> Venkat.
