Hi Richard,
Thanks a lot for your response!
Another failure reported by the Linaro CI is as follows :
(Note: I am planning to send a separate mail for each failure, as this will make
the discussion easy to track)
FAIL: gcc.target/aarch64/sve/acle/general/cpy_1.c -march=armv8.2-a+sve
-moverride=tune=none check-function-bodies dup_x0_m
Expected code:
...
add (x[0-9]+), x0, #?1
mov (p[0-7])\.b, p15\.b
mov z0\.d, \2/m, \1
...
ret
Code obtained w/o patch:
addvl sp, sp, #-1
str p15, [sp]
add x0, x0, 1
mov p3.b, p15.b
mov z0.d, p3/m, x0
ldr p15, [sp]
addvl sp, sp, #1
ret
Code obtained w/ patch:
addvl sp, sp, #-1
str p15, [sp]
mov p3.b, p15.b
add x0, x0, 1
mov z0.d, p3/m, x0
ldr p15, [sp]
addvl sp, sp, #1
ret
As we can see, with the patch, the following two instructions are interchanged:
add x0, x0, 1
mov p3.b, p15.b
I believe that this is fine and the test can be modified to allow it to pass on
aarch64. Please let me know what you think.
Regards,
Surya
On 24/11/23 4:18 pm, Richard Earnshaw wrote:
>
>
> On 24/11/2023 08:09, Surya Kumari Jangala via Gcc wrote:
>> Hi Richard,
>> Ping. Please let me know if the test failure that I mentioned in the mail
>> below can be handled by changing the expected generated code. I am not
>> conversant with arm, and hence would appreciate your help.
>>
>> Regards,
>> Surya
>>
>> On 03/11/23 4:58 pm, Surya Kumari Jangala wrote:
>>> Hi Richard,
>>> I had submitted a patch for review
>>> (https://gcc.gnu.org/pipermail/gcc-patches/2023-October/631849.html)
>>> regarding scaling save/restore costs of callee save registers with block
>>> frequency in the IRA pass (PR111673).
>>>
>>> This patch has been approved by VMakarov
>>> (https://gcc.gnu.org/pipermail/gcc-patches/2023-October/632089.html).
>>>
>>> With this patch, we are seeing performance improvements with spec on x86
>>> (exchange: 5%, xalancbmk: 2.5%) and on Power (perlbench: 5.57%).
>>>
>>> I received a mail from Linaro about some failures seen in the CI pipeline
>>> with
>>> this patch. I have analyzed the failures and I wish to discuss the analysis
>>> with you.
>>>
>>> One failure reported by the Linaro CI is:
>>>
>>> FAIL: gcc.target/arm/pr111235.c scan-assembler-times ldrexd\tr[0-9]+,
>>> r[0-9]+, \\[r[0-9]+\\] 2
>>>
>>> The diff in the assembly between trunk and patch is:
>>>
>>> 93c93
>>> < push {r4, r5}
>>> ---
>>>> push {fp}
>>> 95c95
>>> < ldrexd r4, r5, [r0]
>>> ---
>>>> ldrexd fp, ip, [r0]
>>> 99c99
>>> < pop {r4, r5}
>>> ---
>>>> ldr fp, [sp], #4
>>>
>>>
>>> The test fails with patch because the ldrexd insn uses fp & ip registers
>>> instead
>>> of r[0-9]+
>>>
>>> But the code produced by patch is better because it is pushing and
>>> restoring only
>>> one register (fp) instead of two registers (r4, r5). Hence, this test can be
>>> modified to allow it to pass on arm. Please let me know what you think.
>>>
>>> If you need more information, please let me know. I will be sending
>>> separate mails
>>> for the other test failures.
>>>
>
> Thanks for looking at this.
>
>
> The key part of this test is that the compiler generates LDREXD. The
> registers used for that are pretty much irrelevant as we don't match them to
> any other operations within the test. So I'd recommend just testing for the
> mnemonic and not for any of the operands (ie just match "ldrexd\t").
>
> R.
>
>>> Regards,
>>> Surya
>>>
>>>
>>>