Re: [PATCH][ARM] Cleanup DImode shifts
Hi Wilco, On 7/31/19 5:25 PM, Wilco Dijkstra wrote: ping Like the logical operations, expand all shifts early rather than only sometimes. The Neon shift expansions are never emitted (not even with -fneon-for-64bits), so they are not useful. So all the late expansions and Neon shift patterns can be removed, and shifts are more optimized as a result. Since some extend patterns use Neon DImode shifts, remove the Neon extend variants and related splits. A simple example (relying on [1]) generates the same efficient code after this patch with -mfpu=neon and -mfpu=vfp (previously just the fact of having Neon enabled resulted inefficient code for no reason). unsigned long long f(unsigned long long x, unsigned long long y) { return x & (y >> 33); } Before: strd r4, r5, [sp, #-8]! lsr r4, r3, #1 mov r5, #0 and r1, r1, r5 and r0, r0, r4 ldrd r4, r5, [sp] add sp, sp, #8 bx lr After: and r0, r0, r3, lsr #1 mov r1, #0 bx lr Bootstrap and regress OK on arm-none-linux-gnueabihf --with-cpu=cortex-a57 Seems to me we should deprecate -mneon-for-64bits for GCC 10 and look to remove (or make it a no-op at least) in future releases... Ok for trunk. Please keep an eye for regressions as with the other patches in this series. Thanks, Kyrill [1] https://gcc.gnu.org/ml/gcc-patches/2019-07/msg01301.html ChangeLog: 2019-07-19 Wilco Dijkstra * config/arm/iterators.md (qhs_extenddi_cstr): Update. (qhs_extenddi_cstr): Likewise. * config/arm/arm.md (ashldi3): Always expand early. (ashlsi3): Likewise. (ashrsi3): Likewise. (zero_extenddi2): Remove Neon variants. (extenddi2): Likewise. * config/arm/neon.md (ashldi3_neon_noclobber): Remove. (signed_shift_di3_neon): Likewise. (unsigned_shift_di3_neon): Likewise. (ashrdi3_neon_imm_noclobber): Likewise. (lshrdi3_neon_imm_noclobber): Likewise. (di3_neon): Likewise. (split extend): Remove DI extend split patterns. testsuite/ * gcc.target/arm/neon-extend-1.c: Remove test. * gcc.target/arm/neon-extend-2.c: Remove test. --- diff --git a/gcc/config/arm/arm.md b/gcc/config/arm/arm.md index 0dba97a4ebeed0c2133936ca662f1c9e86ffc6ba..10ed70dac4384354c0a2453c5e51a29108c6c062 100644 --- a/gcc/config/arm/arm.md +++ b/gcc/config/arm/arm.md @@ -3601,44 +3601,14 @@ (define_insn "*satsi__shift" (define_expand "ashldi3" [(set (match_operand:DI 0 "s_register_operand") (ashift:DI (match_operand:DI 1 "s_register_operand") - (match_operand:SI 2 "general_operand")))] + (match_operand:SI 2 "reg_or_int_operand")))] "TARGET_32BIT" " - if (TARGET_NEON) - { - /* Delay the decision whether to use NEON or core-regs until - register allocation. */ - emit_insn (gen_ashldi3_neon (operands[0], operands[1], operands[2])); - DONE; - } - else - { - /* Only the NEON case can handle in-memory shift counts. */ - if (!reg_or_int_operand (operands[2], SImode)) - operands[2] = force_reg (SImode, operands[2]); - } - - if (!CONST_INT_P (operands[2]) && TARGET_REALLY_IWMMXT) - ; /* No special preparation statements; expand pattern as above. */ - else - { - rtx scratch1, scratch2; - - /* Ideally we should use iwmmxt here if we could know that operands[1] - ends up already living in an iwmmxt register. Otherwise it's - cheaper to have the alternate code being generated than moving - values to iwmmxt regs and back. */ - - /* Expand operation using core-registers. - 'FAIL' would achieve the same thing, but this is a bit smarter. */ - scratch1 = gen_reg_rtx (SImode); - scratch2 = gen_reg_rtx (SImode); - arm_emit_coreregs_64bit_shift (ASHIFT, operands[0], operands[1], - operands[2], scratch1, scratch2); - DONE; - } - " -) + arm_emit_coreregs_64bit_shift (ASHIFT, operands[0], operands[1], + operands[2], gen_reg_rtx (SImode), + gen_reg_rtx (SImode)); + DONE; +") (define_expand "ashlsi3" [(set (match_operand:SI 0 "s_register_operand") @@ -3661,35 +3631,11 @@ (define_expand "ashrdi3" (match_operand:SI 2 "reg_or_int_operand")))] "TARGET_32BIT" " - if (TARGET_NEON) - { - /* Delay the decision whether to use NEON or core-regs until - register allocation. */ - emit_insn (gen_ashrdi3_neon (operands[0], operands[1], operands[2])); - DONE; - } - - if (!CONST_INT_P (operands[2]) && TARGET_REALLY_IWMMXT) - ; /* No special preparation statements; expand pattern as above.
Re: [PATCH][ARM] Cleanup DImode shifts
ping Like the logical operations, expand all shifts early rather than only sometimes. The Neon shift expansions are never emitted (not even with -fneon-for-64bits), so they are not useful. So all the late expansions and Neon shift patterns can be removed, and shifts are more optimized as a result. Since some extend patterns use Neon DImode shifts, remove the Neon extend variants and related splits. A simple example (relying on [1]) generates the same efficient code after this patch with -mfpu=neon and -mfpu=vfp (previously just the fact of having Neon enabled resulted inefficient code for no reason). unsigned long long f(unsigned long long x, unsigned long long y) { return x & (y >> 33); } Before: strd r4, r5, [sp, #-8]! lsr r4, r3, #1 mov r5, #0 and r1, r1, r5 and r0, r0, r4 ldrd r4, r5, [sp] add sp, sp, #8 bx lr After: and r0, r0, r3, lsr #1 mov r1, #0 bx lr Bootstrap and regress OK on arm-none-linux-gnueabihf --with-cpu=cortex-a57 [1] https://gcc.gnu.org/ml/gcc-patches/2019-07/msg01301.html ChangeLog: 2019-07-19 Wilco Dijkstra * config/arm/iterators.md (qhs_extenddi_cstr): Update. (qhs_extenddi_cstr): Likewise. * config/arm/arm.md (ashldi3): Always expand early. (ashlsi3): Likewise. (ashrsi3): Likewise. (zero_extenddi2): Remove Neon variants. (extenddi2): Likewise. * config/arm/neon.md (ashldi3_neon_noclobber): Remove. (signed_shift_di3_neon): Likewise. (unsigned_shift_di3_neon): Likewise. (ashrdi3_neon_imm_noclobber): Likewise. (lshrdi3_neon_imm_noclobber): Likewise. (di3_neon): Likewise. (split extend): Remove DI extend split patterns. testsuite/ * gcc.target/arm/neon-extend-1.c: Remove test. * gcc.target/arm/neon-extend-2.c: Remove test. --- diff --git a/gcc/config/arm/arm.md b/gcc/config/arm/arm.md index 0dba97a4ebeed0c2133936ca662f1c9e86ffc6ba..10ed70dac4384354c0a2453c5e51a29108c6c062 100644 --- a/gcc/config/arm/arm.md +++ b/gcc/config/arm/arm.md @@ -3601,44 +3601,14 @@ (define_insn "*satsi__shift" (define_expand "ashldi3" [(set (match_operand:DI 0 "s_register_operand") (ashift:DI (match_operand:DI 1 "s_register_operand") - (match_operand:SI 2 "general_operand")))] + (match_operand:SI 2 "reg_or_int_operand")))] "TARGET_32BIT" " - if (TARGET_NEON) - { - /* Delay the decision whether to use NEON or core-regs until - register allocation. */ - emit_insn (gen_ashldi3_neon (operands[0], operands[1], operands[2])); - DONE; - } - else - { - /* Only the NEON case can handle in-memory shift counts. */ - if (!reg_or_int_operand (operands[2], SImode)) - operands[2] = force_reg (SImode, operands[2]); - } - - if (!CONST_INT_P (operands[2]) && TARGET_REALLY_IWMMXT) - ; /* No special preparation statements; expand pattern as above. */ - else - { - rtx scratch1, scratch2; - - /* Ideally we should use iwmmxt here if we could know that operands[1] - ends up already living in an iwmmxt register. Otherwise it's - cheaper to have the alternate code being generated than moving - values to iwmmxt regs and back. */ - - /* Expand operation using core-registers. - 'FAIL' would achieve the same thing, but this is a bit smarter. */ - scratch1 = gen_reg_rtx (SImode); - scratch2 = gen_reg_rtx (SImode); - arm_emit_coreregs_64bit_shift (ASHIFT, operands[0], operands[1], - operands[2], scratch1, scratch2); - DONE; - } - " -) + arm_emit_coreregs_64bit_shift (ASHIFT, operands[0], operands[1], + operands[2], gen_reg_rtx (SImode), + gen_reg_rtx (SImode)); + DONE; +") (define_expand "ashlsi3" [(set (match_operand:SI 0 "s_register_operand") @@ -3661,35 +3631,11 @@ (define_expand "ashrdi3" (match_operand:SI 2 "reg_or_int_operand")))] "TARGET_32BIT" " - if (TARGET_NEON) - { - /* Delay the decision whether to use NEON or core-regs until - register allocation. */ - emit_insn (gen_ashrdi3_neon (operands[0], operands[1], operands[2])); - DONE; - } - - if (!CONST_INT_P (operands[2]) && TARGET_REALLY_IWMMXT) - ; /* No special preparation statements; expand pattern as above. */ - else - { - rtx scratch1, scratch2; - - /* Ideally we should use iwmmxt here if we could know that operands[1] - ends up
Re: [PATCH][ARM] Cleanup DImode shifts
ping Like the logical operations, expand all shifts early rather than only sometimes. The Neon shift expansions are never emitted (not even with -fneon-for-64bits), so they are not useful. So all the late expansions and Neon shift patterns can be removed, and shifts are more optimized as a result. Since some extend patterns use Neon DImode shifts, remove the Neon extend variants and related splits. A simple example (relying on [1]) generates the same efficient code after this patch with -mfpu=neon and -mfpu=vfp (previously just the fact of having Neon enabled resulted inefficient code for no reason). unsigned long long f(unsigned long long x, unsigned long long y) { return x & (y >> 33); } Before: strd r4, r5, [sp, #-8]! lsr r4, r3, #1 mov r5, #0 and r1, r1, r5 and r0, r0, r4 ldrd r4, r5, [sp] add sp, sp, #8 bx lr After: and r0, r0, r3, lsr #1 mov r1, #0 bx lr Bootstrap and regress OK on arm-none-linux-gnueabihf --with-cpu=cortex-a57 [1] https://gcc.gnu.org/ml/gcc-patches/2019-07/msg01301.html ChangeLog: 2019-07-19 Wilco Dijkstra * config/arm/iterators.md (qhs_extenddi_cstr): Update. (qhs_extenddi_cstr): Likewise. * config/arm/arm.md (ashldi3): Always expand early. (ashlsi3): Likewise. (ashrsi3): Likewise. (zero_extenddi2): Remove Neon variants. (extenddi2): Likewise. * config/arm/neon.md (ashldi3_neon_noclobber): Remove. (signed_shift_di3_neon): Likewise. (unsigned_shift_di3_neon): Likewise. (ashrdi3_neon_imm_noclobber): Likewise. (lshrdi3_neon_imm_noclobber): Likewise. (di3_neon): Likewise. (split extend): Remove DI extend split patterns. testsuite/ * gcc.target/arm/neon-extend-1.c: Remove test. * gcc.target/arm/neon-extend-2.c: Remove test. --- diff --git a/gcc/config/arm/arm.md b/gcc/config/arm/arm.md index 0dba97a4ebeed0c2133936ca662f1c9e86ffc6ba..10ed70dac4384354c0a2453c5e51a29108c6c062 100644 --- a/gcc/config/arm/arm.md +++ b/gcc/config/arm/arm.md @@ -3601,44 +3601,14 @@ (define_insn "*satsi__shift" (define_expand "ashldi3" [(set (match_operand:DI 0 "s_register_operand") (ashift:DI (match_operand:DI 1 "s_register_operand") - (match_operand:SI 2 "general_operand")))] + (match_operand:SI 2 "reg_or_int_operand")))] "TARGET_32BIT" " - if (TARGET_NEON) - { - /* Delay the decision whether to use NEON or core-regs until - register allocation. */ - emit_insn (gen_ashldi3_neon (operands[0], operands[1], operands[2])); - DONE; - } - else - { - /* Only the NEON case can handle in-memory shift counts. */ - if (!reg_or_int_operand (operands[2], SImode)) - operands[2] = force_reg (SImode, operands[2]); - } - - if (!CONST_INT_P (operands[2]) && TARGET_REALLY_IWMMXT) - ; /* No special preparation statements; expand pattern as above. */ - else - { - rtx scratch1, scratch2; - - /* Ideally we should use iwmmxt here if we could know that operands[1] - ends up already living in an iwmmxt register. Otherwise it's - cheaper to have the alternate code being generated than moving - values to iwmmxt regs and back. */ - - /* Expand operation using core-registers. - 'FAIL' would achieve the same thing, but this is a bit smarter. */ - scratch1 = gen_reg_rtx (SImode); - scratch2 = gen_reg_rtx (SImode); - arm_emit_coreregs_64bit_shift (ASHIFT, operands[0], operands[1], - operands[2], scratch1, scratch2); - DONE; - } - " -) + arm_emit_coreregs_64bit_shift (ASHIFT, operands[0], operands[1], + operands[2], gen_reg_rtx (SImode), + gen_reg_rtx (SImode)); + DONE; +") (define_expand "ashlsi3" [(set (match_operand:SI 0 "s_register_operand") @@ -3661,35 +3631,11 @@ (define_expand "ashrdi3" (match_operand:SI 2 "reg_or_int_operand")))] "TARGET_32BIT" " - if (TARGET_NEON) - { - /* Delay the decision whether to use NEON or core-regs until - register allocation. */ - emit_insn (gen_ashrdi3_neon (operands[0], operands[1], operands[2])); - DONE; - } - - if (!CONST_INT_P (operands[2]) && TARGET_REALLY_IWMMXT) - ; /* No special preparation statements; expand pattern as above. */ - else - { - rtx scratch1, scratch2; - - /* Ideally we should use iwmmxt here if we could know that operands[1] - ends up already living in an iwmmxt register. Otherwise it's - cheaper to have the alternate code being generated than moving - values
Re: [PATCH][ARM] Cleanup DImode shifts
Hi Ramana, > Thanks for this patch set - What I'm missing in this is any analysis as > to what's the impact on code generation for neon intrinsics that use > uint64_t ? Especially things like v_u64 ? Well things like this continue to work exactly like before: uint64x1_t f20(uint64x1_t x, uint64x1_t y) { return vshl_u64 (x, y); } uint64x1_t f21(uint64x1_t x) { return vshl_n_u64 (x, 10); } f20: vmovd16, r0, r1 @ int vmovd17, r2, r3 @ int vshl.u64d16, d16, d17 vmovr0, r1, d16 @ int bx lr f21: vmovd16, r0, r1 @ int vshl.i64d16, d16, #10 vmovr0, r1, d16 @ int bx lr As you can see there is a problem with the uint64x1_t type which for a strange reason maps to DImode, so avoiding Neon here would avoid lots of moves... The vadd_u64 variant emits the right code already: uint64x1_t f22(uint64x1_t x, uint64x1_t y) { return vadd_u64 (x, y); } f22: addsr0, r0, r2 adc r1, r1, r3 bx lr Wilco
Re: [PATCH][ARM] Cleanup DImode shifts
On 22/07/2019 17:16, Wilco Dijkstra wrote: Like the logical operations, expand all shifts early rather than only sometimes. The Neon shift expansions are never emitted (not even with -fneon-for-64bits), so they are not useful. So all the late expansions and Neon shift patterns can be removed, and shifts are more optimized as a result. Since some extend patterns use Neon DImode shifts, remove the Neon extend variants and related splits. A simple example (relying on [1]) generates the same efficient code after this patch with -mfpu=neon and -mfpu=vfp (previously just the fact of having Neon enabled resulted inefficient code for no reason). unsigned long long f(unsigned long long x, unsigned long long y) { return x & (y >> 33); } Before: strdr4, r5, [sp, #-8]! lsr r4, r3, #1 mov r5, #0 and r1, r1, r5 and r0, r0, r4 ldrdr4, r5, [sp] add sp, sp, #8 bx lr After: and r0, r0, r3, lsr #1 mov r1, #0 bx lr Bootstrap and regress OK on arm-none-linux-gnueabihf --with-cpu=cortex-a57 [1] https://gcc.gnu.org/ml/gcc-patches/2019-07/msg01301.html Thanks for this patch set - What I'm missing in this is any analysis as to what's the impact on code generation for neon intrinsics that use uint64_t ? Especially things like v_u64 ? Ramana ChangeLog: 2019-07-19 Wilco Dijkstra * config/arm/iterators.md (qhs_extenddi_cstr): Update. (qhs_extenddi_cstr): Likewise. * config/arm/arm.md (ashldi3): Always expand early. (ashlsi3): Likewise. (ashrsi3): Likewise. (zero_extenddi2): Remove Neon variants. (extenddi2): Likewise. * config/arm/neon.md (ashldi3_neon_noclobber): Remove. (signed_shift_di3_neon): Likewise. (unsigned_shift_di3_neon): Likewise. (ashrdi3_neon_imm_noclobber): Likewise. (lshrdi3_neon_imm_noclobber): Likewise. (di3_neon): Likewise. (split extend): Remove DI extend split patterns. testsuite/ * gcc.target/arm/neon-extend-1.c: Remove test. * gcc.target/arm/neon-extend-2.c: Remove test. --- diff --git a/gcc/config/arm/arm.md b/gcc/config/arm/arm.md index 0dba97a4ebeed0c2133936ca662f1c9e86ffc6ba..10ed70dac4384354c0a2453c5e51a29108c6c062 100644 --- a/gcc/config/arm/arm.md +++ b/gcc/config/arm/arm.md @@ -3601,44 +3601,14 @@ (define_insn "*satsi__shift" (define_expand "ashldi3" [(set (match_operand:DI0 "s_register_operand") (ashift:DI (match_operand:DI 1 "s_register_operand") - (match_operand:SI 2 "general_operand")))] + (match_operand:SI 2 "reg_or_int_operand")))] "TARGET_32BIT" " - if (TARGET_NEON) -{ - /* Delay the decision whether to use NEON or core-regs until -register allocation. */ - emit_insn (gen_ashldi3_neon (operands[0], operands[1], operands[2])); - DONE; -} - else -{ - /* Only the NEON case can handle in-memory shift counts. */ - if (!reg_or_int_operand (operands[2], SImode)) -operands[2] = force_reg (SImode, operands[2]); -} - - if (!CONST_INT_P (operands[2]) && TARGET_REALLY_IWMMXT) -; /* No special preparation statements; expand pattern as above. */ - else -{ - rtx scratch1, scratch2; - - /* Ideally we should use iwmmxt here if we could know that operands[1] - ends up already living in an iwmmxt register. Otherwise it's - cheaper to have the alternate code being generated than moving - values to iwmmxt regs and back. */ - - /* Expand operation using core-registers. -'FAIL' would achieve the same thing, but this is a bit smarter. */ - scratch1 = gen_reg_rtx (SImode); - scratch2 = gen_reg_rtx (SImode); - arm_emit_coreregs_64bit_shift (ASHIFT, operands[0], operands[1], -operands[2], scratch1, scratch2); - DONE; -} - " -) + arm_emit_coreregs_64bit_shift (ASHIFT, operands[0], operands[1], +operands[2], gen_reg_rtx (SImode), +gen_reg_rtx (SImode)); + DONE; +") (define_expand "ashlsi3" [(set (match_operand:SI0 "s_register_operand") @@ -3661,35 +3631,11 @@ (define_expand "ashrdi3" (match_operand:SI 2 "reg_or_int_operand")))] "TARGET_32BIT" " - if (TARGET_NEON) -{ - /* Delay the decision whether to use NEON or core-regs until -register allocation. */ - emit_insn (gen_ashrdi3_neon (operands[0], operands[1], operands[2])); - DONE; -} - - if (!CONST_INT_P (operands[2]) && TARGET_REALLY_IWMMXT) -; /* No special preparation statements; expand pattern as above. */ - else -{ - rtx scratch1, scratch2; - - /* Ideally we should use iwmmxt here if we could know that operands[1] - ends up already living in
[PATCH][ARM] Cleanup DImode shifts
Like the logical operations, expand all shifts early rather than only sometimes. The Neon shift expansions are never emitted (not even with -fneon-for-64bits), so they are not useful. So all the late expansions and Neon shift patterns can be removed, and shifts are more optimized as a result. Since some extend patterns use Neon DImode shifts, remove the Neon extend variants and related splits. A simple example (relying on [1]) generates the same efficient code after this patch with -mfpu=neon and -mfpu=vfp (previously just the fact of having Neon enabled resulted inefficient code for no reason). unsigned long long f(unsigned long long x, unsigned long long y) { return x & (y >> 33); } Before: strdr4, r5, [sp, #-8]! lsr r4, r3, #1 mov r5, #0 and r1, r1, r5 and r0, r0, r4 ldrdr4, r5, [sp] add sp, sp, #8 bx lr After: and r0, r0, r3, lsr #1 mov r1, #0 bx lr Bootstrap and regress OK on arm-none-linux-gnueabihf --with-cpu=cortex-a57 [1] https://gcc.gnu.org/ml/gcc-patches/2019-07/msg01301.html ChangeLog: 2019-07-19 Wilco Dijkstra * config/arm/iterators.md (qhs_extenddi_cstr): Update. (qhs_extenddi_cstr): Likewise. * config/arm/arm.md (ashldi3): Always expand early. (ashlsi3): Likewise. (ashrsi3): Likewise. (zero_extenddi2): Remove Neon variants. (extenddi2): Likewise. * config/arm/neon.md (ashldi3_neon_noclobber): Remove. (signed_shift_di3_neon): Likewise. (unsigned_shift_di3_neon): Likewise. (ashrdi3_neon_imm_noclobber): Likewise. (lshrdi3_neon_imm_noclobber): Likewise. (di3_neon): Likewise. (split extend): Remove DI extend split patterns. testsuite/ * gcc.target/arm/neon-extend-1.c: Remove test. * gcc.target/arm/neon-extend-2.c: Remove test. --- diff --git a/gcc/config/arm/arm.md b/gcc/config/arm/arm.md index 0dba97a4ebeed0c2133936ca662f1c9e86ffc6ba..10ed70dac4384354c0a2453c5e51a29108c6c062 100644 --- a/gcc/config/arm/arm.md +++ b/gcc/config/arm/arm.md @@ -3601,44 +3601,14 @@ (define_insn "*satsi__shift" (define_expand "ashldi3" [(set (match_operand:DI0 "s_register_operand") (ashift:DI (match_operand:DI 1 "s_register_operand") - (match_operand:SI 2 "general_operand")))] + (match_operand:SI 2 "reg_or_int_operand")))] "TARGET_32BIT" " - if (TARGET_NEON) -{ - /* Delay the decision whether to use NEON or core-regs until -register allocation. */ - emit_insn (gen_ashldi3_neon (operands[0], operands[1], operands[2])); - DONE; -} - else -{ - /* Only the NEON case can handle in-memory shift counts. */ - if (!reg_or_int_operand (operands[2], SImode)) -operands[2] = force_reg (SImode, operands[2]); -} - - if (!CONST_INT_P (operands[2]) && TARGET_REALLY_IWMMXT) -; /* No special preparation statements; expand pattern as above. */ - else -{ - rtx scratch1, scratch2; - - /* Ideally we should use iwmmxt here if we could know that operands[1] - ends up already living in an iwmmxt register. Otherwise it's - cheaper to have the alternate code being generated than moving - values to iwmmxt regs and back. */ - - /* Expand operation using core-registers. -'FAIL' would achieve the same thing, but this is a bit smarter. */ - scratch1 = gen_reg_rtx (SImode); - scratch2 = gen_reg_rtx (SImode); - arm_emit_coreregs_64bit_shift (ASHIFT, operands[0], operands[1], -operands[2], scratch1, scratch2); - DONE; -} - " -) + arm_emit_coreregs_64bit_shift (ASHIFT, operands[0], operands[1], +operands[2], gen_reg_rtx (SImode), +gen_reg_rtx (SImode)); + DONE; +") (define_expand "ashlsi3" [(set (match_operand:SI0 "s_register_operand") @@ -3661,35 +3631,11 @@ (define_expand "ashrdi3" (match_operand:SI 2 "reg_or_int_operand")))] "TARGET_32BIT" " - if (TARGET_NEON) -{ - /* Delay the decision whether to use NEON or core-regs until -register allocation. */ - emit_insn (gen_ashrdi3_neon (operands[0], operands[1], operands[2])); - DONE; -} - - if (!CONST_INT_P (operands[2]) && TARGET_REALLY_IWMMXT) -; /* No special preparation statements; expand pattern as above. */ - else -{ - rtx scratch1, scratch2; - - /* Ideally we should use iwmmxt here if we could know that operands[1] - ends up already living in an iwmmxt register. Otherwise it's - cheaper to have the alternate code being generated than moving - values to iwmmxt regs and back. */ - - /* Expand operation using core-registers. -'FAIL' would achieve the same thing, but this is a bit