Re: [PATCH][ARM] Cleanup DImode shifts

2019-08-22 Thread Kyrill Tkachov

Hi Wilco,

On 7/31/19 5:25 PM, Wilco Dijkstra wrote:

ping


Like the logical operations, expand all shifts early rather than only
 sometimes.  The Neon shift expansions are never emitted (not even with
 -mneon-for-64bits), so they are not useful.  So all the late expansions
 and Neon shift patterns can be removed, and shifts are more optimized
 as a result.  Since some extend patterns use Neon DImode shifts, remove
 the Neon extend variants and related splits.
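For readers less familiar with the expander: arm_emit_coreregs_64bit_shift splits a DImode shift into SImode operations on the two core registers that hold the value. The C sketch below is not GCC code. It is a hypothetical model (di_pair, shift64 and asr32 are names invented here) of the arithmetic such a split has to perform for a shift amount from 0 to 63. The C branches stand in for whatever instruction sequence the real expander emits.

```c
#include <stdint.h>

/* Hypothetical model: a 64-bit (DImode) value held in two 32-bit (SImode)
   core registers.  */
typedef struct { uint32_t lo, hi; } di_pair;

typedef enum { SHIFT_LEFT, SHIFT_LRIGHT, SHIFT_ARIGHT } shift_kind;

/* 32-bit arithmetic right shift by m (0 <= m < 32), written without relying
   on implementation-defined signed right shifts.  */
static uint32_t asr32(uint32_t v, unsigned m)
{
    uint32_t sign = (v & 0x80000000u) ? 0xFFFFFFFFu : 0u;
    return m == 0 ? v : (v >> m) | (sign << (32 - m));
}

/* Shift X by N (0 <= N < 64) using only 32-bit operations.  */
static di_pair shift64(shift_kind kind, di_pair x, unsigned n)
{
    di_pair r;
    uint32_t sign = (x.hi & 0x80000000u) ? 0xFFFFFFFFu : 0u;

    if (n == 0)
        return x;
    if (n < 32) {
        /* Bits cross from one half into the other.  */
        if (kind == SHIFT_LEFT) {
            r.hi = (x.hi << n) | (x.lo >> (32 - n));
            r.lo = x.lo << n;
        } else {
            r.lo = (x.lo >> n) | (x.hi << (32 - n));
            r.hi = kind == SHIFT_ARIGHT ? asr32(x.hi, n) : x.hi >> n;
        }
    } else {
        /* One half moves wholesale into the other; the vacated half becomes
           zero, or copies of the sign bit for an arithmetic right shift.  */
        if (kind == SHIFT_LEFT) {
            r.hi = x.lo << (n - 32);
            r.lo = 0;
        } else {
            r.lo = kind == SHIFT_ARIGHT ? asr32(x.hi, n - 32) : x.hi >> (n - 32);
            r.hi = kind == SHIFT_ARIGHT ? sign : 0u;
        }
    }
    return r;
}
```

Once the shift is expressed this way at expand time, later passes can combine the individual SImode operations with their neighbours, which is what allows the shifted-operand AND in the example that follows.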

 A simple example (relying on [1]) generates the same efficient code after
 this patch with -mfpu=neon and -mfpu=vfp (previously, merely having
 Neon enabled resulted in inefficient code for no reason).

 unsigned long long f(unsigned long long x, unsigned long long y)
 { return x & (y >> 33); }

 Before:
 strd    r4, r5, [sp, #-8]!
 lsr r4, r3, #1
 mov r5, #0
 and r1, r1, r5
 and r0, r0, r4
 ldrd    r4, r5, [sp]
 add sp, sp, #8
 bx  lr

 After:
 and r0, r0, r3, lsr #1
 mov r1, #0
 bx  lr
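The After sequence follows from splitting the 64-bit operation into 32-bit halves by hand. On a little-endian AAPCS target, x arrives in r0 (low) and r1 (high) and y in r2 (low) and r3 (high), so y >> 33 has low half r3 >> 1 and high half 0. x & (y >> 33) therefore needs one AND with a shifted operand plus a zeroed r1. The sketch below (f_halves is an illustrative name, not from the patch) checks that reasoning against the plain C expression.

```c
#include <stdint.h>

/* The function from the patch description.  */
static uint64_t f(uint64_t x, uint64_t y)
{
    return x & (y >> 33);
}

/* Illustrative only: the same computation on 32-bit halves, mirroring the
   "After" code.  Assumes a little-endian AAPCS target, where x is passed in
   r0 (low) / r1 (high) and y in r2 (low) / r3 (high).  */
static uint64_t f_halves(uint64_t x, uint64_t y)
{
    uint32_t r0 = (uint32_t)x;          /* low half of x */
    uint32_t r3 = (uint32_t)(y >> 32);  /* high half of y */

    /* y >> 33 has low half r3 >> 1 and high half 0, so the high half of the
       result is x_high & 0 == 0 and r1/r2 are never read.  */
    uint32_t lo = r0 & (r3 >> 1);       /* and r0, r0, r3, lsr #1 */
    uint32_t hi = 0;                    /* mov r1, #0 */

    return ((uint64_t)hi << 32) | lo;
}
```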

 Bootstrap and regress OK on arm-none-linux-gnueabihf --with-cpu=cortex-a57


Seems to me we should deprecate -mneon-for-64bits for GCC 10 and look to 
remove (or make it a no-op at least) in future releases...


Ok for trunk.

Please keep an eye out for regressions as with the other patches in this series.

Thanks,

Kyrill



 [1] https://gcc.gnu.org/ml/gcc-patches/2019-07/msg01301.html

 ChangeLog:
 2019-07-19  Wilco Dijkstra  

 * config/arm/iterators.md (qhs_extenddi_cstr): Update.
 (qhs_extenddi_cstr): Likewise.
 * config/arm/arm.md (ashldi3): Always expand early.
 (ashlsi3): Likewise.
 (ashrsi3): Likewise.
 (zero_extenddi2): Remove Neon variants.
 (extenddi2): Likewise.
 * config/arm/neon.md (ashldi3_neon_noclobber): Remove.
 (signed_shift_di3_neon): Likewise.
 (unsigned_shift_di3_neon): Likewise.
 (ashrdi3_neon_imm_noclobber): Likewise.
 (lshrdi3_neon_imm_noclobber): Likewise.
 (di3_neon): Likewise.
 (split extend): Remove DI extend split patterns.

 testsuite/
 * gcc.target/arm/neon-extend-1.c: Remove test.
 * gcc.target/arm/neon-extend-2.c: Remove test.
 ---

 diff --git a/gcc/config/arm/arm.md b/gcc/config/arm/arm.md
 index 0dba97a4ebeed0c2133936ca662f1c9e86ffc6ba..10ed70dac4384354c0a2453c5e51a29108c6c062 100644
 --- a/gcc/config/arm/arm.md
 +++ b/gcc/config/arm/arm.md
 @@ -3601,44 +3601,14 @@ (define_insn "*satsi__shift"
  (define_expand "ashldi3"
    [(set (match_operand:DI    0 "s_register_operand")
  (ashift:DI (match_operand:DI 1 "s_register_operand")
 -   (match_operand:SI 2 "general_operand")))]
 +   (match_operand:SI 2 "reg_or_int_operand")))]
    "TARGET_32BIT"
    "
 -  if (TARGET_NEON)
 -    {
 -  /* Delay the decision whether to use NEON or core-regs until
 -    register allocation.  */
 -  emit_insn (gen_ashldi3_neon (operands[0], operands[1], operands[2]));
 -  DONE;
 -    }
 -  else
 -    {
 -  /* Only the NEON case can handle in-memory shift counts.  */
 -  if (!reg_or_int_operand (operands[2], SImode))
 -    operands[2] = force_reg (SImode, operands[2]);
 -    }
 -
 -  if (!CONST_INT_P (operands[2]) && TARGET_REALLY_IWMMXT)
 -    ; /* No special preparation statements; expand pattern as above.  */
 -  else
 -    {
 -  rtx scratch1, scratch2;
 -
 -  /* Ideally we should use iwmmxt here if we could know that operands[1]
 - ends up already living in an iwmmxt register. Otherwise it's
 - cheaper to have the alternate code being generated than moving
 - values to iwmmxt regs and back.  */
 -
 -  /* Expand operation using core-registers.
 -    'FAIL' would achieve the same thing, but this is a bit smarter.  */
 -  scratch1 = gen_reg_rtx (SImode);
 -  scratch2 = gen_reg_rtx (SImode);
 -  arm_emit_coreregs_64bit_shift (ASHIFT, operands[0], operands[1],
 -    operands[2], scratch1, scratch2);
 -  DONE;
 -    }
 -  "
 -)
 +  arm_emit_coreregs_64bit_shift (ASHIFT, operands[0], operands[1],
 +    operands[2], gen_reg_rtx (SImode),
 +    gen_reg_rtx (SImode));
 +  DONE;
 +")

  (define_expand "ashlsi3"
    [(set (match_operand:SI    0 "s_register_operand")
 @@ -3661,35 +3631,11 @@ (define_expand "ashrdi3"
   (match_operand:SI 2 "reg_or_int_operand")))]
    "TARGET_32BIT"
    "
 -  if (TARGET_NEON)
 -    {
 -  /* Delay the decision whether to use NEON or core-regs until
 -    register allocation.  */
 -  emit_insn (gen_ashrdi3_neon (operands[0], operands[1], operands[2]));
 -  DONE;
 -    }
 -
 -  if (!CONST_INT_P (operands[2]) && TARGET_REALLY_IWMMXT)
 -    ; /* No special preparation statements; expand pattern as above.  

Re: [PATCH][ARM] Cleanup DImode shifts

2019-08-19 Thread Wilco Dijkstra


   
 
ping
    
  

Re: [PATCH][ARM] Cleanup DImode shifts

2019-07-31 Thread Wilco Dijkstra
ping
   
 

Re: [PATCH][ARM] Cleanup DImode shifts

2019-07-22 Thread Wilco Dijkstra
Hi Ramana,

> Thanks for this patch set - What I'm missing in this is any analysis as 
> to what's the impact on code generation for neon intrinsics that use 
> uint64_t ? Especially things like v_u64 ?

Well, things like this continue to work exactly as before:

uint64x1_t f20(uint64x1_t x, uint64x1_t y)
{
  return vshl_u64 (x, y);
}

uint64x1_t f21(uint64x1_t x)
{
  return vshl_n_u64 (x, 10);
}

f20:
vmov    d16, r0, r1 @ int
vmov    d17, r2, r3 @ int
vshl.u64    d16, d16, d17
vmov    r0, r1, d16 @ int
bx  lr

f21:
vmov    d16, r0, r1 @ int
vshl.i64    d16, d16, #10
vmov    r0, r1, d16 @ int
bx  lr
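To make explicit what f20 and f21 compute, here is a hypothetical scalar model of a single 64-bit lane, assuming the Arm ARM description of VSHL (register): the shift amount is the signed least significant byte of the second operand, positive amounts shift left, negative amounts shift right (logically for .u64), and amounts of 64 or more give 0. The *_model names are invented for illustration; this is a sketch for reasoning about the semantics, not a replacement for the arm_neon.h intrinsics.

```c
#include <stdint.h>

/* Hypothetical model of one uint64x1_t lane of vshl_u64 (f20), assuming the
   VSHL (register) rules: the shift amount is the signed low byte of the
   second operand; positive shifts left, negative shifts right logically,
   and a shift of 64 or more in either direction gives 0.  */
static uint64_t vshl_u64_model(uint64_t x, uint64_t y)
{
    int shift = (int)(y & 0xFF);   /* low byte of the shift operand */
    if (shift >= 128)
        shift -= 256;              /* sign-extend it: range -128..127 */

    if (shift >= 0)
        return shift >= 64 ? 0 : x << shift;
    return -shift >= 64 ? 0 : x >> -shift;
}

/* vshl_n_u64 (x, 10) in f21 is a plain immediate left shift.  */
static uint64_t vshl_n_u64_model(uint64_t x)
{
    return x << 10;
}
```

Both are ordinary 64-bit shifts on a single lane, which is why the only cost visible in f20/f21 is moving the value between core and Neon registers.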

As you can see, there is a problem with the uint64x1_t type, which for some
reason maps to DImode; avoiding Neon here would avoid lots of moves...

The vadd_u64 variant emits the right code already:

uint64x1_t f22(uint64x1_t x, uint64x1_t y)
{
  return vadd_u64 (x, y);
}

f22:
adds    r0, r0, r2
adc r1, r1, r3
bx  lr
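The adds/adc pair is the usual core-register expansion of a 64-bit add: add the low halves, then add the high halves plus the carry out of the low add. A minimal C model of that sequence (add64_halves is an illustrative name, not from the thread), with the carry recovered from unsigned wrap-around:

```c
#include <stdint.h>

/* Hypothetical model of "adds r0, r0, r2; adc r1, r1, r3": a 64-bit add done
   as two 32-bit adds with the carry propagated from low to high half.  */
static uint64_t add64_halves(uint64_t x, uint64_t y)
{
    uint32_t r0 = (uint32_t)x, r1 = (uint32_t)(x >> 32);
    uint32_t r2 = (uint32_t)y, r3 = (uint32_t)(y >> 32);

    uint32_t lo = r0 + r2;          /* adds: low halves, sets the carry flag */
    uint32_t carry = lo < r0;       /* unsigned wrap-around means carry out */
    uint32_t hi = r1 + r3 + carry;  /* adc: high halves plus carry */

    return ((uint64_t)hi << 32) | lo;
}
```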

Wilco

Re: [PATCH][ARM] Cleanup DImode shifts

2019-07-22 Thread Ramana Radhakrishnan

On 22/07/2019 17:16, Wilco Dijkstra wrote:

Like the logical operations, expand all shifts early rather than only
sometimes.  The Neon shift expansions are never emitted (not even with
-mneon-for-64bits), so they are not useful.  So all the late expansions
and Neon shift patterns can be removed, and shifts are more optimized
as a result.  Since some extend patterns use Neon DImode shifts, remove
the Neon extend variants and related splits.

A simple example (relying on [1]) generates the same efficient code after
this patch with -mfpu=neon and -mfpu=vfp (previously, merely having
Neon enabled resulted in inefficient code for no reason).

unsigned long long f(unsigned long long x, unsigned long long y)
{ return x & (y >> 33); }

Before:
 strd    r4, r5, [sp, #-8]!
 lsr r4, r3, #1
 mov r5, #0
 and r1, r1, r5
 and r0, r0, r4
 ldrd    r4, r5, [sp]
 add sp, sp, #8
 bx  lr

After:
 and r0, r0, r3, lsr #1
 mov r1, #0
 bx  lr

Bootstrap and regress OK on arm-none-linux-gnueabihf --with-cpu=cortex-a57

[1] https://gcc.gnu.org/ml/gcc-patches/2019-07/msg01301.html


Thanks for this patch set - What I'm missing in this is any analysis as 
to what's the impact on code generation for neon intrinsics that use 
uint64_t ? Especially things like v_u64 ?



Ramana





[PATCH][ARM] Cleanup DImode shifts

2019-07-22 Thread Wilco Dijkstra
Like the logical operations, expand all shifts early rather than only
sometimes.  The Neon shift expansions are never emitted (not even with
-mneon-for-64bits), so they are not useful.  So all the late expansions
and Neon shift patterns can be removed, and shifts are more optimized
as a result.  Since some extend patterns use Neon DImode shifts, remove
the Neon extend variants and related splits.

A simple example (relying on [1]) generates the same efficient code after
this patch with -mfpu=neon and -mfpu=vfp (previously, merely having
Neon enabled resulted in inefficient code for no reason).

unsigned long long f(unsigned long long x, unsigned long long y)
{ return x & (y >> 33); }

Before:
strd    r4, r5, [sp, #-8]!
lsr r4, r3, #1
mov r5, #0
and r1, r1, r5
and r0, r0, r4
ldrd    r4, r5, [sp]
add sp, sp, #8
bx  lr

After:
and r0, r0, r3, lsr #1
mov r1, #0
bx  lr

Bootstrap and regress OK on arm-none-linux-gnueabihf --with-cpu=cortex-a57

[1] https://gcc.gnu.org/ml/gcc-patches/2019-07/msg01301.html

ChangeLog:
2019-07-19  Wilco Dijkstra  

* config/arm/iterators.md (qhs_extenddi_cstr): Update.
(qhs_extenddi_cstr): Likewise.
* config/arm/arm.md (ashldi3): Always expand early.
(ashlsi3): Likewise.
(ashrsi3): Likewise.
(zero_extenddi2): Remove Neon variants.
(extenddi2): Likewise.
* config/arm/neon.md (ashldi3_neon_noclobber): Remove.
(signed_shift_di3_neon): Likewise.
(unsigned_shift_di3_neon): Likewise.
(ashrdi3_neon_imm_noclobber): Likewise.
(lshrdi3_neon_imm_noclobber): Likewise.
(di3_neon): Likewise.
(split extend): Remove DI extend split patterns.

testsuite/
* gcc.target/arm/neon-extend-1.c: Remove test.
* gcc.target/arm/neon-extend-2.c: Remove test.
---

diff --git a/gcc/config/arm/arm.md b/gcc/config/arm/arm.md
index 0dba97a4ebeed0c2133936ca662f1c9e86ffc6ba..10ed70dac4384354c0a2453c5e51a29108c6c062 100644
--- a/gcc/config/arm/arm.md
+++ b/gcc/config/arm/arm.md
@@ -3601,44 +3601,14 @@ (define_insn "*satsi__shift"
 (define_expand "ashldi3"
   [(set (match_operand:DI    0 "s_register_operand")
 (ashift:DI (match_operand:DI 1 "s_register_operand")
-   (match_operand:SI 2 "general_operand")))]
+   (match_operand:SI 2 "reg_or_int_operand")))]
   "TARGET_32BIT"
   "
-  if (TARGET_NEON)
-{
-  /* Delay the decision whether to use NEON or core-regs until
-register allocation.  */
-  emit_insn (gen_ashldi3_neon (operands[0], operands[1], operands[2]));
-  DONE;
-}
-  else
-{
-  /* Only the NEON case can handle in-memory shift counts.  */
-  if (!reg_or_int_operand (operands[2], SImode))
-operands[2] = force_reg (SImode, operands[2]);
-}
-
-  if (!CONST_INT_P (operands[2]) && TARGET_REALLY_IWMMXT)
-; /* No special preparation statements; expand pattern as above.  */
-  else
-{
-  rtx scratch1, scratch2;
-
-  /* Ideally we should use iwmmxt here if we could know that operands[1]
- ends up already living in an iwmmxt register. Otherwise it's
- cheaper to have the alternate code being generated than moving
- values to iwmmxt regs and back.  */
-
-  /* Expand operation using core-registers.
-'FAIL' would achieve the same thing, but this is a bit smarter.  */
-  scratch1 = gen_reg_rtx (SImode);
-  scratch2 = gen_reg_rtx (SImode);
-  arm_emit_coreregs_64bit_shift (ASHIFT, operands[0], operands[1],
-operands[2], scratch1, scratch2);
-  DONE;
-}
-  "
-)
+  arm_emit_coreregs_64bit_shift (ASHIFT, operands[0], operands[1],
+operands[2], gen_reg_rtx (SImode),
+gen_reg_rtx (SImode));
+  DONE;
+")
 
 (define_expand "ashlsi3"
   [(set (match_operand:SI    0 "s_register_operand")
@@ -3661,35 +3631,11 @@ (define_expand "ashrdi3"
  (match_operand:SI 2 "reg_or_int_operand")))]
   "TARGET_32BIT"
   "
-  if (TARGET_NEON)
-{
-  /* Delay the decision whether to use NEON or core-regs until
-register allocation.  */
-  emit_insn (gen_ashrdi3_neon (operands[0], operands[1], operands[2]));
-  DONE;
-}
-
-  if (!CONST_INT_P (operands[2]) && TARGET_REALLY_IWMMXT)
-; /* No special preparation statements; expand pattern as above.  */
-  else
-{
-  rtx scratch1, scratch2;
-
-  /* Ideally we should use iwmmxt here if we could know that operands[1]
- ends up already living in an iwmmxt register. Otherwise it's
- cheaper to have the alternate code being generated than moving
- values to iwmmxt regs and back.  */
-
-  /* Expand operation using core-registers.
-'FAIL' would achieve the same thing, but this is a bit