[Bug target/106101] [12/13 Regression] ICE in reg_bitfield_target_p since r12-4428-g147ed0184f403b

2022-08-25 Thread segher at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=106101

--- Comment #23 from Segher Boessenkool  ---
(In reply to Andreas Krebbel from comment #22)
> The longer a have been looking at these STRICT_LOW_PART issue the more I
> think that STRICT_LOW_PART is an awful way to express what we need:
> 
> - the information needed to understand what it is doing is distributed
> across 3 RTXs (strict_low_part (subreg:mode1 (reg:mode2 xx) OFS))
> - the big problems arise since the involved RTXs are separately optimized
> and we might end up with partial information without a clear definition of
> how to deal with that
> - actually it is really hard to handle the RTXs as one unit. Recursively
> walking RTXs needs to record whether we are in a STRICT_LOW_PART or not.
> 
> 
> I think it might make sense to explore other ways to express this:
> 
> 1. SUBREG flag - Looks easy, but it would be hard to catch all places which
> should care about that flag.
> 
> 2. Introduce a new RTX code which has a mode and an offset attached but does
> not require an additional SUBREG anymore.
> 
> 3. Since a STRICT_LOW_PART is essentially a bit insertion operation we could
> express it always with a ZERO_EXTRACT target operand and get rid of
> STRICT_LOW_PART entirely. A ZERO_EXTRACT would be somewhat more cumbersome
> to deal with, since it would always require to check the bit width and
> offset for all the cases which just use mode boundaries. But at least most
> passes know how to deal with them already.

4. With existing simple RTL:

(set (reg:DI x) (ior:DI (and:DI (reg:DI x) (const_int -4294967296))
(zero_extend:DI (reg:SI y

(ZERO_EXTRACT is never useful IMNSHO: it just makes the easy cases slightly
easier to write, and causes a lot of useless work everywhere else).

[Bug target/106101] [12/13 Regression] ICE in reg_bitfield_target_p since r12-4428-g147ed0184f403b

2022-08-25 Thread krebbel at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=106101

--- Comment #22 from Andreas Krebbel  ---
The longer a have been looking at these STRICT_LOW_PART issue the more I think
that STRICT_LOW_PART is an awful way to express what we need:

- the information needed to understand what it is doing is distributed across 3
RTXs (strict_low_part (subreg:mode1 (reg:mode2 xx) OFS))
- the big problems arise since the involved RTXs are separately optimized and
we might end up with partial information without a clear definition of how to
deal with that
- actually it is really hard to handle the RTXs as one unit. Recursively
walking RTXs needs to record whether we are in a STRICT_LOW_PART or not.


I think it might make sense to explore other ways to express this:

1. SUBREG flag - Looks easy, but it would be hard to catch all places which
should care about that flag.

2. Introduce a new RTX code which has a mode and an offset attached but does
not require an additional SUBREG anymore.

3. Since a STRICT_LOW_PART is essentially a bit insertion operation we could
express it always with a ZERO_EXTRACT target operand and get rid of
STRICT_LOW_PART entirely. A ZERO_EXTRACT would be somewhat more cumbersome to
deal with, since it would always require to check the bit width and offset for
all the cases which just use mode boundaries. But at least most passes know how
to deal with them already.

[Bug target/106101] [12/13 Regression] ICE in reg_bitfield_target_p since r12-4428-g147ed0184f403b

2022-08-25 Thread krebbel at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=106101

--- Comment #21 from Andreas Krebbel  ---
I have committed a patch now which accepts only SUBREGs before reload and then
also REGs to deal with how LRA operates right now.

I've continued a bit with the patch from Comment 18. It bootstraps on s390x and
x86-64. On s390x also the testsuite is clean. However, I see a few failures in
the arch specific tests on x86-64. The cases I looked at so far are the result
of several peepholes and splitters not being triggered anymore. I've fixed most
of them I think but there are also cases where I'm not sure what to do exactly.

In case of a matching constraint between a strict_low_part operand and a normal
operand. Reload now (with the patch from Comment 18) would remove the subreg on
the operand with the matching constraint and would leave it in for the
strict_low_part operand.

(insn 9 8 16 2 (parallel [
(set (strict_low_part (subreg:QI (reg/v:SI 0 ax [orig:86 a ] [86])
0))
(and:QI (reg:QI 0 ax [orig:86 a ] [86])
(reg:SI 4 si [92])))
(clobber (reg:CC 17 flags))
]) "/home/andreas/gcc/gcc/testsuite/gcc.target/i386/pr91188-1a.c":20:10
553 {*andqi_1_slp}
 (nil))

I think this should be addressed separately. Once we solved it I will adjust
the s390x backend again if necessary.

[Bug target/106101] [12/13 Regression] ICE in reg_bitfield_target_p since r12-4428-g147ed0184f403b

2022-08-25 Thread cvs-commit at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=106101

--- Comment #20 from CVS Commits  ---
The master branch has been updated by Andreas Krebbel :

https://gcc.gnu.org/g:585a21bab3ec688c2039bff2922cc372d8558283

commit r13-2201-g585a21bab3ec688c2039bff2922cc372d8558283
Author: Andreas Krebbel 
Date:   Fri Jul 29 09:55:54 2022 +0200

PR 106101: IBM zSystems: Fix strict_low_part problem

This avoids generating illegal (strict_low_part (reg ...)) RTXs. This
required two changes:

1. Do not use gen_lowpart to generate the inner expression of a
STRICT_LOW_PART.  gen_lowpart might fold the SUBREG either because
there is already a paradoxical subreg or because it can directly be
applied to the register. A new wrapper function makes sure that we
always end up having an actual SUBREG.

2. Change the movstrict patterns to enforce a SUBREG as inner operand
of the STRICT_LOW_PARTs.  The new predicate introduced for the
destination operand requires a SUBREG expression with a
register_operand as inner operand.  However, since reload strips away
the majority of the SUBREGs we have to accept single registers as well
once we reach reload.

Bootstrapped and regression tested on IBM zSystems 64 bit.

gcc/ChangeLog:

PR target/106101
* config/s390/predicates.md (subreg_register_operand): New
predicate.
* config/s390/s390-protos.h (s390_gen_lowpart_subreg): New
function prototype.
* config/s390/s390.cc (s390_gen_lowpart_subreg): New function.
(s390_expand_insv): Use s390_gen_lowpart_subreg instead of
gen_lowpart.
* config/s390/s390.md ("*get_tp_64", "*zero_extendhisi2_31")
("*zero_extendqisi2_31", "*zero_extendqihi2_31"): Likewise.
("movstrictqi", "movstricthi", "movstrictsi"): Use the
subreg_register_operand predicate instead of register_operand.

gcc/testsuite/ChangeLog:

PR target/106101
* gcc.c-torture/compile/pr106101.c: New test.

[Bug target/106101] [12/13 Regression] ICE in reg_bitfield_target_p since r12-4428-g147ed0184f403b

2022-08-24 Thread segher at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=106101

--- Comment #19 from Segher Boessenkool  ---
(In reply to Andreas Krebbel from comment #18)
> (In reply to Segher Boessenkool from comment #17)
> ...
> > Yes, but that says the high 48 bits of the hardware reg are untouched, which
> > is not true (only the high 16 of the low 32 are guaranteed unmodified).
> 
> Right, if the original register mode does not match the mode of the full
> hardreg, we continue to need that mode as the upper bound. So with the
> subreg folding in reload we appear to loose information we need to interpret
> the STRICT_LOW_PART correctly.

Exactly.  This is why strict_low_part of anything else than a subreg is
ill-defined.

[Bug target/106101] [12/13 Regression] ICE in reg_bitfield_target_p since r12-4428-g147ed0184f403b

2022-08-24 Thread krebbel at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=106101

--- Comment #18 from Andreas Krebbel  ---
(In reply to Segher Boessenkool from comment #17)
...
> Yes, but that says the high 48 bits of the hardware reg are untouched, which
> is not true (only the high 16 of the low 32 are guaranteed unmodified).

Right, if the original register mode does not match the mode of the full
hardreg, we continue to need that mode as the upper bound. So with the subreg
folding in reload we appear to loose information we need to interpret the
STRICT_LOW_PART correctly.

I'm testing the following patch in combination with my other fix now:

diff --git a/gcc/lra-spills.cc b/gcc/lra-spills.cc
index 4ddbe477d92..9c125a9ce38 100644
--- a/gcc/lra-spills.cc
+++ b/gcc/lra-spills.cc
@@ -855,6 +855,7 @@ lra_final_code_change (void)

  for (i = id->insn_static_data->n_operands - 1; i >= 0; i--)
if ((DEBUG_INSN_P (insn) || ! static_id->operand[i].is_operator)
+   && ! static_id->operand[i].strict_low
&& alter_subregs (id->operand_loc[i], ! DEBUG_INSN_P (insn)))
  {
lra_update_dup (id, i);

With that change the SUBREG folding from comment #11 happens later in final
(cleanup_subreg_operands). I'm not sure whether we would have to prevent it
there as well?!

[Bug target/106101] [12/13 Regression] ICE in reg_bitfield_target_p since r12-4428-g147ed0184f403b

2022-08-22 Thread segher at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=106101

--- Comment #17 from Segher Boessenkool  ---
(In reply to Andreas Krebbel from comment #16)
> (In reply to Segher Boessenkool from comment #15)
> > (In reply to Andreas Krebbel from comment #14)
> > > > So you are suggesting that every strict_low_part after reload can just 
> > > > be
> > > > removed?  If that is true, should we not just do exactly that then?
> > > 
> > > I think we have 3 options:
> > > (1) Prevent reload from removing SUBREGs in STRICT_LOW_PARTs.
> > > (2) Remove the STRICT_LOW_PART when resolving the inner SUBREG
> > > (3) Define what a (STRICT_LOW_PART (reg:mode x)) means. 
> ...
> > > (3) E.g. it means that the bits of hardreg x in its hardware mode (the 
> > > mode
> > > for UNITS_PER_WORD) which are not covered by MODE are not touched by the 
> > > SET.
> > 
> > But say you have (strict_low_part (subreg:HI (reg:SI) 0)) and the hardware
> > is 64-bit.  That only means the low 32 bits of the reg aren't clobbered, the
> > high 32 bits are fair game.  That does not agree with your proposed
> > semantics.
> 
> In that case I would have expected reload to turn this into 
> (strict_low_part (reg:HI xx))
> already.

Yes, but that says the high 48 bits of the hardware reg are untouched, which
is not true (only the high 16 of the low 32 are guaranteed unmodified).

[Bug target/106101] [12/13 Regression] ICE in reg_bitfield_target_p since r12-4428-g147ed0184f403b

2022-08-22 Thread krebbel at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=106101

--- Comment #16 from Andreas Krebbel  ---
(In reply to Segher Boessenkool from comment #15)
> (In reply to Andreas Krebbel from comment #14)
> > > So you are suggesting that every strict_low_part after reload can just be
> > > removed?  If that is true, should we not just do exactly that then?
> > 
> > I think we have 3 options:
> > (1) Prevent reload from removing SUBREGs in STRICT_LOW_PARTs.
> > (2) Remove the STRICT_LOW_PART when resolving the inner SUBREG
> > (3) Define what a (STRICT_LOW_PART (reg:mode x)) means. 
...
> > (3) E.g. it means that the bits of hardreg x in its hardware mode (the mode
> > for UNITS_PER_WORD) which are not covered by MODE are not touched by the 
> > SET.
> 
> But say you have (strict_low_part (subreg:HI (reg:SI) 0)) and the hardware
> is 64-bit.  That only means the low 32 bits of the reg aren't clobbered, the
> high 32 bits are fair game.  That does not agree with your proposed
> semantics.

In that case I would have expected reload to turn this into 
(strict_low_part (reg:HI xx))
already.

[Bug target/106101] [12/13 Regression] ICE in reg_bitfield_target_p since r12-4428-g147ed0184f403b

2022-08-22 Thread segher at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=106101

--- Comment #15 from Segher Boessenkool  ---
(In reply to Andreas Krebbel from comment #14)
> > So you are suggesting that every strict_low_part after reload can just be
> > removed?  If that is true, should we not just do exactly that then?
> 
> I think we have 3 options:
> (1) Prevent reload from removing SUBREGs in STRICT_LOW_PARTs.
> (2) Remove the STRICT_LOW_PART when resolving the inner SUBREG
> (3) Define what a (STRICT_LOW_PART (reg:mode x)) means. 
> 
> (1) For that, all passes after reload must be able to deal with these
> SUBREGs. Since SUBREGs are rare after reload it is hard to say how robust
> that handling is right now.

On some targets you always have subregs to describe the action of certain
machine instructions (like mulh one PowerPC).

In the past there were subregs of mem as well.  Those shouldn't occur anymore,
but most (all?) late passes still know how to handle it.


> (2) Here the question to me is which passes after reload currently do
> something with the strict-low-part info. Clearly a non-option if we would
> loose any optimizations with that.

Yes.  This potentially even changes semantics, unless we are sure nothing uses
the "high part" at all.


> (3) E.g. it means that the bits of hardreg x in its hardware mode (the mode
> for UNITS_PER_WORD) which are not covered by MODE are not touched by the SET.

But say you have (strict_low_part (subreg:HI (reg:SI) 0)) and the hardware
is 64-bit.  That only means the low 32 bits of the reg aren't clobbered, the
high 32 bits are fair game.  That does not agree with your proposed semantics.

[Bug target/106101] [12/13 Regression] ICE in reg_bitfield_target_p since r12-4428-g147ed0184f403b

2022-08-22 Thread krebbel at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=106101

--- Comment #14 from Andreas Krebbel  ---
(In reply to Segher Boessenkool from comment #13)
> (Sorry I missed this)
> 
> (In reply to Andreas Krebbel from comment #11)
> > I've tried to change our movstrict backend patterns to use a predicate on
> > the dest operand which enforces a subreg. However, since reload strips the
> > subreg away when assigning hard regs we end up with a STRICT_LOW_PART of a
> > reg again. At least after reload something like this should be acceptable -
> > right?
> > 
> > 298r.ira:
> > (insn 8 16 17 3 (set (strict_low_part (subreg:SI (reg/v:DI 64 [ e ]) 4))
> > (const_int 0 [0])) "t.cc":37:17 1485 {movstrictsi}
> >  (nil))
> > 
> > 299r.reload:
> > (insn 8 16 17 3 (set (strict_low_part (reg:SI 11 %r11 [orig:64 e+4 ] [64]))
> > (mem/u/c:SI (symbol_ref/u:DI ("*.LC0") [flags 0x2]) [0  S4 A32]))
> > "t.cc":37:17 1485 {movstrictsi}
> >  (nil))
> 
> So you are suggesting that every strict_low_part after reload can just be
> removed?  If that is true, should we not just do exactly that then?

I think we have 3 options:
(1) Prevent reload from removing SUBREGs in STRICT_LOW_PARTs.
(2) Remove the STRICT_LOW_PART when resolving the inner SUBREG
(3) Define what a (STRICT_LOW_PART (reg:mode x)) means. 

(1) For that, all passes after reload must be able to deal with these SUBREGs.
Since SUBREGs are rare after reload it is hard to say how robust that handling
is right now.

(2) Here the question to me is which passes after reload currently do something
with the strict-low-part info. Clearly a non-option if we would loose any
optimizations with that.

(3) E.g. it means that the bits of hardreg x in its hardware mode (the mode for
UNITS_PER_WORD) which are not covered by MODE are not touched by the SET.

[Bug target/106101] [12/13 Regression] ICE in reg_bitfield_target_p since r12-4428-g147ed0184f403b

2022-08-19 Thread segher at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=106101

--- Comment #13 from Segher Boessenkool  ---
(Sorry I missed this)

(In reply to Andreas Krebbel from comment #11)
> I've tried to change our movstrict backend patterns to use a predicate on
> the dest operand which enforces a subreg. However, since reload strips the
> subreg away when assigning hard regs we end up with a STRICT_LOW_PART of a
> reg again. At least after reload something like this should be acceptable -
> right?
> 
> 298r.ira:
> (insn 8 16 17 3 (set (strict_low_part (subreg:SI (reg/v:DI 64 [ e ]) 4))
> (const_int 0 [0])) "t.cc":37:17 1485 {movstrictsi}
>  (nil))
> 
> 299r.reload:
> (insn 8 16 17 3 (set (strict_low_part (reg:SI 11 %r11 [orig:64 e+4 ] [64]))
> (mem/u/c:SI (symbol_ref/u:DI ("*.LC0") [flags 0x2]) [0  S4 A32]))
> "t.cc":37:17 1485 {movstrictsi}
>  (nil))

So you are suggesting that every strict_low_part after reload can just be
removed?  If that is true, should we not just do exactly that then?

[Bug target/106101] [12/13 Regression] ICE in reg_bitfield_target_p since r12-4428-g147ed0184f403b

2022-08-19 Thread rguenth at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=106101

Richard Biener  changed:

   What|Removed |Added

   Target Milestone|12.2|12.3

--- Comment #12 from Richard Biener  ---
GCC 12.2 is being released, retargeting bugs to GCC 12.3.

[Bug target/106101] [12/13 Regression] ICE in reg_bitfield_target_p since r12-4428-g147ed0184f403b

2022-07-19 Thread krebbel at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=106101

--- Comment #11 from Andreas Krebbel  ---
I've tried to change our movstrict backend patterns to use a predicate on the
dest operand which enforces a subreg. However, since reload strips the subreg
away when assigning hard regs we end up with a STRICT_LOW_PART of a reg again.
At least after reload something like this should be acceptable - right?

298r.ira:
(insn 8 16 17 3 (set (strict_low_part (subreg:SI (reg/v:DI 64 [ e ]) 4))
(const_int 0 [0])) "t.cc":37:17 1485 {movstrictsi}
 (nil))

299r.reload:
(insn 8 16 17 3 (set (strict_low_part (reg:SI 11 %r11 [orig:64 e+4 ] [64]))
(mem/u/c:SI (symbol_ref/u:DI ("*.LC0") [flags 0x2]) [0  S4 A32]))
"t.cc":37:17 1485 {movstrictsi}
 (nil))

[Bug target/106101] [12/13 Regression] ICE in reg_bitfield_target_p since r12-4428-g147ed0184f403b

2022-07-14 Thread krebbel at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=106101

--- Comment #10 from Andreas Krebbel  ---
We generate the movstrict target operand with gen_lowpart. If the operand for
gen_lowpart is already a paradoxical subreg the two subregs cancel each other
out and we end up with a plain reg. I'm testing the following patch right now.
It falls back to a normal move in that case and fixes the testcase:

diff --git a/gcc/config/s390/s390.cc b/gcc/config/s390/s390.cc
index 5aaf76a9490..d90ec1a6de1 100644
--- a/gcc/config/s390/s390.cc
+++ b/gcc/config/s390/s390.cc
@@ -6523,6 +6523,14 @@ s390_expand_insv (rtx dest, rtx op1, rtx op2, rtx src)
  rtx low_dest = gen_lowpart (smode, dest);
  rtx low_src = gen_lowpart (smode, src);

+ /* In case two subregs cancelled each other out, do a normal
+move.  */
+ if (!SUBREG_P (low_dest))
+   {
+ emit_move_insn (low_dest, low_src);
+ return true;
+   }
+
  switch (smode)
{
case E_QImode: emit_insn (gen_movstrictqi (low_dest, low_src));
return true;

[Bug target/106101] [12/13 Regression] ICE in reg_bitfield_target_p since r12-4428-g147ed0184f403b

2022-07-08 Thread marxin at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=106101

Martin Liška  changed:

   What|Removed |Added

Summary|[12/13 Regression] ICE in   |[12/13 Regression] ICE in
   |reg_bitfield_target_p   |reg_bitfield_target_p since
   ||r12-4428-g147ed0184f403b
 CC||marxin at gcc dot gnu.org
   Keywords|needs-bisection |

--- Comment #9 from Martin Liška  ---
Started with r12-4428-g147ed0184f403b.