from:"Segher Boessenkool"

Re: [PATCH] testsuite: Fix pr66144-3.c test to accept multiple equivalent insns. [PR115262]

2024-06-12 Thread Segher Boessenkool

On Wed, Jun 12, 2024 at 07:02:31PM -0500, Peter Bergner wrote:
> On 6/12/24 3:00 PM, Segher Boessenkool wrote:
> >>  /* { dg-do compile { target { powerpc64*-*-* } } } */
> > 
> > Probably should be an "lp64" instead?
> 
> Actually, there is nothing inherently 64-bit about the test case.
> I removed the target test altogether and it executes just fine on
> our BE system in both 32-bit and 64-bit modes, so I'll just drop
> the target test as part of the patch.

Ha, even better!

> >>  /* { dg-require-effective-target powerpc_vsx } */
> > 
> > This isn't needed either.
> 
> Maybe not strictly needed, but it shields us from users who force
> some options to be used via RUNTESTFLAGS env var that can cause the
> test case to FAIL.  I'm going to leave this for someone else to
> clean up.

Users can make most tests fail in interesting and exciting ways like
that, heh.

In general, only realistic settings are supported: things for which
hardware exists, an OS exists for, etc.  With any other settings many
things can fail, and that is Just Fine.

Thanks again,


Segher

Re: [PATCH] testsuite: Fix pr66144-3.c test to accept multiple equivalent insns. [PR115262]

2024-06-12 Thread Segher Boessenkool

Hi!

On Wed, Jun 12, 2024 at 02:49:40PM -0500, Peter Bergner wrote:
> testsuite: Fix pr66144-3.c test to accept multiple equivalent insns. 
> [PR115262]

("rs6000:", not "testsuite")

> Jeff's commit r15-831-g05daf617ea22e1 changed the instruction we expected
> for this test case into an equivalent instruction.  Modify the test case
> so it will accept any of three equivalent instructions we could get depending
> on the options used.

They aren't equivalent insns, but they are all simple insns, and all
just as cheap :-)

> I've verified this test case PASSes on all scenarios where the three possible
> instructions are generated.  Ok for trunk?

> --- a/gcc/testsuite/gcc.target/powerpc/pr66144-3.c
> +++ b/gcc/testsuite/gcc.target/powerpc/pr66144-3.c
> @@ -1,5 +1,5 @@
>  /* { dg-do compile { target { powerpc64*-*-* } } } */

Probably should be an "lp64" instead?

> -/* { dg-options "-mdejagnu-cpu=power8 -mvsx -O2 -ftree-vectorize" } */
> +/* { dg-options "-mdejagnu-cpu=power8 -mvsx -O2 -ftree-vectorize 
> -fno-unroll-loops" } */

The "-mvsx" is superfluous btw (implied by -mcpu=power8).

>  /* { dg-require-effective-target powerpc_vsx } */

This isn't needed either.

> -/* { dg-final { scan-assembler {\mvcmpequw\M} } } */
> -/* { dg-final { scan-assembler {\mxxsel\M}} } */
> +/* { dg-final { scan-assembler-times {\mvcmpequw\M} 1 } } */
> +/* { dg-final { scan-assembler-times {\m(?:xxsel|xxlor|vor)\M} 1 } } */
>  /* { dg-final { scan-assembler-not {\mvspltisw\M} } } */
>  /* { dg-final { scan-assembler-not {\mxxlorc\M}   } } */

Okido, thanks!  The three trivial tweaks I suggest here are okay as
well if you want, after testing of course ;-)


Segher

Re: [PATCH v2] fix PowerPC < 7 w/ Altivec not to default to power7

2024-06-11 Thread Segher Boessenkool

Hi!

What does "powerpc < 7" mean?  Something before POWER ISA 2.06?

On Tue, Jun 11, 2024 at 04:22:54PM +0200, Rene Rebe wrote:
> Glibc uses .machine to determine assembler optimizations to use.

What does this mean?

.machine is an *output* for glibc; nothing in glibc reads source code.

Nothing the ".machine" directive does has anything to do with
optimisations.  Instead, it simply changes what architecture level is
used for the following code. what specific instructions are supported
mainly.

> --- a/gcc/testsuite/gcc.target/powerpc/pr97367.c.vanilla  2024-05-30 
> 18:26:29.839784279 +0200
> +++ b/gcc/testsuite/gcc.target/powerpc/pr97367.c  2024-10-06 
> 18:20:34.873818482 +0200
> @@ -0,0 +1,9 @@
> +/* { dg-do compile } */
> +/* { dg-options "-mdejagnu-cpu=G5" } */
> +
> +int dummy ()
> +{
> +  return 0;
> +}
> +
> +/* { dg-final { scan-assembler "power4" } } */

Please explain (in the testcase, not here!) what this is meant to test!

You probably want to say {\mpower4\M} instead, btw.  Unless you want to
match ".machine spower436" as well?

Segher

Re: [PATCH V4 1/2] split complicate 64bit constant to memory

2024-06-11 Thread Segher Boessenkool

Hi!

On Tue, Jun 11, 2024 at 04:37:25PM +0800, Jiufu Guo wrote:
> Sometimes, a complicated constant is built via 3(or more)
> instructions.  Generally speaking, it would not be as fast
> as loading it from the constant pool (as the discussions in
> PR63281):
> "ld" is one instruction.  If consider "address/toc" adjust,
> we may count it as 2 instructions. And "pld" may need fewer
> cycles.

Yeah, three insns to build up a constant always was about as fast/slow
as loading a constant from memory.  When you need two related constants
loading from memory is slower, so we preferred building them up.

> As testing(SPEC2017), it could get better/stable runtime
> if set the threshold as "> 2" (compare with "> 3").

Interesting!  About how much speedup did you see?  0.1%?

> As known, because the constant is load from memory by this
> patch,  so this functionality may affect the cache missing.
> Also there may be possible side effects on icach. While,
> IMHO, this patch would be still do the right thing.

Yeah, but almost every codegen patch has *some* icache effect.  Your
benchmarks show a net positive effect, that is all that matters in the
end.

> gcc/ChangeLog:
> 
>   * config/rs6000/rs6000.cc (rs6000_emit_set_const): Update to split
>   complicate constant to memory.

(complicated)

And don't write "update", every patch is an update :-)

There are more conditions here, mention those as well?

But, why does base_reg_operand matter?  And, why for TARGET_ELF only?
That is the only place you tested I'm sure, but  make an educated guess
for the rest, why would it not be useful for other binary formats?

> diff --git a/gcc/config/rs6000/rs6000.cc b/gcc/config/rs6000/rs6000.cc
> index e4dc629ddcc..f448df289a0 100644
> --- a/gcc/config/rs6000/rs6000.cc
> +++ b/gcc/config/rs6000/rs6000.cc
> @@ -10240,6 +10240,21 @@ rs6000_emit_set_const (rtx dest, rtx source)
> c = sext_hwi (c, 32);
> emit_move_insn (lo, GEN_INT (c));
>   }
> +
> +  else if (base_reg_operand (dest, mode) && TARGET_64BIT
> +&& TARGET_ELF && num_insns_constant (source, mode) > 2)
> + {
> +   rtx sym = force_const_mem (mode, source);
> +   if (TARGET_TOC && SYMBOL_REF_P (XEXP (sym, 0))
> +   && use_toc_relative_ref (XEXP (sym, 0), mode))
> + {
> +   rtx toc = create_TOC_reference (XEXP (sym, 0), copy_rtx (dest));
> +   sym = gen_const_mem (mode, toc);
> +   set_mem_alias_set (sym, get_TOC_alias_set ());
> + }
> +
> +   emit_move_insn (dest, sym);
> + }
>else
>   rs6000_emit_set_long_const (dest, c);
>break;

Is there no utility function to put constants like this in memory?
Like, "output_toc" for example?

> diff --git a/gcc/testsuite/gcc.target/powerpc/const_anchors.c 
> b/gcc/testsuite/gcc.target/powerpc/const_anchors.c
> index 542e2674b12..f33c9a83f5e 100644
> --- a/gcc/testsuite/gcc.target/powerpc/const_anchors.c
> +++ b/gcc/testsuite/gcc.target/powerpc/const_anchors.c
> @@ -1,5 +1,5 @@
>  /* { dg-do compile { target has_arch_ppc64 } } */
> -/* { dg-options "-O2" } */
> +/* { dg-options "-O2 -fdump-rtl-final" } */
>  
>  #define C1 0x2351847027482577ULL
>  #define C2 0x2351847027482578ULL
> @@ -17,4 +17,5 @@ void __attribute__ ((noinline)) foo1 (long long *a, long 
> long b)
>  *a++ = C2;
>  }
>  
> -/* { dg-final { scan-assembler-times {\maddi\M} 2 } } */
> +/* Checking "final" instead checking "asm" output to avoid noise.  */
> +/* { dg-final { scan-rtl-dump-times {\madddi3\M} 2 "final" } } */

So, in the RTL dump it finds a named pattern adddi3, while in the asm
you find random other addi insns?  Hardly worth a comment in the
testcase itself, but heh.  The point is that you check for the RTL
pattern name instead of the machine insn.  The compiler pass you check
in isn't changed even!

> --- a/gcc/testsuite/gcc.target/powerpc/parall_5insn_const.c
> +++ b/gcc/testsuite/gcc.target/powerpc/parall_5insn_const.c
> @@ -6,11 +6,18 @@
>  /* { dg-final { scan-assembler-times {\mori\M} 4 } } */
>  /* { dg-final { scan-assembler-times {\mrldimi\M} 2 } } */
>  
> +/* The below macro helps to avoid loading constant from memory.  */
> +#define CONST_AVOID_BASE_REG(DEST, CST)  
>   \
> +  {  
>   \
> +register long long d asm ("r0") = CST;   
>   \
> +asm volatile ("std %1, %0" : : "m"(DEST), "r"(d));   
>   \
> +  }

This needs an actual changelog, not just "update".  Something as simple
as "New macro to force constant into a reg" is fine, but more than
"update" :-)

> +/* The below macro helps to avoid loading constant from memory.  */
> +#define CONST_AVOID_BASE_REG(DEST, CST)  
>   \
> +  {  
>   \
> +register long long d asm ("r0") = CST;

Re: [PATCH] rs6000: Fix up PCH in --enable-host-pie builds [PR115324]

2024-06-03 Thread Segher Boessenkool

Hi!

On Mon, Jun 03, 2024 at 04:55:05PM +0200, Jakub Jelinek wrote:
> PCH doesn't work properly in --enable-host-pie configurations on
> powerpc*-linux*.

PCH and PIE, two of my favourites ;-)

> For PCH though it actually results in saving those huge arrays (one is
> 130832 bytes, another 81568 bytes) into the .gch files and loading them back
> in full.  While the bifname and attr_string and next pointers are marked as
> GTY((skip)), they are actually saved to point to the .rodata and .data
> sections of the process which writes the PCH, but because cc1/cc1plus etc.
> are position independent executables with --enable-host-pie, when it is
> loaded from the PCH file, it can point in a completely different addresses
> where nothing is mapped at all or some random different thing appears at.

Yuck.

> So, either we'd need to add some further GTY extensions, or the following
> patch instead reworks it such that the fntype members which were the only
> reason for PCH in those arrays are moved to separate arrays.

And that just sidesteps the limitation in PCH?

>  void
>  rs6000_init_generated_builtins ()
>  {
> +  bifdata *rs6000_builtin_info_p;
> +  tree *rs6000_builtin_info_fntype_p;
> +  ovlddata *rs6000_instance_info_p;
> +  tree *rs6000_instance_info_fntype_p;
> +  ovldrecord *rs6000_overload_info_p;
> +  __asm ("" : "=r" (rs6000_builtin_info_p) : "0" (rs6000_builtin_info));

Bah.

It should not be called _p of course, it is not a predicate.  And
relying on the operand tie to not have to do a much more obvious
assignment, please don't.  Just *do* write assignments, and then use
a simple "+r"?

But you call this a hack anyway, you wouldn't propose to actually
include this patch :-)

> Bootstrapped/regtested on powerpc64le-linux and powerpc64-linux (-m32/-m64
> testing there), ok for trunk and after a while for release branches?

Yes please.  What nastiness.  Thanks for dealing with it!


Segher


> 2024-06-03  Jakub Jelinek  
> 
>   PR target/115324
>   * config/rs6000/rs6000-gen-builtins.cc (write_decls): Remove
>   GTY markup from struct bifdata and struct ovlddata and remove their
>   fntype members.  Change next member in struct ovlddata and
>   first_instance member of struct ovldrecord to have int type rather
>   than struct ovlddata *.  Remove GTY markup from rs6000_builtin_info
>   and rs6000_instance_info arrays, declare new
>   rs6000_builtin_info_fntype and rs6000_instance_info_fntype arrays,
>   which have GTY markup.
>   (write_bif_static_init): Adjust for the above changes.
>   (write_ovld_static_init): Likewise.
>   (write_init_bif_table): Likewise.
>   (write_init_ovld_table): Likewise.
>   * config/rs6000/rs6000-builtin.cc (rs6000_init_builtins): Likewise.
>   * config/rs6000/rs6000-c.cc (find_instance): Likewise.  Make static.
>   (altivec_resolve_overloaded_builtin): Adjust for the above changes.

Re: [Patch, rs6000, aarch64, middle-end] Add implementation for different targets for pair mem fusion

2024-05-31 Thread Segher Boessenkool

On Fri, May 31, 2024 at 09:14:21AM +0100, Richard Sandiford wrote:
> Segher Boessenkool  writes:
> > Hi!
> >
> > On Fri, May 31, 2024 at 01:21:44AM +0530, Ajit Agarwal wrote:
> >> Code is implemented with pure virtual functions to interface with target
> >> code.
> >
> > It's not a pure function.  A pure function -- by definition -- has no
> > side effects.  These things have side effects.
> >
> > What you mean is this is *an implementation* for C++ functions without
> > a generic implementation.  An obfuscation some people (like me) would
> > say.  But please call things what they are!  So not "pure function".
> > That has a meaning, and this isn't it.
> 
> "pure virtual function" is an established term.  The "pure" modifies
> "virtual", not "function".
> 
> The description is correct because the patch adds pure virtual functions
> to the base class and expects the derived class to override and implement
> them.

But this code -- the architecture implementation! -- certainly does
*not* add abstract functions: it should provide an implementation for
them, instead.  So the commit message is completely misleading :-(

And no, it is not an established term, not outside of the C++ world.
Which GCC agreed *not* to dive too much into.  You can only sanely write
most of compilers -- just like anything where algorithmics matter -- in
an imperative, procedural language.  Obfuscation (like action at a
distance like this, where the meaning of a program depends hugely on
knowledge of other parts of the program, far away!) is the devil.

Reading the program is at least 1000x more important than writing it.
Writing is easy.  Reading and understanding, not so much.

> >>* config/aarch64/aarch64-ldp-fusion.cc: Add target specific
> >>implementation of additional virtual functions added in pair_fusion
> >>struct.
> >
> > This does not belong in this patch.  Do not send "rs6000" patches that
> > touch anything outside of config/rs6000/ and similar, certainly not in
> > config/something-else/!
> >
> > This would be WAY easier to review (read: AT ALL POSSIBLE) if you
> > included some detailed rationale and design document.
> 
> Please don't shout.
> 
> I don't think this kind of aggressive review is helpful to the project.

And I don't think wasting people's time is helpful.

I don't shout.  If you read that as shouting, that's not my problem.

It is EMPHASIS (there are no small caps in email).  Do you prefer *fat
print* for that?  Or /slanted/?  Or **italics**?  _Underlined_ perhaps?

Segher

Re: [Patch, rs6000, aarch64, middle-end] Add implementation for different targets for pair mem fusion

2024-05-30 Thread Segher Boessenkool

Hi!

On Fri, May 31, 2024 at 01:21:44AM +0530, Ajit Agarwal wrote:
> Code is implemented with pure virtual functions to interface with target
> code.

It's not a pure function.  A pure function -- by definition -- has no
side effects.  These things have side effects.

What you mean is this is *an implementation* for C++ functions without
a generic implementation.  An obfuscation some people (like me) would
say.  But please call things what they are!  So not "pure function".
That has a meaning, and this isn't it.

>   * config/aarch64/aarch64-ldp-fusion.cc: Add target specific
>   implementation of additional virtual functions added in pair_fusion
>   struct.

This does not belong in this patch.  Do not send "rs6000" patches that
touch anything outside of config/rs6000/ and similar, certainly not in
config/something-else/!

This would be WAY easier to review (read: AT ALL POSSIBLE) if you
included some detailed rationale and design document.

Segher

Re: [PATCH v3 #2/2] [rs6000] adjust return_pc debug attrs

2024-05-30 Thread Segher Boessenkool

Hi Alex,

On Thu, May 30, 2024 at 01:40:27PM -0300, Alexandre Oliva wrote:
> Sorry, I misnumbered this patch as #1/2 when first posting v3.

I see at least five completely different patches in this email thread,
with different subjects and all.

> On May 28, 2024, Segher Boessenkool  wrote:
> 
> > Please don't (incorrectly!) line-wrap changelogs.  Lines are 80
> > characters wide, not 60 or 72 or whatever.  80.  Indents are tabs that
> > take 8 columns.
> 
> ACK.  When was it bumped up to 80, BTW?  It wasn't always like that, was
> it?

It always was like that.  Some people say 79, that is fine too.

It mostly irks me because lines that end in : and then a lot of empty
space look like peoople used one of those awful "write the changelog for
me" helper things, that *at best* *slow you down*, and always (always!)
cause worse results.

> I've noticed that something seems to change my line width settings
> in Emacs, but I must have missed that memo.

Line length in source code is 79 or 80.  In email it is 72 or so.  This
has not changed since the dawn of time :-)

> >> +/* Return the offset to be added to the label output after CALL_INSN
> >> +   to compute the address to be placed in DW_AT_call_return_pc.  */
> >> +
> >> +static int
> >> +rs6000_call_offset_return_label (rtx_insn *call_insn)
> >> +{
> >> +  /* All rs6000 CALL_INSN output patterns start with a b or bl, always
> 
> > This isn't true.  There is a crlogical insn before the bcl for sysv for
> > example.
> 
> Hmm, if that's so, this function will have to look at the insn and
> recognize those cases and use a different way to compute the offset.
> 
> However, I don't see any relevant uses of bcl in output patterns for
> insns containing a call rtx.

bl, bcl, what's the difference (bit 4 is, heh, 16 vs. 18).  Read bl if
you want -- the point is that there are crlogical insns before the
branch-and-link.

> >> +we compute the offset
> >> + back to the address of the call opcode proper, then add the
> >> + constant 4 bytes, to get the address after that opcode.  */
> >> +  return 4 - get_attr_length (call_insn);
> 
> > Please explain this magic, too -- in code preferably (so with a ? :
> > maybe, but don't try to "optimise" that expression, let the compiler
> > do that, it is much better at it anyway :-) )
> 
> There's neither optimization nor magic, it's literally what's written in
> the comment quoted above: starting from the label at the end of the call
> insn (that's what the caller offsets from, as in the documentation in
> the actual #1/2), subtract the length (to get to the address of the call
> opcode), and add 4 (to get past the call opcode).  Ok, I've reordered
> the two addends for an IMHO more readable expression, but that was all.

4 - length does not make any sense /an sich/, it *is* magic.

It is not clear it is correct at all, either.

> > Is that correct for all ABIs we support?  

(Context missing!  What did I ask?)

> Back when I wrote this patchset, I went through all call opcodes I could
> find in the md and .c files within rs6000/, and I was satisfied that it
> covered what we had then, but I won't pretend to know all about all of
> the ppc ABIs.  I may have missed disguised call insns, too.  I was
> hoping some ppc maintainer, more familiar with the port than I am, would
> catch any trouble on review and let me know about pitfalls and surprises
> to watch out for.

Yeah, things don't work that way.  If you need help, *ask* for that.
Don't pretend a patch is good if you have doubts yourself!

> > Even if so, it needs a lot more documentation than this.
> 
> I can write more documentation, but I'm at a loss as to what you're
> hoping for.  If you set clearer expectations, I'll be glad to oblige.

I want a patch submission that is a) understandable and b) a good thing
to have.  If a patch leaves me wondering what is going on at all, that
is not ideal ;-)


Segher

Re: [PATCH v3 #1/2] [rs6000] adjust return_pc debug attrs

2024-05-30 Thread Segher Boessenkool

On Wed, May 29, 2024 at 03:52:15AM -0300, Alexandre Oliva wrote:
> On May 27, 2024, "Kewen.Lin"  wrote:
> 
> > I wonder if it's possible to have a test case for this?
> 
> gcc.dg/guality/pr54519-[34].c at -O[1g] are fixed by this patch on
> ppc64le-linux-gnu.  Are these the sort of test case you're interested
> in, or are you looking for something that tests the offsets in debug
> info, rather than the end-to-end debugging feature?

A testcase specifically for this would be good, yes.  Where you can say
at the top of the file "This tests that ... [X and Y]" :-)


Segher

Re: [PATCHv3] Optab: add isfinite_optab for __builtin_isfinite

2024-05-28 Thread Segher Boessenkool

Hi!

On Mon, May 27, 2024 at 05:37:23PM +0800, HAO CHEN GUI wrote:
> --- a/gcc/builtins.cc
> +++ b/gcc/builtins.cc
> @@ -2459,8 +2459,9 @@ interclass_mathfn_icode (tree arg, tree fndecl)
>errno_set = true; builtin_optab = ilogb_optab; break;
>  CASE_FLT_FN (BUILT_IN_ISINF):
>builtin_optab = isinf_optab; break;
> -case BUILT_IN_ISNORMAL:
>  case BUILT_IN_ISFINITE:
> +  builtin_optab = isfinite_optab; break;

This needs a line break after the first ; (like after *any* semicolon
in C).  It is rather important that every "break;" stands out :-)

> +@cindex @code{isfinite@var{m}2} instruction pattern
> +@item @samp{isfinite@var{m}2}
> +Set operand 0 to nonzero if operand 1 is a finite @code{SFmode},
> +@code{DFmode}, or @code{TFmode} floating point number and to 0
> +otherwise.

operand 0 is the output of the builtin, right?  So write that instead?
"Return 1 if the operand (a scalar floating poiint number) is finite",
or such?

Segher

Re: [PATCHv3] Optab: add isfinite_optab for __builtin_isfinite

2024-05-28 Thread Segher Boessenkool

On Tue, May 28, 2024 at 02:09:50PM +0200, Richard Biener wrote:
> On Tue, May 28, 2024 at 9:09 AM Kewen.Lin  wrote:
> > As Haochen's previous reply, I think there are three cases:
> >   1) no optab defined, fold in a generic way;
> >   2) optab defined, SUCC, expand as what it defines;
> >   3) optab defined, FAIL, generate a library call;
> >
> > From above, I had the concern that ports may assume FAILing can
> > fall back with the generic folding, but it's not actually.
> 
> Hmm, but it should.  Can you make that work?

That certainly would be the least surprising!


Segher

Re: [PATCH v3 #1/2] [rs6000] adjust return_pc debug attrs

2024-05-28 Thread Segher Boessenkool

On Sat, May 25, 2024 at 09:13:12AM -0300, Alexandre Oliva wrote:
> Some of the rs6000 call patterns, on some ABIs, issue multiple opcodes
> out of a single call insn, but the call (bl) or jump (b) is not always
> the last opcode in the sequence.

> This does not seem to be a problem for exception handling tables, but
> the return_pc attribute in the call graph output in dwarf2+ debug
> information, that takes the address of a label output right after the
> call, does not match the value of the link register even for non-tail
> calls.  E.g., with ABI_AIX or ABI_ELFv2, such code as:
> 
>   foo ();
> 
> outputs:
> 
>   bl foo
>   nop
>  LVL#:
> [...]
>   .8byte .LVL#  # DW_AT_call_return_pc
> 
> but debug info consumers may rely on the return_pc address, and draw
> incorrect conclusions from its off-by-4 value.
> 
> This patch uses the infrastructure for targets to add an offset to the
> label issued after the call_insn to set the call_return_pc attribute,
> on rs6000, to account for opcodes issued after actual call opcode as
> part of call insns output patterns.

> for  gcc/ChangeLog
> 
>   * config/rs6000/rs6000.cc (TARGET_CALL_OFFSET_RETURN_LABEL):
>   Override.

Please don't (incorrectly!) line-wrap changelogs.  Lines are 80
characters wide, not 60 or 72 or whatever.  80.  Indents are tabs that
take 8 columns.

> +/* Return the offset to be added to the label output after CALL_INSN
> +   to compute the address to be placed in DW_AT_call_return_pc.  */
> +
> +static int
> +rs6000_call_offset_return_label (rtx_insn *call_insn)
> +{
> +  /* All rs6000 CALL_INSN output patterns start with a b or bl, always

This isn't true.  There is a crlogical insn before the bcl for sysv for
example.

> + a 4-byte instruction, but some output patterns issue other
> + opcodes afterwards.  The return label is issued after the entire
> + call insn, including any such post-call opcodes.  Instead of
> + figuring out which cases need adjustments, we compute the offset
> + back to the address of the call opcode proper, then add the
> + constant 4 bytes, to get the address after that opcode.  */
> +  return 4 - get_attr_length (call_insn);

Please explain this magic, too -- in code preferably (so with a ? :
maybe, but don't try to "optimise" that expression, let the compiler
do that, it is much better at it anyway :-) )

> +}

Is that correct for all ABIs we support?  Even if so, it needs a lot
more documentation than this.


Segher

Re: [PATCH v3 #1/2] enable adjustment of return_pc debug attrs

2024-05-28 Thread Segher Boessenkool

On Sat, May 25, 2024 at 09:12:05AM -0300, Alexandre Oliva wrote:


You sent multiple patch series in one thread, and multiple versions of
the same series even.

This is very hard to even *read*, let alone work with.  Please don't.


Segher

Re: [PATCH] rs6000: Don't pass -many to the assembler [PR112868]

2024-05-22 Thread Segher Boessenkool

Hi!

On Wed, May 22, 2024 at 09:29:13AM -0500, Peter Bergner wrote:
> On 5/21/24 8:27 AM, jeevitha wrote:
> > The following patch has been bootstrapped and regtested with default 
> > configuration
> > [--enable-checking=yes] and with --enable-checking=release on 
> > powerpc64le-linux.
> > 
> > This patch removes passing the -many assembler option for release builds. 
> > Now,
> > GCC no longer passes -many under any conditions to the assembler.

Why do we want that?  I cannot read minds.

> You are missing a ChangeLog entry for the target-supports.exp change plus
> there is no mention of why it's needed in the git log entry.

In the commit message you mean?  Yeah.  This info belongs in the commit
message.

Is the target-supports thing that Cell thing?  That does not belong here
at all.  If it wasn't simply a mistake, it should be a separate commit,
with a lot of explanation.

Segher

Re: [PATCH-4, rs6000] Implement optab_isnormal for SFmode, DFmode and TFmode [PR97786]

2024-05-17 Thread Segher Boessenkool

On Fri, May 17, 2024 at 10:38:54AM +0800, HAO CHEN GUI wrote:
> This expand calls gen_xststdcp which is a P9 vector instruction and
> relies on "TARGET_P9_VECTOR". So I set the condition.

Why?  It needs P9, sure, and MSR[VSX] set, but the operands being VSX
registers takes care of that, heh.

But it's fine, the insn patterns it uses use the same conditions
already.

Segher

Re: [PATCH-4, rs6000] Implement optab_isnormal for SFmode, DFmode and TFmode [PR97786]

2024-05-16 Thread Segher Boessenkool

Hi!

On Fri, Apr 12, 2024 at 04:24:23PM +0800, HAO CHEN GUI wrote:
>   This patch implemented optab_isnormal for SF/DF/TFmode by rs6000 test
> data class instructions.
> 
>   This patch relies on former patch which adds optab_isnormal.
> https://gcc.gnu.org/pipermail/gcc-patches/2024-April/649366.html

> gcc/
>   PR target/97786
>   * config/rs6000/vsx.md (isnormal2): New expand for SFmode and
>   DFmode.

* config/rs6000/vsx.md (isnormal2 for SFDF): New expand.
(isnormal2 for IEEE128): New expand.

> --- a/gcc/config/rs6000/vsx.md
> +++ b/gcc/config/rs6000/vsx.md
> @@ -5357,6 +5357,30 @@ (define_expand "isfinite2"
>DONE;
>  })
> 
> +(define_expand "isnormal2"
> +  [(use (match_operand:SI 0 "gpc_reg_operand"))
> + (use (match_operand:SFDF 1 "gpc_reg_operand"))]
> +  "TARGET_HARD_FLOAT
> +   && TARGET_P9_VECTOR"

Please put the condition on just one line if it is as simple and short
as this.

Why is TARGET_P9_VECTOR the correct condition?

> +{
> +  rtx tmp = can_create_pseudo_p () ? gen_reg_rtx (SImode) : operands[0];

This is an expander.  can_create_pseudo_p always return true.  Please
simplify the code, keeping that in mind :-)

> +(define_expand "isnormal2"
> +  [(use (match_operand:SI 0 "gpc_reg_operand"))
> + (use (match_operand:IEEE128 1 "gpc_reg_operand"))]
> +  "TARGET_HARD_FLOAT
> +   && TARGET_P9_VECTOR"
> +{
> +  rtx tmp = can_create_pseudo_p () ? gen_reg_rtx (SImode) : operands[0];
> +  emit_insn (gen_xststdcqp_ (tmp, operands[1], GEN_INT (0x7f)));
> +  emit_insn (gen_xorsi3 (operands[0], tmp, const1_rtx));
> +  DONE;
> +})

Same issues here, of course.

> +

Why add radom white lines?  Pleaase don't.

> --- /dev/null
> +++ b/gcc/testsuite/gcc.target/powerpc/pr97786-7.c
> @@ -0,0 +1,16 @@
> +/* { dg-do compile } */
> +/* { dg-require-effective-target powerpc_vsx_ok } */
> +/* { dg-options "-O2 -mdejagnu-cpu=power9 -mvsx" } */

If you use a -mcpu=, don't use vsx_ok.

If you use a -mcpu=, don't use -mvsx.

> +int test1 (double x)
> +{
> +  return __builtin_isnormal (x);
> +}
> +
> +int test2 (float x)
> +{
> +  return __builtin_isnormal (x);
> +}
> +
> +/* { dg-final { scan-assembler-not {\mfcmpu\M} } } */

Just \mfcmp please (so that it also catches fcmpo, if we ever generate
that).

> +/* { dg-final { scan-assembler-times {\mxststdc[sd]p\M} 2 } } */

Maybe you should test for one each of the s and d version?  So just
/* { dg-final { scan-assembler-times {\mxststdcsp\M} 1 } } */
/* { dg-final { scan-assembler-times {\mxststdcdp\M} 1 } } */

> --- /dev/null
> +++ b/gcc/testsuite/gcc.target/powerpc/pr97786-8.c
> @@ -0,0 +1,13 @@
> +/* { dg-do compile { target lp64 } } */

Why run this on 64-bit systems only?  If there is a reason, document
that here (but is there a reason?)

> +/* { dg-require-effective-target ppc_float128_sw } */
> +/* { dg-require-effective-target powerpc_vsx_ok } */
> +/* { dg-options "-O2 -mdejagnu-cpu=power9 -mvsx -mabi=ieeelongdouble 
> -Wno-psabi" } */

Same comments here: If you have a -mcpu you do not want vsx_ok or -mvsx.

Please fix these things and resend.  Thanks!


Segher

Re: [PATCH] report message for operator %a on unaddressible exp

2024-05-16 Thread Segher Boessenkool

Hi!

On Thu, May 16, 2024 at 02:56:49PM +0800, Jiufu Guo wrote:
> Jiufu Guo  writes:
> > Segher Boessenkool  writes:
> >> On Tue, May 14, 2024 at 05:53:56PM +0800, Jiufu Guo wrote:
> >>> Thanks so much for your great review!
> >>> Reference other messages, I'm wondering "invalid %%a value" may be
> >>> acceptable, or "invalid %%a address expression in TOC" maybe better.
> >>
> >> "%%a requires a memory operand"?  Maybe even print out the actual
> >> operand given, too.
> >
> > Thanks! I updated the code using:
> > "%%a requires a memory reference operand", since the actual operand
> > is treated as the address.
> 
> I suspect one thing here: if "%%a requires memory" is accurate vs.
> "%%a requires a memory reference".
> 
> Reference the words from doc:
> https://gcc.gnu.org/onlinedocs/gcc/Extended-Asm.html#Generic-Operand-Modifiers
> a: Substitute a memory reference, with the actual operand treated as the
> address.
> 
> And for below code:
> '("#%a0" : :"m"(x))' is not accepted.

Yeah, it always confuses me.  Sorry.  The operand is the actual address.

> While '("#%a0" : :"r"())' is ok.
> 
> So, it may be more accurate that: "%%a" as requirement of address of
> memory.

That sounds good yes.


Segher

Re: [PATCH] report message for operator %a on unaddressible exp

2024-05-14 Thread Segher Boessenkool

On Tue, May 14, 2024 at 05:53:56PM +0800, Jiufu Guo wrote:
> Thanks so much for your great review!
> Reference other messages, I'm wondering "invalid %%a value" may be
> acceptable, or "invalid %%a address expression in TOC" maybe better.

"%%a requires a memory operand"?  Maybe even print out the actual
operand given, too.


Segher

Re: [PATCH] report message for operator %a on unaddressible exp

2024-05-14 Thread Segher Boessenkool

Oh, btw:

On Tue, May 14, 2024 at 11:00:38AM +0800, Jiufu Guo wrote:
> >> --- a/gcc/config/rs6000/rs6000.cc
> >> +++ b/gcc/config/rs6000/rs6000.cc
> >> @@ -14659,6 +14659,12 @@ print_operand_address (FILE *file, rtx x)
> >>else if (SYMBOL_REF_P (x) || GET_CODE (x) == CONST
> >>   || GET_CODE (x) == LABEL_REF)
> >>  {
> >> +  if (this_is_asm_operands && !address_operand (x, VOIDmode))
> >> +  {
> >> +output_operand_lossage ("invalid expression as operand");
> >> +return;
> >> +  }

That error message is not so good.  Firstly, it typically *is* a valid
expression here, just not a correct expression to have for an address.
But, more generally and usefully, the error message should say *what* is
wrong about the expression (namely, it is not an address).

Most of the time you can use the same error message for asm and other
expressions, and you get a great message in all contexts.
operand_lossage already takes care of telling the user "you did
something foolish" for inline asm, or "ICE" if it is a compiler problem
instead.

In error messages you do not often know what caused the problem, so
just report on the facts you *do* know (and moreso with warnings, there
you typically only know something looks unusual).

Segher

Re: [PATCH] report message for operator %a on unaddressible exp

2024-05-14 Thread Segher Boessenkool

On Tue, May 14, 2024 at 11:00:38AM +0800, Jiufu Guo wrote:
> >> diff --git a/gcc/config/rs6000/rs6000.cc b/gcc/config/rs6000/rs6000.cc
> >> index 117999613d8..50943d76f79 100644
> >> --- a/gcc/config/rs6000/rs6000.cc
> >> +++ b/gcc/config/rs6000/rs6000.cc
> >> @@ -14659,6 +14659,12 @@ print_operand_address (FILE *file, rtx x)
> >>else if (SYMBOL_REF_P (x) || GET_CODE (x) == CONST
> >>   || GET_CODE (x) == LABEL_REF)
> >>  {
> >> +  if (this_is_asm_operands && !address_operand (x, VOIDmode))
> >
> > Do we really need this_is_asm_operands here?
> I understand your point: 
> since in function 'print_operand_address' which supports not only user
> asm code.  So, it maybe incorrect if 'x' is not an 'address_operand',
> no matter this_is_asm_operands.
> 
> Here, 'this_is_asm_operands' is needed because it would be treated as an
> user fault in asm-code (otherwise, internal_error in the compiler).

You almost never want to test for asm, and just give the same error you
would give in non-asm.  It is the same problem after all, and giving the
user the same error message is the most helpful thing to do!

It can be useful to not say "ICE", but it already is prevented from
doing that here.


Segher

Re: [PATCH] report message for operator %a on unaddressible exp

2024-05-13 Thread Segher Boessenkool

Hi!

On Mon, May 13, 2024 at 10:57:12AM +0800, Jiufu Guo wrote:
> For PR96866, when gcc print asm code for modifier "%a" which requires
> an address operand,

It requires a *memory* operand, and it outputs its address.  This is a
generic modifier btw (not rs6000).

> while the operand is with the constraint "X" which
> allow non-address form.  An error message would be reported to indicate
> the invalid asm operands.

"non-address form"?  Every mem has an address.

But 'X' is not memory.  What is it at all?  Why do we use that when you
*have to* have mem here?

The code you add that tests for address_operand looks wrong.  I would
expect it to test the operand is memory, instead :-)

Segher

Re: [PATCH] rtlanal: Correct cost regularization in pattern_cost

2024-05-10 Thread Segher Boessenkool

On Fri, May 10, 2024 at 02:50:56PM +0200, Richard Biener wrote:
> But for example a reg-reg move when optimizing for speed could have
> a zero associated cost.

Sure, but this is the way things always have been.  I'm sure there are
ways to change things so they become slightly easier to use, but that is
not what we have now (or what we ever have had).

> You might argue that's a bug since there's
> an actual instruction and thus at least a size cost (and decode cost)
> but then I've seen too much zero cost stuff in backends (like that
> combine PR causing s390 backend making address-cost zero even
> though it's just "same cost").

address_cost is something else entirely.  Many backends have that set
to 0 btw, and that means 0, not "unknown".  It means all forms of
address in valid machine instructions have the same execution (or size)
costs.

> IMO give we're dispatching to the rtx_cost hook eventually it needs
> documenting there or alternatively catching zero and adjusting its
> result there.  Of course cost == 0 ? 1 : cost is wrong as it makes
> zero vs. one the same cost - using cost + 1 when from rtx_cost
> might be more correct, at least preserving relative costs.

Most things should not use rtx_cost at all, only insn_cost.  Taking the
"cost" of any random RTL expression makes no sense at all.  Neither
conceptually, nor practically: it causes many problems, and solves none.

Most things already use only insn_cost, if you have the appropriate
hooks implemented in your backend (all one of them even).  This is so
much easier to use (you only need to recognise some big categories of
instructions, for a typical target core, instead of eighty different
RTX codes, and the combination of them), and gives way better results.

Segher

Re: [PATCH] rtlanal: Correct cost regularization in pattern_cost

2024-05-10 Thread Segher Boessenkool

On Fri, May 10, 2024 at 12:19:35PM +0200, Richard Biener wrote:
> On Fri, May 10, 2024 at 11:06 AM Segher Boessenkool
>  wrote:
> > *All* code using a cost will have to be inspected and possibly adjusted
> > if you decide to use a different value for "unknown" than what we have
> > had for ages.  All other cost functions interacting with this one, too.
> 
> Btw, looking around pattern_cost is the only API documenting this special
> value and the function after it using this function, insn_cost does the same
> but
> 
> int
> insn_cost (rtx_insn *insn, bool speed)
> {
>   if (targetm.insn_cost)
> return targetm.insn_cost (insn, speed);
> 
> and the target hook doesn't document this special value.  set_src_cost
> doesn't either, btw (that just uses rtx_cost).  So I don't think how
> pattern_cost handles the set_src_cost result is warranted.  There's
> simply no way to detect whether set_src_cost returns an actual
> value - on the contrary, it always does.

I introduced insn_cost.  I didn't think about documenting that 0 means
unknown, precisely because that is so pervasive!


Segher

Re: [PATCH] rtlanal: Correct cost regularization in pattern_cost

2024-05-10 Thread Segher Boessenkool

On Fri, May 10, 2024 at 04:50:10PM +0800, HAO CHEN GUI wrote:
> Hi Richard,
>   Thanks for your comments.
> 
> 在 2024/5/10 15:16, Richard Biener 写道:
> > But if targets return sth < COSTS_N_INSNS (1) but > 0 this is now no
> > longer meaningful.  So shouldn't it instead be
> > 
> >   return cost > 0 ? cost : 1;
> Yes, it's better.
> 
> > 
> > ?  Alternatively returning fractions of COSTS_N_INSNS (1) from set_src_cost
> > is invalid and thus the target is at fault (I do think that making zero the
> > unknown value is quite bad since that makes it impossible to have zero
> > as cost represented).
> > 
> > It seems the check is to aovid pattern_cost return zero (unknown), so the
> > comment holds to pattern_cost the same (it returns an 'int' so the better
> > exceptional value would have been -1, avoiding the compare).
> But sometime it adds an insn cost. If the unknown cost is -1, the total cost
> might be distorted.

*All* code using a cost will have to be inspected and possibly adjusted
if you decide to use a different value for "unknown" than what we have
had for ages.  All other cost functions interacting with this one, too.


Segher

Re: [PATCH] rtlanal: Correct cost regularization in pattern_cost

2024-05-10 Thread Segher Boessenkool

Hi!

On Fri, May 10, 2024 at 09:16:26AM +0200, Richard Biener wrote:
> On Fri, May 10, 2024 at 4:25 AM HAO CHEN GUI  wrote:
> >But if set_src_cost returns a value less than COSTS_N_INSNS (1), it's
> > untouched and just returned by pattern_cost. Thus "zero" from set_src_cost
> > is higher than "one" from set_src_cost.
> >
> >   For instance, i386 returns cost "one" for zero_extend op.
> > //ix86_rtx_costs
> > case ZERO_EXTEND:
> >   /* The zero extensions is often completely free on x86_64, so make
> >  it as cheap as possible.  */
> >   if (TARGET_64BIT && mode == DImode
> >   && GET_MODE (XEXP (x, 0)) == SImode)
> > *total = 1;
> >
> >   This patch fixes the problem by converting all costs which are less than
> > COSTS_N_INSNS (1) to COSTS_N_INSNS (1).
> >
> >   Bootstrapped and tested on x86 and powerpc64-linux BE and LE with no
> > regressions. Is it OK for the trunk?
> 
> But if targets return sth < COSTS_N_INSNS (1) but > 0 this is now no
> longer meaningful.  So shouldn't it instead be
> 
>   return cost > 0 ? cost : 1;

I don't think either is very good.  Why does it want to convert "unknown
cost" to some known cost at all?  Fixing *that* would really improve
things!

> ?  Alternatively returning fractions of COSTS_N_INSNS (1) from set_src_cost
> is invalid and thus the target is at fault (I do think that making zero the
> unknown value is quite bad since that makes it impossible to have zero
> as cost represented).

No, we have COST_N_INSNS exactly because fractions of integer costs are
useful to have.  COST_N_INSNS (1) really stands for "the cost of a cheap
integer isns with latency 1", and many targets want to express some
insns are better than others, even if they have the same latency.
Perhaps some insns use less of some resource, etc.

> It seems the check is to aovid pattern_cost return zero (unknown), so the
> comment holds to pattern_cost the same (it returns an 'int' so the better
> exceptional value would have been -1, avoiding the compare).

Cost 0 for unkown is used in (almost) *all* cost things in RTL.  Pretty
much everything can deal with it just fine.  What is different here?

The abstraction "pattern_cost" is a lousy abstraction of course, but
where is this used?  Cost "unknown" can be passed through everywhere,
in principle.

Segher

Re: [PATCH 3/3] combine: initialize a local var

2024-05-03 Thread Segher Boessenkool

On Thu, May 02, 2024 at 02:38:12PM -0600, Jeff Law wrote:
> 
> 
> On 5/2/24 12:59 PM, Vineet Gupta wrote:
> >This is no logic change (but technically still a functional change).
> >
> >Ran into this when stepping thru combine code.
> >@newpat has some random garbage for a bit until it is actually set.
> >With the fix it remains 0 until actually set.
> >
> >gcc/ChangeLog:
> > * combine.cc (try_combine): Initialize newpat.
> Isn't the same true even after this change if you turn on the optimizer? 
>  And isn't the same true for many other objects that are initialized 
> lazily?

For all, even.

Without this change the compiler can (in theory, anyway) diagnose the
uninitialised use.  After, there *is* no uninitialised use anymore.

Please don't do this.  It is not an improvement, it is several steps
back, to satisfy a misguides sense of "security".

Segher

Re: [PATCH 3/3] combine: initialize a local var

2024-05-03 Thread Segher Boessenkool

On Thu, May 02, 2024 at 11:59:24AM -0700, Vineet Gupta wrote:
> This is no logic change (but technically still a functional change).

Where are 1/3 and 2/3?  Or are those unrelated?  Please don't make
series like that.

> Ran into this when stepping thru combine code.
> @newpat has some random garbage for a bit until it is actually set.
> With the fix it remains 0 until actually set.

The same is true for all uninitialised variables.  Setting everything
to zero explicitly is a) quite a bit slower, and b) just as wrong!
For example, here, newpat should never be zero.  Never.  It does not
make any sense.

Is there any place where newpat is used uninitialised?

Segher

Re: [PATCH, rs6000] Use bcdsub. instead of bcdadd. for bcd invalid number checking

2024-04-18 Thread Segher Boessenkool

On Thu, Apr 18, 2024 at 11:14:42AM +0800, Kewen.Lin wrote:
> on 2024/4/18 10:01, HAO CHEN GUI wrote:
> >   This patch replace bcdadd. with bcdsub. for bcd invalid number checking.
> > bcdadd on two same numbers might cause overflow which also set
> > overflow/invalid bit so that we can't distinguish it's invalid or overflow.
> > The bcdsub doesn't have the problem as subtracting on two same number never
> > causes overflow.
> > 
> >   Bootstrapped and tested on powerpc64-linux BE and LE with no
> > regressions. Is it OK for the trunk?
> 
> Considering that this issue affects some basic functionality of bcd bifs
> and the fix itself is simple and very safe, OK for trunk, thanks for fixing!

Yup.  If a number X is invalid the X-X calculation might set the
overflow flag as well, but we cannot see that difference at all anyway,
it always has set the invalid flag after all.

Thanks!


Segher

Re: [PATCH] rs6000: Add OPTION_MASK_POWER8 [PR101865]

2024-04-12 Thread Segher Boessenkool

Hi!

On Thu, Apr 11, 2024 at 11:23:02PM -0500, Peter Bergner wrote:
> On 4/11/24 10:31 PM, Kewen.Lin wrote:
> >> +;; This option exists only to create its MASK.  It is not intended for 
> >> users.
> >> +mdo-not-use-this-option
> >> +Target RejectNegative Mask(POWER8) Var(rs6000_isa_flags) WarnRemoved
> >> +
> > 
> > I can understand the given name is to avoid users to use it, but it looks 
> > odd, personally
> > I'm inclined to mpower8 (or even mpower8-internal) even if it's more likely 
> > to be used but
> > it's a bit more meaningful (especially we already have mpower10), 
> > theoretically speaking
> > it's undocumented users shouldn't use it at all.
> 
> Sorry, I should have mentioned this, but I originally had it -mpower8, but 
> given
> it was an option we don't want users to use, Segher mentioned offline to give 
> it
> a name something like the above and not -mpower8.  I kind of like 
> -mpower8-internal
> now that you mention it, but I'd like Segher's input here whether he prefers
> -mdo-not-use-this-option or -mpower8-internal or something else???

-mpower8-internal is fine.  Anyone who thinks this would be a good thing
to us, well, we cannot stop them from hurting themselves I guess.  Esp.
with a nice help text it is fine.

Going forward we need something like this for most ISA levels (we
currently often use the existence of some more-or-less random insn for
this, but we need the same test to actually test for *that* insn, not a
good thing at all).  But we do not want the user to be able to use such
options at all, so we really shouln't make command line options for it.

So it should not use an option flag at all for this, but something else.
Maybe something new even, we have had problems around this forever, that
suggests we have insufficient abstractions around this :-)

> I'll make the changes above, modulo leaving the option name unchanged until
> we hear from Segher on that and report back on the LE and BE testing.

-mpower8-internal should dissuade users from using it, certainly people
who actually read the documentation as well.  It is unfortunate we need
to tell people to not use tools we provide ourselves, but this is
temporary, right :-)  (Right?!)

Thanks guys,


Segher

Re: Combine patch ping

2024-04-11 Thread Segher Boessenkool

On Wed, Apr 10, 2024 at 08:32:39PM +0200, Uros Bizjak wrote:
> On Wed, Apr 10, 2024 at 7:56 PM Segher Boessenkool
>  wrote:
> > This is never okay.  You cannot commit a patch without approval, *ever*.

This is the biggest issue, to start with.  It is fundamental.

> > That patch is also obvious -- obviously *wrong*, that is.  There are
> > big assumptions everywhere in the compiler how a CC reg can be used.
> > This violates that, as explained elsewhere.
> 
> Can you please elaborate what is wrong with this concrete patch.

The explanation of the patch is contradictory to how RTL works at all,
so it is just wrong.  It might even do something sane, but I didn't get
that far at all!

Write good email explanations, and a good proposed commit message.
Please.  It is the only one people can judge a patch.  Well, apart
from doing everything myself from first principles, ignoring everything
you said, just looking at the patch itself, but that is a hundred times
more work.  I don't do that.

> The
> part that the patch touches has several wrong assumptions, and the
> fixed "???" comment just emphasizes that. I don't see what is wrong
> with:
> 
> (define_insn "@pushfl2"
>   [(set (match_operand:W 0 "push_operand" "=<")
> (unspec:W [(match_operand 1 "flags_reg_operand")]
>   UNSPEC_PUSHFL))]
>   "GET_MODE_CLASS (GET_MODE (operands[1])) == MODE_CC"
>   "pushf{}"
>   [(set_attr "type" "push")
>(set_attr "mode" "")])

What does it even mean?  What is a flags:CC?  You always always always
need to say what is *in* the flags, if you want to use it as input
(which is what unspec does).  CC is weird like this.  Most targets do
not have distinct physical flags for every condition, only a few
conditions are "alive" at any point in the program!

> it is just a push of the flags reg to the stack. If the push can't be
> described in this way, then it is the middle end at fault, we can't
> just change modes at will.

But that is not what this describes: it operates on the flags register
in some unspecified way, and pushes the result of *that* to the stack.

(Stack pointer modification is not described here btw, should it be?  Is
that magically implemented by the backend some way, via type=push
perhaps?)

Segher

Re: Combine patch ping

2024-04-10 Thread Segher Boessenkool

On Sun, Apr 07, 2024 at 08:31:38AM +0200, Uros Bizjak wrote:
> If there are no further comments, I plan to commit the referred patch
> to the mainline on Wednesday. The latest version can be considered an
> obvious patch that solves certain oversight in the original
> implementation.

This is never okay.  You cannot commit a patch without approval, *ever*.

That patch is also obvious -- obviously *wrong*, that is.  There are
big assumptions everywhere in the compiler how a CC reg can be used.
This violates that, as explained elsewhere.

Segher

Re: [COMMITTED] testsuite/gcc.target/cris/pr93372-2.c: Handle xpass from combine improvement

2024-04-09 Thread Segher Boessenkool

Hi!

On Fri, Apr 05, 2024 at 04:06:01AM +0200, Hans-Peter Nilsson wrote:
> The xpassing change in generated code was as follows, at
> r14-9788-gb7bd2ec73d66f7 (where I locally applied a revert
> to verify that this suspect was the cause).  That was so
> much of an improvement that I had to share it!  Worth the
> testsuite churn anyway. :)
> 
> Segher, if you end up reverting r14-9692-g839bc42772ba7a (as
> unfortunately seems not unlikely), then please also revert this
> commit: r14-9799-g4c8b3600c4856f7915281ae3ff4d97271c83a540.

I won't revert it, it fixes an actual bug.  Not a regression no, but a
very serious longstanding problem.

We have accidentally done a limited version of a feature requested for
more than 20 years now, "UNCSE".  I'll do this for just combine (instead
of as a separate pass, lots of issues there with pass ordering, results
could be better though) in GCC 15.  This really is a stage 1 thing
though!

Any testcase that relies on something that combine does not promise and
that can not reasonably be expected to always hold is *buggy*.

The combine-2-2 testcase (that I wrote myself) isn't very good, and
should be replaced by something that is much more clearly a 2->2
combination, instead of 1->1 with context.

All (target-specific) new testsuite failures are just like that: bad
testcases!

So no, no reversion.

> That's the only test that's improved to the point of
> affecting test-patterns.  E.g. pr93372-5.c (which references
> pr93372-2.c) is also improved, though it retains a redundant
> compare insn.  (PR 93372 was about regressions from the cc0
> representation; not further improvement like here, thus it's
> not tagged.  Though, I did not double-check whether this
> actually *was* a regression from cc0.)

Interesting that this improved tests for you.  Huh.  Do you have an
explanation how this happened?  I suspect that as uaual it is just a
side effect of random factors: combine is opportunistic, always does the
first change it thinks good, not considering what this then does for
other possible combinations; it is greedy.  It would be nice to see
written out what happens in this example though :-)

Segher

Re: [PATCH] rtl-optimization/101523 - avoid re-combine after noop 2->2 combination

2024-04-05 Thread Segher Boessenkool

Hi!

On Wed, Apr 03, 2024 at 01:07:41PM +0200, Richard Biener wrote:
> The following avoids re-walking and re-combining the instructions
> between i2 and i3 when the pattern of i2 doesn't change.
> 
> Bootstrap and regtest running ontop of a reversal of 
> r14-9692-g839bc42772ba7a.

Please include that in the patch (or series, preferably).

> It brings down memory use frmo 9GB to 400MB and compile-time from
> 80s to 3.5s.  r14-9692-g839bc42772ba7a does better in both metrics
> but has shown code generation regressions across acrchitectures.
> 
> OK to revert r14-9692-g839bc42772ba7a?

No.

The patch solved a very real problem.  How does your replacement handle
that?  You don't say.  It looks like it only battles symptoms a bit,
instead :-(

We had this before: 3->2 combinations that leave an instruction
identical to what was there before.  This was just a combination with
context as well.  The only reason this wasn't a huge problem then
already was because this is a 3->2 combination, even if it really is a
2->1 one it still is beneficial in all the same cases.  But in the new
case it can iterate indefinitely -- well not quite, but some polynomial
number of times, for a polynomial at least of degree three, possibly
more :-(

With this patch you need to show combine still is linear.  I don't think
it is, but some deeper analysis might show it still is.

  ~ - ~ - ~

What should *really* be done is something that has been on the wish list
for decades: an uncse pass.

The things that combine no longer works on after my patch are actually
1->1 combinations (which we never do currently, although we probably
should); or alternatively, an un-CSE followed by a 2->1 combination.

We can do the latter of course, but we need to do an actual uncse first!
Somewhere before combine, and then redo a CSE after it.  An actual CSE,
not doing ten gazillion other things.

Segher

[PATCH] combine: Don't combine if I2 does not change

2024-03-27 Thread Segher Boessenkool

In some cases combine will "combine" an I2 and I3, but end up putting
exactly the same thing back as I2 as was there before.  This is never
progress, so we shouldn't do it, it will lead to oscillating behaviour
and the like.

If we want to canonicalise things, that's fine, but this is not the
way to do it.

Committed to trunk.


Segher


2024-03-27  Segher Boessenkool  

PR rtl-optimization/101523
* combine.cc (try_combine): Don't do a 2-insn combination if
it does not in fact change I2.
---
 gcc/combine.cc | 11 +++
 1 file changed, 11 insertions(+)

diff --git a/gcc/combine.cc b/gcc/combine.cc
index a4479f8d8364..745391016d04 100644
--- a/gcc/combine.cc
+++ b/gcc/combine.cc
@@ -4186,6 +4186,17 @@ try_combine (rtx_insn *i3, rtx_insn *i2, rtx_insn *i1, 
rtx_insn *i0,
   adjust_for_new_dest (i3);
 }
 
+  /* If I2 didn't change, this is not a combination (but a simplification or
+ canonicalisation with context), which should not be done here.  Doing
+ it here explodes the algorithm.  Don't.  */
+  if (rtx_equal_p (newi2pat, PATTERN (i2)))
+{
+  if (dump_file)
+   fprintf (dump_file, "i2 didn't change, not doing this\n");
+  undo_all ();
+  return 0;
+}
+
   /* We now know that we can do this combination.  Merge the insns and
  update the status of registers and LOG_LINKS.  */
 
-- 
1.8.3.1

Re: [PATCH v2] combine: Fix ICE in try_combine on pr112494.c [PR112560]

2024-03-18 Thread Segher Boessenkool

On Thu, Mar 07, 2024 at 11:46:54PM +0100, Uros Bizjak wrote:
> > Can't you just describe the dataflow then, without an unspec?  An unspec
> > by definition does some (unspecified) operation on the data.
> 
> Previously, it was defined as:
> 
>  (define_insn "*pushfl2"
>[(set (match_operand:W 0 "push_operand" "=<")
>  (match_operand:W 1 "flags_reg_operand"))]
> 
> But Wmode, AKA SI/DImode is not CCmode. And as said in my last
> message, nothing prevents current source code to try to update the CC
> register here.

So you can use an unspec just to convert the flags reg to an integer?
To make an integer representation of flags reg contents.

Or is that what we started with here?


Segher

Re: [PATCH v2] combine: Fix ICE in try_combine on pr112494.c [PR112560]

2024-03-18 Thread Segher Boessenkool

On Thu, Mar 07, 2024 at 11:27:28PM +0100, Uros Bizjak wrote:
> On Thu, Mar 7, 2024 at 11:07 PM Uros Bizjak  wrote:
> > > >  (unspec:DI [
> > > >  (reg:CC 17 flags)
> > > >  ] UNSPEC_PUSHFL)
> > >
> > > But that is invalid RTL?  The only valid use of a CC is written as
> > > cc-compared-to-0.  It means "the result of something compared to 0,
> > > stored in that cc reg".
> > >
> > > (And you can copy a CC reg around, but that is not a use ;-) )
> 
> Hm... under this premise, we can also say that any form that is not
> cc-compared-to-0 is not a use.

Well, no.  All uses of CC are written as comparisons to 0, or are pure
dataflow.  Anything else is not "not a use" but just invalid.

> Consequently, if it is not a use, then
> the CC reg should not be updated at its use location, so my v1 patch,
> where we simply skip the update (but retain the combined insn),
> actually makes sense.

With more asserts, perhaps.

> In this concrete situation, we don't care about CC register mode in
> the PUSHFL insn. And we should not change CC reg mode of the use,
> because any other mode than the generic CCmode won't be recognized by
> the insn pattern.

You always care about the CC mode, that is why you always write it as
comparison, so the backend can choose a mode based on what the flag bits
mean in this context.  For an extreme example look at 390, but on pretty
much any target both signed and unsigned comparisons use the same flag
bits, and maybe fp comparisons as well.

But pushfl does sound like pure dataflow.  Why is this a builtin, is
it ever a good idea if the user writes stuff the compiler can do a
better job doing itself, not to mention it is way easier for the
compiler anyway?  :-)

Segher

Re: [PATCH] rs6000: Fix issue in specifying PTImode as an attribute [PR106895]

2024-03-18 Thread Segher Boessenkool

Hi!

On Fri, Feb 23, 2024 at 03:04:13PM +0530, jeevitha wrote:
> PTImode attribute assists in generating even/odd register pairs on 128 bits.

It is a mode, not an attribute.  Attributes are on declarations, while
modes are on a much more fundamental level (every value has a mode, in
GCC!)

> When the user specifies PTImode as an attribute, it breaks because there is no
> internal type to handle this mode . We have created a tree node with dummy 
> type
> to handle PTImode.

You are talking about the mode attribute.  Aha.

This attribute says the mode of the datum is something specific; it is
the only way a user can specify the mode directly.  Not something you
want to use normally, but it's a nice escape hatch (like here).

> We are not documenting this dummy type since users are not
> allowed to use this type externally.

Not sure what this is meant to mean?  What does "allowed to" mean, even?
We do not forbid people from doing anything (we can discourage them
though).  Or dso you mean something just doesn't work?

> gcc/
>   PR target/106895
>   * config/rs6000/rs6000.h (enum rs6000_builtin_type_index): Add fields
>   to hold PTImode type.

An enum does not have fields.  What do you actually mean?

> --- a/gcc/config/rs6000/rs6000.h
> +++ b/gcc/config/rs6000/rs6000.h
> @@ -2304,6 +2304,7 @@ enum rs6000_builtin_type_index
>RS6000_BTI_ptr_vector_quad,
>RS6000_BTI_ptr_long_long,
>RS6000_BTI_ptr_long_long_unsigned,
> +  RS6000_BTI_PTI,

Please call it RS6000_BTI_INTPTI, to be more in line with the naming of
other things.

With that fixed it should be good.  Please repost with a good commit
comment and changelog :-)

Thanks,

Segher

Re: [PATCH v2] combine: Fix ICE in try_combine on pr112494.c [PR112560]

2024-03-07 Thread Segher Boessenkool

On Thu, Mar 07, 2024 at 11:07:18PM +0100, Uros Bizjak wrote:
> On Thu, Mar 7, 2024 at 10:37 PM Segher Boessenkool
>  wrote:
> > > but can be something else, such as the above noted
> > >
> > >  (unspec:DI [
> > >  (reg:CC 17 flags)
> > >  ] UNSPEC_PUSHFL)
> >
> > But that is invalid RTL?  The only valid use of a CC is written as
> > cc-compared-to-0.  It means "the result of something compared to 0,
> > stored in that cc reg".
> >
> > (And you can copy a CC reg around, but that is not a use ;-) )
> 
> How can we describe a pushfl then?

(unspec:DI [
(compare:CC) ((reg:CC 17 flags) (const_int 0))
] UNSPEC_PUSHFL)

or something like that?

> It was changed to its current form
> in [1], but I suspect that the code will try to update it even when
> pushfl is implemented as a direct move from a register (as was defined
> previously).
> 
> BTW: Unspecs are used in a couple of other places for other targets [2].
> 
> [1] https://gcc.gnu.org/bugzilla/show_bug.cgi?id=112494#c5
> [2] https://gcc.gnu.org/pipermail/gcc-patches/2023-December/639743.html

There is nothing wront with unspecs.  You cannot use a CCmode value
without comparing it to 0, though.  The exact kind of comparison
determines what bits are valid (and have what meaning) in your CC reg,
even!

> > > The source code that deals with the *user* of the CC register assumes
> > > the former form, so it blindly tries to update the mode of the CC
> > > register inside LT comparison RTX
> >
> > Would you like it better if there was an assert for this?  There are
> > very many RTL requirements that aren't chacked for currently :-/
> 
> In this case - yes. Assert signals that something is unsupported (or
> invalid), way better than silently corrupting some memory, reporting
> the corruption only with checking enabled.

Yeah.  The way RTL checking works makes this hard to do in most cases.
Hrm.  (It cannot easily look at context, only inside of the current RTX).

> > The unspec should have the CC compared with 0 as argument.
> 
> But this does not do what pushfl does... It pushes the register to the stack.

Can't you just describe the dataflow then, without an unspec?  An unspec
by definition does some (unspecified) operation on the data.


Segher

Re: [PATCH V3 2/2] rs6000: Load store fusion for rs6000 target using common infrastructure

2024-03-07 Thread Segher Boessenkool

On Fri, Mar 08, 2024 at 03:01:04AM +0530, Ajit Agarwal wrote:
>  
> >> +   Copyright (C) 2020-2023 Free Software Foundation, Inc.
> > 
> > What in here is from 2020?
> > 
> > Most things will be from 2024, too.  First publication date is what
> > counts.
> 
> Please let me know the second publication date.

Huh?  This code won't be published before 2024 (and it does not derive
from older code), so the only valid date in the copyright message is
2024.

> >> +  bool pair_check_register_operand (bool load_p, rtx reg_op,
> >> +  machine_mode mem_mode)
> >> +  {
> >> +if (load_p || reg_op || mem_mode)
> >> +  return false;
> >> +else
> >> +  return false;
> >> +  }
> > 
> > The compiler will have warned for this.  Please look at all compiler
> > (and other) warnings that you introduce.
> >
> 
> As far as my understanding I didn't see any extra warnings, 
> but I will surely cross check and solve that.

Hrm, apparently there is no -Wall -W warnign for this?  But your code is
essentially

bool pair_check_register_operand (bool, rtx, machine_mode)
{
  return false;
}

> >> +// alias_walker that iterates over stores.
> >> +template
> >> +class store_walker : public def_walker
> > 
> > That is not a good comment.  You should describe parameters and return
> > values and that kind of thing.  That it walks over things is bloody
> > obvious from the name already :-)
> >
> 
> This part of code is taken from aarch64 load store fusion
> pass.  I have made the aarch64-ldp-fusion.cc into target independent code and 
> target dependent code. Target independent code is shared
> across all the architecture, In this case its rs6000 and aarch64.
> Target dependent code is implemented through pure virtual functions.

It is required to decribe what a function is for, and all its arguments
and return values.  If the aarch64 code doesn't, it should be fixed.

Not only reviewers need this, anyone trying to use the code needs it,
too.

> >> +find_trailing_add (insn_info *insns[2],
> >> + const insn_range_info _range,
> >> + int initial_writeback,
> >> + rtx *writeback_effect,
> >> + def_info **add_def,
> >> + def_info *base_def,
> >> + poly_int64 initial_offset,
> >> + unsigned access_size);
> > 
> > That is way, way, way too many parameters.
> > 
> 
> This code I have taken from aarch64-ldp-fusion.cc.
> I have not changed anything here.

Don't copy not-so-good stuff unmodified?  It is unreviewable, to start
with, but probably not very usable later either.

> Could you please elaborate on how you want me
> to structure the patches.

*You* should know the code already, so you surely can figure out a nice
way to present it, so that it takes me LESS work to review this than it
took you to write it?

Making a patch series reviewable is part of the development effort.  It
is way less work if you start with this as the goal in mind.  It is less
work than writing (and debugging etc.) an omnibus patch, in the first
place!

Your goal is to make a patch series that will be reviewed and then seen
to be great stuff.  So it of course needs to *be* great stuff, but it
also needs to be presented in such a way that that is obvious.

Segher

Re: [PATCH v2] combine: Fix ICE in try_combine on pr112494.c [PR112560]

2024-03-07 Thread Segher Boessenkool

On Thu, Mar 07, 2024 at 10:04:32PM +0100, Uros Bizjak wrote:

[snip]

> The part we want to fix deals with the *user* of the CC register. It
> is not true that this is always COMPARISON_P, so EQ, NE, GE, LT, ...
> in the form of
> 
> (LT:CCGC (reg:CCGC 17 flags) (const_int 0))
> 
> but can be something else, such as the above noted
> 
>  (unspec:DI [
>  (reg:CC 17 flags)
>  ] UNSPEC_PUSHFL)

But that is invalid RTL?  The only valid use of a CC is written as
cc-compared-to-0.  It means "the result of something compared to 0,
stored in that cc reg".

(And you can copy a CC reg around, but that is not a use ;-) )

> The source code that deals with the *user* of the CC register assumes
> the former form, so it blindly tries to update the mode of the CC
> register inside LT comparison RTX

Would you like it better if there was an assert for this?  There are
very many RTL requirements that aren't chacked for currently :-/

> (some other nearby source code even
> checks for (const_int 0) RTX). Obviously, this is not the case with
> the former form, where the update tries to:
> 
> SUBST (XEXP (*cc_use_loc, 0), ...)
> 
> on unspec, which has no XEXP (..., 0).
> 
> And *this* is what triggers RTX checking assert.

The unspec should have the CC compared with 0 as argument.


Segher

Re: [PATCH v2] combine: Fix ICE in try_combine on pr112494.c [PR112560]

2024-03-07 Thread Segher Boessenkool

On Thu, Mar 07, 2024 at 12:22:04PM +0100, Uros Bizjak wrote:
> As I understood find_single_use, it is returning RTX iff DEST is used
> only a single time in an insn sequence following INSN.

Connected by a log_link even, yeah.

> We can reject the combination without worries of multiple uses.

Yup.  That is the whole point of find_single_use: if that test fails,
combine knows to get its paws off :-)

Segher

Re: [PATCH v2] combine: Fix ICE in try_combine on pr112494.c [PR112560]

2024-03-07 Thread Segher Boessenkool

On Thu, Mar 07, 2024 at 11:45:45AM +0100, Richard Biener wrote:
> The question is, whether a NULL cc_use_loc (find_single_use returning 
> NULL) means "there is no use" or it can mean "huh, don't know, maybe
> more than one, maybe I was too stupid to indentify the single use".
> The implementation suggests it's all broken ;)

It specifically means there is not a *single* use (or it could not find
what it was, perhaps).  It does not mean there is no use.  There is not
much in combine that cares about dead code anyway, earier passes should
have taken care of that ;-)

All as documented.

Segher

Re: [PATCH v2] combine: Fix ICE in try_combine on pr112494.c [PR112560]

2024-03-07 Thread Segher Boessenkool

On Thu, Mar 07, 2024 at 11:22:12AM +0100, Richard Biener wrote:
> > > > Undo the combination if *cc_use_loc is not COMPARISON_P.

Why, anyway?  COMPARISON_P means things like LE.  It does not even
include actual RTX COMPARE.


Segher

Re: [PATCH v2] combine: Fix ICE in try_combine on pr112494.c [PR112560]

2024-03-07 Thread Segher Boessenkool

On Thu, Mar 07, 2024 at 10:55:12AM +0100, Richard Biener wrote:
> On Thu, 7 Mar 2024, Uros Bizjak wrote:
> > This is
> > 
> > 3236  /* Just replace the CC reg with a new mode.  */
> > 3237  SUBST (XEXP (*cc_use_loc, 0), newpat_dest);
> > 3238  undobuf.other_insn = cc_use_insn;
> > 
> > in combine.cc, where *cc_use_loc is
> > 
> > (unspec:DI [
> > (reg:CC 17 flags)
> > ] UNSPEC_PUSHFL)
> > 
> > combine assumes CC must be used inside of a comparison and uses XEXP (..., 
> > 0)

No.  It has established *this is the case* some time earlier.  Lines\
3155 and on, what begins with
  /* Many machines have insns that can both perform an
 arithmetic operation and set the condition code.

> > OK for trunk?
> 
> Since you CCed me - looking at the code I wonder why we fatally fail.

I did not get this email btw.  Some blip in email (on the sender's side)
I guess?

> The following might also fix the issue and preserve more of the
> rest of the flow of the function.

> --- a/gcc/combine.cc
> +++ b/gcc/combine.cc
> @@ -3182,7 +3182,8 @@ try_combine (rtx_insn *i3, rtx_insn *i2, rtx_insn 
> *i1, rtx_insn *i0,
>  
>if (undobuf.other_insn == 0
>   && (cc_use_loc = find_single_use (SET_DEST (newpat), i3,
> -   _use_insn)))
> +   _use_insn))
> + && COMPARISON_P (*cc_use_loc))

Line 3167 already is
  && GET_CODE (SET_SRC (PATTERN (i3))) == COMPARE
so what in your backend is unusual?


Segher

Re: [PATCH V3 2/2] rs6000: Load store fusion for rs6000 target using common infrastructure

2024-02-29 Thread Segher Boessenkool

Hi!

On Mon, Feb 19, 2024 at 04:24:37PM +0530, Ajit Agarwal wrote:
> --- a/gcc/config.gcc
> +++ b/gcc/config.gcc
> @@ -518,7 +518,7 @@ or1k*-*-*)
>   ;;
>  powerpc*-*-*)
>   cpu_type=rs6000
> - extra_objs="rs6000-string.o rs6000-p8swap.o rs6000-logue.o"
> + extra_objs="rs6000-string.o rs6000-p8swap.o rs6000-logue.o 
> rs6000-vecload-fusion.o"

Line too long.

> +  /* Pass to replace adjacent memory addresses lxv instruction with lxvp
> + instruction.  */
> +  INSERT_PASS_BEFORE (pass_early_remat, 1, pass_analyze_vecload);

That is not such a great name.  Any pss name with "analyze" is not so
good -- the pass does much more than just "analyze" things!

> --- /dev/null
> +++ b/gcc/config/rs6000/rs6000-vecload-fusion.cc
> @@ -0,0 +1,701 @@
> +/* Subroutines used to replace lxv with lxvp
> +   for TARGET_POWER10 and TARGET_VSX,

The pass filename is not good then, either.

> +   Copyright (C) 2020-2023 Free Software Foundation, Inc.

What in here is from 2020?

Most things will be from 2024, too.  First publication date is what
counts.

> +   Contributed by Ajit Kumar Agarwal .

We don't say such things in the files normally.

> +class rs6000_pair_fusion : public pair_fusion
> +{
> +public:
> +  rs6000_pair_fusion (bb_info *bb) : pair_fusion (bb) {reg_ops = NULL;};
> +  bool is_fpsimd_op_p (rtx reg_op, machine_mode mem_mode, bool load_p);
> +  bool pair_mem_ok_policy (rtx first_mem, bool load_p, machine_mode mode)
> +  {
> +return !(first_mem || load_p || mode);
> +  }

It is much more natural to write this as
  retuurn !first_mem && !load && !mode;

(_p is wrong, this is not a predicate, it is not a function at all!)

What is "!mode" for here?  How can VOIDmode happen here?  What does it
mean?  This needs to be documented.

> +  bool pair_check_register_operand (bool load_p, rtx reg_op,
> + machine_mode mem_mode)
> +  {
> +if (load_p || reg_op || mem_mode)
> +  return false;
> +else
> +  return false;
> +  }

The compiler will have warned for this.  Please look at all compiler
(and other) warnings that you introduce.

> +rs6000_pair_fusion::is_fpsimd_op_p (rtx reg_op, machine_mode mem_mode, bool 
> load_p)
> +{
> +  return !((reg_op && mem_mode) || load_p);
> +}

For more complex logic, split it up into two or more conditional
returns.

> +// alias_walker that iterates over stores.
> +template
> +class store_walker : public def_walker

That is not a good comment.  You should describe parameters and return
values and that kind of thing.  That it walks over things is bloody
obvious from the name already :-)

> +extern insn_info *
> +find_trailing_add (insn_info *insns[2],
> +const insn_range_info _range,
> +int initial_writeback,
> +rtx *writeback_effect,
> +def_info **add_def,
> +def_info *base_def,
> +poly_int64 initial_offset,
> +unsigned access_size);

That is way, way, way too many parameters.

So:

* Better names please.
* Better documentation, too, including documentations in the code.
Don't describe *what*, anyone can see that anyway, but describe *why*.
* This is way too much for one patch.  Split this into many patches,
properly structured in a patch series.  The design will need some
explanation, but none of the code should need that, ever!

Segher

Re: [PATCH V2] rs6000: Don't allow immediate value in the vsx_splat pattern [PR113950]

2024-02-28 Thread Segher Boessenkool

On Wed, Feb 28, 2024 at 11:58:15AM -0600, Peter Bergner wrote:
> On 2/28/24 8:31 AM, Segher Boessenkool wrote:
> > On Tue, Feb 27, 2024 at 04:50:02PM -0600, Peter Bergner wrote:
> >> So it seems you're not NAKing the use of splat_input_operand, but
> >> just that it needs more explanation in the git log entry, correct?
> > 
> > I NAK the patch.  _Of course_ there needs to be *something* done, there
> > is a bug after all, it needs to be fixed.
> > 
> > But no, there are big questions about if splat_input_operand is correct
> > as well.  This needs to be justified in the patch submission.
> 
> Ok, then Jeevitha, repost the patch with the s/op1/operands[1]/ only change.
> Jeevitha has already bootstrapped and regtested that change and it does
> fix the bug.
> 
> Clearly, the splat_input_operand change needs more discussion and would
> be a follow-on patch...if we decide to do it at all.

It is clear that input_operand is wrong.  It isn't clear that
splat_input_operand is correct though :-(


Segher

Re: [PATCH V2] rs6000: Don't allow immediate value in the vsx_splat pattern [PR113950]

2024-02-28 Thread Segher Boessenkool

On Tue, Feb 27, 2024 at 04:50:02PM -0600, Peter Bergner wrote:
> On 2/27/24 6:40 AM, Segher Boessenkool wrote:
> > On Tue, Feb 27, 2024 at 02:02:38AM +0530, jeevitha wrote:
> > input_operand allows a lot of things that splat_input_operand does not,
> > not just immediate operands.  NAK.
> > 
> > (For example, *all* memory is okay for input_operand, always).
> > 
> > I'm not saying we do not want to restrict these things, but a commit
> > that doesn't discuss this at all is not okay.  Sorry.
> 
> So it seems you're not NAKing the use of splat_input_operand, but
> just that it needs more explanation in the git log entry, correct?

I NAK the patch.  _Of course_ there needs to be *something* done, there
is a bug after all, it needs to be fixed.

But no, there are big questions about if splat_input_operand is correct
as well.  This needs to be justified in the patch submission.

> Yes, input_operand accepts a lot more things than splat_input_operand
> does, but the multiple define_insns this define_expand feeds, uses
> gpc_reg_operand, memory_operand and splat_input_operand for their
> operands[1] operand (splat_input_operand accepts reg and mem too),
> so it seems to match better what the patterns will be accepting and
> I always thought that using predicates that more accurately reflect
> what the define_insns expect/accept lead to better code gen.

Still, it needs explanation why we allowed it before, but that was a
mistake, or for some reason we do not need it.

Sell your patch!  :-)

> Mike, was it just an oversight to not use splat_input_operand for the
> vsx_splat_ expander or was input_operand a conscious decision?
> 
> If input_operand was used purposely, then we can just fall back to
> the s/op1/operands[1]/ change which we already know fixes the bug.

input_operand allows all inputs for mov insns.  It isn't suitable
for any other instructions.


Segher

Re: [PATCH V2] rs6000: Don't allow immediate value in the vsx_splat pattern [PR113950]

2024-02-27 Thread Segher Boessenkool

Hi!

On Tue, Feb 27, 2024 at 02:02:38AM +0530, jeevitha wrote:
> There is no immediate value splatting instruction in Power. Currently, those
> values need to be stored in a register or memory. To address this issue, I
> have updated the predicate for the second operand in vsx_splat to
> splat_input_operand and corrected the assignment of op1 to operands[1].
> These changes ensure that operand1 is stored in a register.

input_operand allows a lot of things that splat_input_operand does not,
not just immediate operands.  NAK.

(For example, *all* memory is okay for input_operand, always).

I'm not saying we do not want to restrict these things, but a commit
that doesn't discuss this at all is not okay.  Sorry.

Segher

Re: Repost [PATCH 1/6] Add -mcpu=future

2024-02-23 Thread Segher Boessenkool

On Tue, Feb 20, 2024 at 06:35:34PM +0800, Kewen.Lin wrote:
> on 2024/2/8 03:58, Michael Meissner wrote:
> $ grep -r "define PROCESSOR_DEFAULT" gcc/config/rs6000/
> gcc/config/rs6000/aix71.h:#define PROCESSOR_DEFAULT PROCESSOR_POWER7
> gcc/config/rs6000/aix71.h:#define PROCESSOR_DEFAULT64 PROCESSOR_POWER7
> gcc/config/rs6000/aix72.h:#define PROCESSOR_DEFAULT PROCESSOR_POWER7
> gcc/config/rs6000/aix72.h:#define PROCESSOR_DEFAULT64 PROCESSOR_POWER7
> gcc/config/rs6000/aix73.h:#define PROCESSOR_DEFAULT PROCESSOR_POWER8
> gcc/config/rs6000/aix73.h:#define PROCESSOR_DEFAULT64 PROCESSOR_POWER8
> gcc/config/rs6000/darwin.h:#define PROCESSOR_DEFAULT  PROCESSOR_PPC7400
> gcc/config/rs6000/darwin.h:#define PROCESSOR_DEFAULT64  PROCESSOR_POWER4
> gcc/config/rs6000/freebsd64.h:#define PROCESSOR_DEFAULT PROCESSOR_PPC7450
> gcc/config/rs6000/freebsd64.h:#define PROCESSOR_DEFAULT64 PROCESSOR_POWER8
> gcc/config/rs6000/linux64.h:#define PROCESSOR_DEFAULT PROCESSOR_POWER7
> gcc/config/rs6000/linux64.h:#define PROCESSOR_DEFAULT64 PROCESSOR_POWER8
> gcc/config/rs6000/rs6000.h:#define PROCESSOR_DEFAULT   PROCESSOR_PPC603
> gcc/config/rs6000/rs6000.h:#define PROCESSOR_DEFAULT64 PROCESSOR_RS64A
> gcc/config/rs6000/vxworks.h:#define PROCESSOR_DEFAULT PROCESSOR_PPC604
> 
> , and they are unlikely to be updated later, no?

In most cases did would be an ABI change.  Almost never an acceptable
thing to do.


Segher

Re: [PATCH 0/2 V2] aarch64: Place target independent and dependent code in one file.

2024-02-22 Thread Segher Boessenkool

On Thu, Feb 22, 2024 at 07:49:20PM +, Richard Sandiford wrote:
> Thanks for the update.  This is still quite hard to review though.
> Sorry to ask for another round, but could you split it up further?
> The ideal thing would be if patches that move code do nothing other
> than move code, and if patches that change code do those changes
> in-place.

In general, if there is a (big) part to the patch that does not change
behaviour at all, it should be a separate patch.  Such a patch is then
easy to review (write down in the commit message that it does not change
behaviour though, it helps reviewers).

It also makes the remaining tiny patches much easier to review then.

Very generally, any patch that makes interesting changes should not
have more than a few lines semantic content.  That can be repeated of
course, and have fall-out mechanical follow-up changes, but that's the
essence of good patchsets: one change per patch.

And then the commit message can be simple as well, and the chanegelog
will be easy to write.  That is the litmus test for good patch series :-)

Segher

Re: [PATCH] rs6000: Update instruction counts due to combine changes [PR112103]

2024-02-20 Thread Segher Boessenkool

On Tue, Feb 20, 2024 at 01:49:30PM -0600, Peter Bergner wrote:
> I think this will become less fragile after we fix PR114004 which is

You call it "fragile".  I call it the testcase found the exact kind of
bug this testcase was meant to find!

Yes, the test should become quieter when the compiler has fewer bugs :-)


Segher

Re: [PATCH] rs6000: Neuter option -mpower{8,9}-vector [PR109987]

2024-02-20 Thread Segher Boessenkool

On Tue, Feb 20, 2024 at 05:27:07PM +0800, Kewen.Lin wrote:
> > -mcpu=power8 implies -mvsx already.
> 
> Yes, but users can specify -mno-vsx in RUNTESTFLAGS, dejagnu
> framework can have different behaviors (options order) for
> different versions, this explicit -mvsx is mainly for the
> consistency between the checking and the actual testing.

It is not supported at all.  It is better to assume users do not try
to hang themselves.

> > It is mostly a testsuite patch, and testcase patches are fine (and much
> > wanted!) in stage 4.  The actual compiler options remain, and behaviour
> > does not change for anyone who used the option as intended,
> 
> Yes, excepting for one unexpected use that users having one cpu type which
> doesn't support power8/power9 capability but meanwhile specifies option
> -mpower{8,9}-vector to gain power8/power9 capability (as currently these
> options can enable the corresponding flags).  But I don't think it's an
> expected use case.

Yeah, it is why we do not want such options at all :-)

> >>* config/rs6000/rs6000.opt: Make option power{8,9}-vector as
> >>WarnRemoved.
> > 
> > Do we want this, or do we want it silent?  Should we remove the options
> > later, if we now warn for it?
> 
> Good question, it mainly follows the practice of option direct-move here.
> IMHO at least for power8-vector we want WarnRemoved for now as it's
> documented before, and we can probably make it (or them) removed later on
> trunk once all active branch releases don't support it any more.
> 
> What's your opinion on this?

Originally I did
  Warn(%qs is deprecated)
which already was a mistake.  It then changed to
  Deprecated
and then to
  WarnRemoved
which make it clearer that it is a bad plan.

If it is okay to remove an option, we should not talk about it at all
anymore.  Well maybe warn about it for another release or so, but not
longer.

> >>  (define_register_constraint "we" 
> >> "rs6000_constraints[RS6000_CONSTRAINT_we]"
> >> -  "@internal Like @code{wa}, if @option{-mpower9-vector} and 
> >> @option{-m64} are
> >> -   used; otherwise, @code{NO_REGS}.")
> >> +  "@internal Like @code{wa}, if the cpu type is power9 or up, meanwhile
> >> +   @option{-mvsx} and @option{-m64} are used; otherwise, @code{NO_REGS}.")
> > 
> > "if this is a POWER9 or later and @option{-mvsx} and @option{-m64} are
> > used".  How clumsy.  Maybe we should make the patterns that use "we"
> > work without mtvsrdd as well?  Hrm, they will still require 64-bit GPRs
> > of course, unless we can do something tricky.
> > 
> > We do not need the special constraint at all of course (we can add these
> > conditions to all patterns that use it: all *two* patterns).  So maybe
> > that's what we should do :-)
> 
> Not sure the original intention introducing it (Mike might know it best), but
> removing it sounds doable.

It is for mtvsrdd.

>  btw, it seems more than two patterns using it?
> like (if I didn't miss something):
>   - vsx_concat_
>   - vsx_splat__reg
>   - vsx_splat_v4si_di
>   - vsx_mov_64bit

Yes, it isn't clear we should use this contraint in those last two.  It
looks like those do not even need the restriction to 64 bit systems.
Well the last one obviously has that already, but then it could just use
"wa", no?

> > -mcpu=power8 implies -mvsx (power7 already).  You can disable VSX, or
> > VMX as well, but by default it is enabled.
> 
> Yes, it's meant to consider the explicitly -mno-vsx, which suffers the option
> order issue.  But considering we raise error for -mno-vsx -mpower{8,9}-vector
> before, without specifying -mvsx is closer to the previous.
> 
> I'll adjust it and the below similar ones, thanks!

It is never supported to do unsupported things :-)

We need to be able to rely on defaults.  Otherwise, we will have to
implement all of GCC recursively, in itself, in the testsuite, and in
individual tests.  Let's not :-)

Cheers,


Segher

Re: [PATCH] rs6000: Neuter option -mpower{8,9}-vector [PR109987]

2024-02-19 Thread Segher Boessenkool

Hi!

On Tue, Jan 16, 2024 at 10:50:01AM +0800, Kewen.Lin wrote:
> As PR109987 and its duplicated bugs show, -mno-power8-vector
> (and -mno-power9-vector) cause some problems and as Segher
> pointed out in [1] they are workaround options, so this patch
> is to remove -m{no,}-power{8,9}-options.

Excellent :-)

> Like what we did
> for option -mdirect-move before, this patch still keep the
> corresponding internal flags and they are automatically set
> based on -mcpu.

Yup.  That makes the code nicer, and it what we already have anyway!

> The test suite update takes some efforts,

Yeah :-/

> it consists of some aspects:
>   - effective target powerpc_p{8,9}vector_ok are removed
> and replaced with powerpc_vsx_ok.

So all such testcases already arrange to have p8 or p9 some other way?

>   - Some cases having -mpower{8,9}-vector are updated with
> -mvsx, some of them already have -mdejagnu-cpu.  For
> those that don't have -mdejagnu-cpu, if -mdejagnu-cpu
> is needed for the test point, then it's appended;
> otherwise, add additional-options -mdejagnu-cpu=power{8,9}
> if has_arch_pwr{8,9} isn't satisfied.

Yeah it's a judgement call every time.

>   - Some test cases are updated with explicit -mvsx.
>   - Some test cases with those two option mixed are adjusted
> to keep the test points, like -mpower8-vector
> -mno-power9-vector are updated with -mdejagnu-cpu=power8
> -mvsx etc.

-mcpu=power8 implies -mvsx already.

>   - Some test cases with -mno-power{8,9}-vector are updated
> by replacing -mno-power{8,9}-vector with -mno-vsx, or
> just removing it.

Okay.

>   - For some cases, we don't always specify -mdejagnu-cpu to
> avoid to restrict the testing coverage, it would check
> has_arch_pwr{8,9} and appended that as need.

That is in general how all tests should be.  Very sometimes we want to
test for a specific CPU, for a regression test that exhibited just on a
certain CPU for example.  But we should never have a -mcpu= (or a
-mpowerN-vector nastiness thing) to test things on a new CPU!  Just do a
testsuite ruyn with *that* CPU.  Not many years from now, *all* CPUs
will have those new instructions anyway, so let's not put noise in the
testcases that will be irrelevant soon.

>   - For vect test cases run, it doesn't specify -mcpu=power9
> for power10 and up.
> 
> Bootstrapped and regtested on:
>   - powerpc64-linux-gnu P7/P8/P9 {-m32,-m64}
>   - powerpc64le-linux-gnu P8/P9/P10

In general it is nice to test 970 as the lowest vector thing we have,
abnd/or p4 as a target without anything vector, as well.  But I expect
thoise will just work for this patch :-)

> Although it's stage4 now, as the discussion in PR113115 we
> are still eager to neuter these two options.

It is mostly a testsuite patch, and testcase patches are fine (and much
wanted!) in stage 4.  The actual compiler options remain, and behaviour
does not change for anyone who used the option as intended,

Okay for trunk.  Thanks!  Comments below:

>   * config/rs6000/rs6000.opt: Make option power{8,9}-vector as
>   WarnRemoved.

Do we want this, or do we want it silent?  Should we remove the options
later, if we now warn for it?

>  (define_register_constraint "we" "rs6000_constraints[RS6000_CONSTRAINT_we]"
> -  "@internal Like @code{wa}, if @option{-mpower9-vector} and @option{-m64} 
> are
> -   used; otherwise, @code{NO_REGS}.")
> +  "@internal Like @code{wa}, if the cpu type is power9 or up, meanwhile
> +   @option{-mvsx} and @option{-m64} are used; otherwise, @code{NO_REGS}.")

"if this is a POWER9 or later and @option{-mvsx} and @option{-m64} are
used".  How clumsy.  Maybe we should make the patterns that use "we"
work without mtvsrdd as well?  Hrm, they will still require 64-bit GPRs
of course, unless we can do something tricky.

We do not need the special constraint at all of course (we can add these
conditions to all patterns that use it: all *two* patterns).  So maybe
that's what we should do :-)

> -If you use the ISA 3.0 instruction set (@option{-mpower9-vector} or
> -@option{-mcpu=power9}) on a 64-bit system, the IEEE 128-bit floating
> -point support will also enable the generation of ISA 3.0 IEEE 128-bit
> -floating point instructions.  Otherwise, if you do not specify to
> -generate ISA 3.0 instructions or you are targeting a 32-bit big endian
> -system, IEEE 128-bit floating point will be done with software
> -emulation.
> +If you use the ISA 3.0 instruction set (@option{-mcpu=power9}) on a
> +64-bit system, the IEEE 128-bit floating point support will also enable
> +the generation of ISA 3.0 IEEE 128-bit floating point instructions.
> +Otherwise, if you do not specify to generate ISA 3.0 instructions or you
> +are targeting a 32-bit big endian system, IEEE 128-bit floating point
> +will be done with software emulation.

You do not need to reformat documentation source code: it is
automatically formatted in all output formats and all viewers :-)

> diff --git

Re: [PATCH] Turn on LRA on all targets

2024-02-16 Thread Segher Boessenkool

On Fri, Feb 16, 2024 at 11:34:55AM +, Maciej W. Rozycki wrote:
>  Not really, in particular because EH unwinding has to be reliable and 
> heuristics inherently is not.

Yup.  Which is why I did 0359465c703a for rs6000 six years ago (how time
flies!)  The commit message for that includes

To find out where on-entry register values live at any point in a
program, GDB currently tries to parse to parse the executable code.
This does not work very well, for example it gets confused if some
accesses to the stack use the frame pointer (r31) and some use the
stack pointer (r1).  A symptom is that backtraces can be cut short.

and the patch does

+  /* By default, always emit DWARF-2 unwind info.  This allows debugging
+ without maintaining a stack frame back-chain.  It also allows the
+ debugger to find out where on-entry register values are stored at any
+ point in a function, without having to analyze the executable code (which
+ isn't even possible to do in the general case).  */
+#ifdef OBJECT_FORMAT_ELF
+  opts->x_flag_asynchronous_unwind_tables = 1;
+#endif

We went through very many refinements of the heuristics through the
years, but at some point you just have to give up: heuristics never
can make up for missing information.

>  Consequently the more aggressive the compiler has become to schedule
> function body instructions within a function's prologue the more lost the 
> machine code interpreter has become.  Ultimately it would have to become a 
> full-fledged CPU simulator to do its heuristics.

Yup, exactly.

> In reality it means the 
> unwinder may fail to produce acceptable results, which will happen at any 
> frequency between hardly ever to most often, depending on the exact 
> circumstances.

Yes.  If the compiler optimises the *logue code well, there is no way
heuristics can follow that.

>  Conversely no heuristics is required to unwind VAX frames, because they 
> are fixed in layout by hardware, fully self-described, and with the 
> hardware frame pointer always available.

The downside of the VAX situation of course is that the compiler has no
freedom to optimise the frame and *logue code at all, let alone well.
This may not matter so much on narrow ucoded in-order machines, there
are different balances there :-)

Segher

Re: [PATCH] Turn on LRA on all targets

2024-02-16 Thread Segher Boessenkool

On Thu, Feb 15, 2024 at 08:41:42PM -0500, Paul Koning wrote:
> > On Feb 15, 2024, at 5:56 PM, Segher Boessenkool 
> >  wrote:
> > 
> > On Thu, Feb 15, 2024 at 07:34:32PM +, Sam James wrote:
> >> I have now started doing this in PR113932.
> > 
> > Thank you!
> > 
> > Segher
> 
> Presumably this isn't for version 14 since it's in a late stage, right?  I 
> have my bits about ready to go in but I'll wait for State 1 to open.  Correct?

Absolutely.  It was decided early in stage 1 that this wasn't going to
happen for 14.

It appears most of the anywhere near hard targets have not done anything
though.  I'll just push very hard for 15.  But you will be fine :-)

Segher

Re: [PATCH] Turn on LRA on all targets

2024-02-15 Thread Segher Boessenkool

On Thu, Feb 15, 2024 at 07:34:32PM +, Sam James wrote:
> I have now started doing this in PR113932.

Thank you!


Segher

Re: Repost [PATCH 1/6] Add -mcpu=future

2024-02-08 Thread Segher Boessenkool

On Fri, Jan 05, 2024 at 06:35:37PM -0500, Michael Meissner wrote:
>   * config/rs6000/rs6000.opt (-mfuture): New undocumented debug switch.

No.  Never ever use a flag that does what -mcpu= should do.  We're
still trying to recover from previous such mistakes.  Don't add more
please.

> +++ b/gcc/config/rs6000/rs6000-c.cc
> @@ -447,6 +447,8 @@ rs6000_target_modify_macros (bool define_p, HOST_WIDE_INT 
> flags)
>  rs6000_define_or_undefine_macro (define_p, "_ARCH_PWR9");
>if ((flags & OPTION_MASK_POWER10) != 0)
>  rs6000_define_or_undefine_macro (define_p, "_ARCH_PWR10");
> +  if ((flags & OPTION_MASK_FUTURE) != 0)
> +rs6000_define_or_undefine_macro (define_p, "_ARCH_PWR_FUTURE");

if a & B) != 0) != 0) != 0)  ?  You can do just
if (a & B)

Yes, existing code already does the silly thing, but just fix it then,
don't add more :-)

(And no   if ((a & B))   either please).

> +static int
> +rs600_cpu_index_lookup (enum processor_type processor)
> +{
> +  for (size_t i = 0; i < ARRAY_SIZE (processor_target_table); i++)
> +if (processor_target_table[i].processor == processor)
> +  return i;
> +
> +  return -1;
> +}

"int i" please, not "size_t".  This has nothing to do with object sizes.
The loop counter will always be a small number.

> +  /* At the moment, we don't have explict -mtune=future support.  If the user

"At the moment" is out of date almost as soon as you write it.  It is
better to avoid such terms ;-)

> + explicitly tried to use -mtune=future, give a warning.  If not, use the
> + power10 tuning until future tuning is added.  */

There should be Power11 tuning now, please use that?

So please post this -- as a separate series, and not as a single patch --
after fixing the things Ke Wen pointed out.  Thanks!

Segher

Re: Repost [PATCH 1/6] Add -mcpu=future

2024-02-08 Thread Segher Boessenkool

On Wed, Feb 07, 2024 at 05:21:10PM +0800, Kewen.Lin wrote:
> on 2024/2/6 14:01, Michael Meissner wrote:
> > It was more as a separation.  The MPCCORE, CELL, PPCA2, and TITAN are rather
> > old processors.

I'll probably remove Titan soonish, btw.  We have adjusted code around
it for what, fifteen years?  But the hardware never materialized.  There
are more silly things in our backend, but this one takes the prize.

> OK, considering we only get this warning once for a simple case, I'm inclined
> not to keep a static variable for it, it's the same as what we do currently
> for option conflict errors emission.  But I'm fine for either.

Whatever is easiest.

Segher

Re: Repost [PATCH 1/6] Add -mcpu=future

2024-02-08 Thread Segher Boessenkool

On Tue, Feb 06, 2024 at 01:01:52AM -0500, Michael Meissner wrote:
> > Nit: Named as "ISA_FUTURE_MASKS_SERVER" seems more accurate as it's 
> > constituted
> > with ISA_3_1_MASKS_**SERVER** ...
> 
> Well the _SERVER stuff was due to the power7 days when we still had to support
> the E500 in the main rs6000 tree.  But I will change it to be more consistant
> in the future patches.

"_SERVER" still is a good shortish name for the server systems ;-)

> > > @@ -67,7 +67,9 @@ enum processor_type
> > > PROCESSOR_MPCCORE,
> > > PROCESSOR_CELL,
> > > PROCESSOR_PPCA2,
> > > -   PROCESSOR_TITAN
> > > +   PROCESSOR_TITAN,
> > > +
> > 
> > Nit: unintentional empty line?
> > 
> > > +   PROCESSOR_FUTURE
> > >  };
> 
> It was more as a separation.  The MPCCORE, CELL, PPCA2, and TITAN are rather
> old processors.  I don't recall why we kept them after the POWER.

Please don't add random separations.

> Logically we should re-order the list and move MPCCORE, etc. earlier, but I
> will delete the blank line in future patches.

Don't randomly reorder, either.

_FUTURE should be added after POWER11.

> > I think we should also update asm_names in driver-rs6000.cc.
> 
> Ok.  Though the driver-rs6000.cc stuff won't kick in until we have a real
> system that matches "future".

Or when during development you have that faked.  You did test it, right?
:-)


Segher

Re: Repost [PATCH 0/6] PowerPC Future patches

2024-02-08 Thread Segher Boessenkool

Hi!

On Fri, Jan 05, 2024 at 06:27:05PM -0500, Michael Meissner wrote:
> In the current MMA subsystem for Power10, there are 8 512-bit accumulator
> registers.  These accumulators are each tied to sets of 4 FPR registers.  When

Four VSX registers -- the FP registers are only a 64 bit part of each of
those.  Please do not call those VSX registers "FPRs".  They are not.

> These patches add support for the 512-bit accumulators within the dense math
> system, and for allocation of the 1,024-bit DMRs.  At this time, no additional
> built-in functions will be done to support any dense math features other than
> doing data movement between the DMRs and the VSX registers.  Before we can 
> look
> at adding any new dense math support other than data movement, we need the GCC
> compiler to be able to allocate and use these DMRs.

Okido.

> If you compile with -mcpu=power10, the wD constraint will match the equivalent
> FPR register that overlaps with the accumulator.  If you compile with
> -mcpu=future, the wD constraint will match the DMR register and not the FPR
> register.
> 
> These patches also modifies the print_operand %A output modifier to print out
> DMR register numbers if -mcpu=future, and continue to print out the FPR
> register number divided by 4 for -mcpu=power10.

Yup.  Unfortunately that is the best we can do probably.  It _feels_
fragile, but it wil probably be okay in practice.

> Going forward, hopefully if you modify your code to use the wD constraint and
> %A output modifier, you can write code that switches more easily between the
> two systems.

But it will never become completely transparent.  Luckily the old thing
will over time fade into the background.

So, please post the -mcpu=future patches in a separate series, first.
I'll comment on that patch in a minute, you'll probably want to take
those comments into consideration before posting that series ;-)

Segher

Re: [PATCH] Add a late-combine pass [PR106594]

2023-12-30 Thread Segher Boessenkool

Hi!

On Tue, Oct 24, 2023 at 07:49:10PM +0100, Richard Sandiford wrote:
> This patch adds a combine pass that runs late in the pipeline.

But it is not.  It is a completely new thing, and much closer to
fwprop than to combine, too.

Could you rename it to something else, please?  Something less confusing
to both users and maintainers :-)

> There are two instances: one between combine and split1,

So, what kind of things does this do that the real combine does not?
And, same question but for fwprop.  That would be the crucial motivation
for why we want to have this new pass at all :-)

> The pass currently has a single objective: remove definitions by
> substituting into all uses.

The easy case ;-)

Segher

Re: [PATCH 1/2] RTX_COST: Count instructions

2023-12-30 Thread Segher Boessenkool

On Fri, Dec 29, 2023 at 09:14:52PM -0700, Jeff Law wrote:
> On 12/29/23 10:46, YunQiang Su wrote:
> >When we try to combine RTLs, the result may be very complex,
> >and `rtx_cost` may think that it need lots of costs. But in
> >fact, it may match a pattern in machine descriptions, which
> >may emit only 1 or 2 hardware instructions.  This combination
> >may be refused due to cost comparison failure.
> Then that's a problem with the backend's implementation of RTX_COST.
> 
> >Since the high cost may be due to a more expsensive operation.
> >To get real reason, we also need information about instruction
> >count.
> Then cost the *operations*, not the number of instructions.  Also note 
> that a single insn may generate multiple assembler instructions.
> 
> Even with all its warts, the real solution here is to fix the port's RTX 
> costs.

Or implement the insn_cost hook instead, it will be used preferably over
rtx_costs in most places then.  Including in the combiner.  insn_cost
is much easier to implement, and even possible to make good cost
estimates with :-)


Segher

Re: [PATCH] combine: Fix ICE in try_combine on pr112494.c [PR112560]

2023-11-30 Thread Segher Boessenkool

Hi!

On Wed, Nov 29, 2023 at 02:20:03PM +0100, Uros Bizjak wrote:
> On Wed, Nov 29, 2023 at 1:25 PM Richard Biener
>  wrote:
> > On Wed, Nov 29, 2023 at 10:35 AM Uros Bizjak  wrote:
> I was assuming that if the CC reg is not used inside the comparison,
> then the mode of CC reg is irrelevant. We can still combine the
> instructions into new insn, without updating the use of CC reg.

It should never happen that the CC reg is not used, so what does this
mean?

Where it is used might not be a comparison of course, as in your
example.

> && (cc_use_loc = find_single_use (SET_DEST (newpat), i3,
>   _use_insn)))
>   {
> -   compare_code = orig_compare_code = GET_CODE (*cc_use_loc);
> -   if (is_a  (GET_MODE (i2dest), ))
> - compare_code = simplify_compare_const (compare_code, mode,
> -, );
> -   target_canonicalize_comparison (_code, , , 1);
> +   if (COMPARISON_P (*cc_use_loc))
> + {
> +   compare_code = orig_compare_code = GET_CODE (*cc_use_loc);
> +   if (is_a  (GET_MODE (i2dest), ))
> + compare_code = simplify_compare_const (compare_code, mode,
> +, );
> +   target_canonicalize_comparison (_code, , , 1);
> + }
> +   else
> + {
> +   if (dump_file && (dump_flags & TDF_DETAILS))
> + fprintf (dump_file, "CC register not used in comparison.\n");

"Where the CCmode register is used is not a comparison".  But more
compact if you can manage (I cannot).


Segher

Re: [PATCH] rs6000, Add missing overloaded bcd builtin tests

2023-10-31 Thread Segher Boessenkool

On Tue, Oct 31, 2023 at 08:31:25AM -0700, Carl Love wrote:
> > I just found that actually they have the test coverage, because we
> > have
> > 
> > #define __builtin_bcdcmpeq(a,b)   __builtin_vec_bcdsub_eq(a,b,0)
> > #define __builtin_bcdcmpgt(a,b)   __builtin_vec_bcdsub_gt(a,b,0)
> > #define __builtin_bcdcmplt(a,b)   __builtin_vec_bcdsub_lt(a,b,0)
> > #define __builtin_bcdcmpge(a,b)   __builtin_vec_bcdsub_ge(a,b,0)
> > #define __builtin_bcdcmple(a,b)   __builtin_vec_bcdsub_le(a,b,0)
> > 
> > in altivec.h and gcc/testsuite/gcc.target/powerpc/bcd-4.c tests all
> > these
> 
> OK, my simple scripts are not going to pickup the stuff in altivec.h. 
> They were just grepping for the built-in name in the test file
> directory.

You could use gcov to see which rs6000 builtins are not exercised by
anything in the testsuite, maybe.  This probably can be automated pretty
nicely.


Segher

Re: [PATCH v2] swap: Fix incorrect lane extraction by vec_extract() [PR106770]

2023-10-28 Thread Segher Boessenkool

Hi!

Please say "rs6000/p8swap:" in the subject, not "swap:" :-)

On Sun, Sep 10, 2023 at 10:58:32PM +0530, Surya Kumari Jangala wrote:
> Another issue with always handling swappable instructions is that it is
> incorrect to do so in webs where loads/stores on quad word aligned
> addresses are changed to lvx/stvx.

Why?  Please say why in the commit message (the message you send with
your patch should be the exact eventual commit message!)

> gcc/
>   PR rtl-optimization/PR106770
>   * config/rs6000/rs6000-p8swap.cc (non_permuting_mem_insn): New
>   function.

Please don't break commit message / changelog lines early unnecessarily.
Lines are 80 chars, the leading tab counts as 8.

> +  /* Set if the swappable insns in the web represented by this entry
> + have to be fixed. Swappable insns have to be fixed in :

(no space before colon)

> +static bool
> +non_permuting_mem_insn (swap_web_entry *insn_entry, unsigned int i)
> +{
> +  return (insn_entry[i].special_handling == SH_NOSWAP_LD ||
> +   insn_entry[i].special_handling == SH_NOSWAP_ST);
> +}

"return" is not a function, you don't need parens here.

> +/* Convert a non-permuting load/store insn to a permuting one.  */
> +static void
> +handle_non_permuting_mem_insn (swap_web_entry *insn_entry, unsigned int i)

A better name would be good, "handle" is a weaselword :-)  It is a
static, so a shorter name is completely acceptable (say, one that
wouldn't be acceptable with bigger than file scope).

> +  rtx_insn *insn = insn_entry[i].insn;
> +  if (insn_entry[i].special_handling == SH_NOSWAP_LD)
> +permute_load (insn);
> +  else if (insn_entry[i].special_handling == SH_NOSWAP_ST)
> +permute_store (insn);

Lose the "else"?  The compiler can do micro-optimisations a million
times better than any user could.  Simpler, more readable (better
understandable!) code is much preferred.

> +  /* Perform special handling for swappable insns that require it.

That is a completely contentless sentence :-(

> + Note that special handling should be done only for those
> + swappable insns that are present in webs marked as requiring
> + special handling.  */

This one isn't much better.

> --- /dev/null
> +++ b/gcc/testsuite/gcc.target/powerpc/pr106770.c
> @@ -0,0 +1,21 @@
> +/* { dg-do compile } */

This is the default, you do not need this.

> +/* { dg-require-effective-target powerpc_p8vector_ok } */
> +/* { dg-options "-mdejagnu-cpu=power8 -O2 " } */
> +/* The 2 xxpermdi instructions are generated by the two
> +   calls to vec_promote() */
> +/* { dg-final { scan-assembler-times "xxpermdi" 2 } } */

Please enclose in {}.  Use double quotes in Tcl only when tou want the
interpolation they cause.  Default to using {} instead.

So please fix those things, and write a better commit message.  Ideally
the commit messsage will tell everything needed to understand the patch
(so also to review the patch).  Maybe add examples where needed.  So
reviewing the code in the patch should be an easy thing to do, after
reading the commit message :-)

Segher

Re: [PATCH] Cleanup: Replace UNSPEC_COPYSIGN with copysign RTL

2023-09-30 Thread Segher Boessenkool

On Fri, Sep 29, 2023 at 02:09:12PM -0400, Michael Meissner wrote:
>   * config/rs6000/rs6000.md (UNSPEC_COPYSIGN): Delete.
>   (copysign3_fcpsg): Use copysign RTL instead of UNSPEC.

(typo, it is _fcpsgn)

Nice to see unnecessary unspecs going away :-)


Segher

Re: [PATCH V4, rs6000] Disable generation of scalar modulo instructions

2023-09-13 Thread Segher Boessenkool

Hi!

On Fri, Jun 30, 2023 at 02:26:35PM -0500, Pat Haugen wrote:
> gcc/
> * config/rs6000/rs6000.cc (rs6000_rtx_costs): Check if disabling
> scalar modulo.

"Check whether the modulo instruction is disabled?"

> * config/rs6000/rs6000.md (mod3, *mod3): Disable.
> (define_expand umod3): New.
> (define_insn umod3): Rename to *umod3 and disable.
> (umodti3, modti3): Disable.

None of these patterns are disabled!  Instead, the new TARGET_* thing
is used.

> +/* Disable generation of scalar modulo instructions due to performance issues
> +   with certain input values.  This can be removed in the future when the
> +   issues have been resolved.  */
> +#define RS6000_DISABLE_SCALAR_MODULO 1

I think that is a bit optimistic -- in the future we will still want to
support older cores ;-)

> -  "TARGET_POWER10 && TARGET_POWERPC64"
> +  "TARGET_POWER10 && TARGET_POWERPC64 && !RS6000_DISABLE_SCALAR_MODULO"
>"vmoduq %0,%1,%2"

Did we ever test if this insn in fact is slower as well?  I don't mean
either way, orthogonality is good, but just for my enlightenment.

> +++ b/gcc/testsuite/gcc.target/powerpc/clone1.c

> +/* { Fail due to RS6000_DISABLE_SCALAR_MODULO. */

Xfail, but heh.  No need to change that.

> +/* { dg-final { scan-assembler-times {\mdivd\M}  1 { xfail *-*-* } } } */

> +/* { Fail due to RS6000_DISABLE_SCALAR_MODULO. */
> +/* { dg-final { scan-assembler-times {\mmodsw\M} 1 { xfail *-*-* } } } */

Thanks for the \m \M, it is much harder to determine if the tests
actually work, without that :-)

With improved changelog: okay for trunk.  Okay for all backports as
well (after some soak time).

Thank you!


Segher

Re: [MAINTAINERS/KERNEL SUMMIT] Trust and maintenance of file systems

2023-09-07 Thread Segher Boessenkool

On Thu, Sep 07, 2023 at 02:23:00PM +0300, Dan Carpenter wrote:
> On Thu, Sep 07, 2023 at 06:04:09AM -0500, Segher Boessenkool wrote:
> > On Thu, Sep 07, 2023 at 12:48:25PM +0300, Dan Carpenter via Gcc-patches 
> > wrote:
> > > I started to hunt
> > > down all the Makefile which add a -Werror but there are a lot and
> > > eventually I got bored and gave up.
> > 
> > I have a patch stack for that, since 2014 or so.  I build Linux with
> > unreleased GCC versions all the time, so pretty much any new warning is
> > fatal if you unwisely use -Werror.
> > 
> > > Someone should patch GCC so there it checks an environment variable to
> > > ignore -Werror.  Somethine like this?
> > 
> > No.  You should patch your program, instead.
> 
> There are 2930 Makefiles in the kernel source.

Yes.  And you need patches to about thirty.  Or a bit more, if you want
to do it more cleanly.  This isn't a guess.

> > One easy way is to add a
> > -Wno-error at the end of your command lines.  Or even just -w if you
> > want or need a bigger hammer.
> 
> I tried that.  Some of the Makefiles check an environemnt variable as
> well if you want to turn off -Werror.  It's not a complete solution at
> all.  I have no idea what a complete solution looks like because I gave
> up.

A solution can not involve changing the compiler.  That is just saying
the kernel doesn't know how to fix its own problems, so let's give the
compiler some more unnecessary problems.

> > Or nicer, put it all in Kconfig, like powerpc already has for example.
> > There is a CONFIG_WERROR as well, so maybe use that in all places?
> 
> That's a good idea but I'm trying to compile old kernels and not the
> current kernel.

You can patch older kernels, too, you know :-)

If you need to not make any changes to your source code for some crazy
reason (political perhaps?), just use a shell script or shell function
instead of invoking the compiler driver directly?


Segher

Segher

Re: [MAINTAINERS/KERNEL SUMMIT] Trust and maintenance of file systems

2023-09-07 Thread Segher Boessenkool

On Thu, Sep 07, 2023 at 07:22:45AM -0400, Steven Rostedt wrote:
> On Thu, 7 Sep 2023 06:04:09 -0500
> Segher Boessenkool  wrote:
> > On Thu, Sep 07, 2023 at 12:48:25PM +0300, Dan Carpenter via Gcc-patches 
> > wrote:
> > No.  You should patch your program, instead.  One easy way is to add a
> > -Wno-error at the end of your command lines.  Or even just -w if you
> > want or need a bigger hammer.
> 
> That's not really possible when bisecting a kernel bug into older kernels.
> The build system is highly complex and requires hundreds of changes to do
> what you suggested. As it is for a bisection that takes a minimum of 13
> iterations, your approach just isn't feasible.

Isn't this exactly what KCFLAGS is for?

But, I meant to edit the build system.  It isn't so hard to bisect with
patch stacks on top.  Just a bit annoying.


Segher

Re: [MAINTAINERS/KERNEL SUMMIT] Trust and maintenance of file systems

2023-09-07 Thread Segher Boessenkool

On Thu, Sep 07, 2023 at 12:48:25PM +0300, Dan Carpenter via Gcc-patches wrote:
> I started to hunt
> down all the Makefile which add a -Werror but there are a lot and
> eventually I got bored and gave up.

I have a patch stack for that, since 2014 or so.  I build Linux with
unreleased GCC versions all the time, so pretty much any new warning is
fatal if you unwisely use -Werror.

> Someone should patch GCC so there it checks an environment variable to
> ignore -Werror.  Somethine like this?

No.  You should patch your program, instead.  One easy way is to add a
-Wno-error at the end of your command lines.  Or even just -w if you
want or need a bigger hammer.

Or nicer, put it all in Kconfig, like powerpc already has for example.
There is a CONFIG_WERROR as well, so maybe use that in all places?

> +static bool
> +ignore_w_error(void)
> +{
> +  char *str;
> +
> +  str = getenv("IGNORE_WERROR");
> +  if (str && strcmp(str, "1") == 0)

space before (

>  case OPT_Werror:
> +  if (ignore_w_error())
> + break;
>dc->warning_as_error_requested = value;
>break;
>  
>  case OPT_Werror_:
> -  if (lang_mask == CL_DRIVER)
> + if (ignore_w_error())
> + break;
> + if (lang_mask == CL_DRIVER)
>   break;

The new indentation is messed up.  And please don't move the existing
early-out to later, it make more sense earlier, the way it was.

Segher

Re: [PATCH] Fix typo in insn name.

2023-07-10 Thread Segher Boessenkool

Hi!

On Mon, Jul 10, 2023 at 03:59:44PM -0400, Michael Meissner wrote:
> In doing other work, I noticed that there was an insn:
> 
>   vsx_extract_v4sf__load
> 
> Which did not have an iterator.  I removed the useless .

This patch does that, you mean.

> --- a/gcc/config/rs6000/vsx.md
> +++ b/gcc/config/rs6000/vsx.md
> @@ -3576,7 +3576,7 @@ (define_insn_and_split "vsx_extract_v4sf"
>[(set_attr "length" "8")
> (set_attr "type" "fp")])
>  
> -(define_insn_and_split "*vsx_extract_v4sf__load"
> +(define_insn_and_split "*vsx_extract_v4sf_load"
>[(set (match_operand:SF 0 "register_operand" "=f,v,v,?r")
>   (vec_select:SF
>(match_operand:V4SF 1 "memory_operand" "m,Z,m,m")

Does this fix any ICEs?  Or do you have some example that makes better
machine code after this change?  Or would a better change perhaps be to
just remove this pattern completely, if it doesn't do anything useful?

I.e., please include a new testcase.

Segher

Re: [PATCH] rs6000: Don't ICE when generating vector pair load/store insns [PR110411]

2023-07-06 Thread Segher Boessenkool

On Thu, Jul 06, 2023 at 02:48:19PM -0500, Peter Bergner wrote:
> On 7/6/23 12:33 PM, Segher Boessenkool wrote:
> > On Wed, Jul 05, 2023 at 05:21:18PM +0530, P Jeevitha wrote:
> >> --- a/gcc/config/rs6000/rs6000.cc
> >> +++ b/gcc/config/rs6000/rs6000.cc
> >> @@ -9894,6 +9894,8 @@ rs6000_legitimate_address_p (machine_mode mode, rtx 
> >> x, bool reg_ok_strict)
> >>  
> >>/* Handle unaligned altivec lvx/stvx type addresses.  */
> >>if (VECTOR_MEM_ALTIVEC_OR_VSX_P (mode)
> >> +  && mode !=  OOmode
> >> +  && mode !=  XOmode
> >>&& GET_CODE (x) == AND
> >>&& CONST_INT_P (XEXP (x, 1))
> >>&& INTVAL (XEXP (x, 1)) == -16)
> > 
> > Why do we need this for OOmode and XOmode here, but not for the other
> > modes that are equally not allowed?  That makes no sense.
> 
> VECTOR_MEM_ALTIVEC_OR_VSX_P (mode) already filters those modes out
> (eg, SImode, DFmode, etc.), just not OOmode and XOmode, since those both
> are modes used in/with VSX registers.

It does not filter anything out, no.  That simply checks if a datum of
that mode can be loaded into vector registers or not.  For example
SImode could very well be loaded into vector registers!  (It just is not
such a great idea).

And for some reason there is VECTOR_P8_VECTOR as well, which is mixing
multiple concepts already.  Let's not add more, _please_.

> > Should you check for anything that is more than a register, for example?
> > If so, do *that*?
> 
> Well rs6000_legitimate_address_p() is only passed the MEM rtx, so we have
> no idea if this is a load or store, so we're clueless on number of regs
> needed to hold this mode.  The best we could do is something like

That is *bigger than* a register.  It's the same in Dutch, sorry, I am
tired :-(

>   GET_MODE_SIZE (mode) == GET_MODE_SIZE (V16QImode)
> 
> or some such thing.  Would you prefer something like that?

That is even worse :-(

> >> --- /dev/null
> >> +++ b/gcc/testsuite/gcc.target/powerpc/pr110411.c
> >> @@ -0,0 +1,21 @@
> >> +/* PR target/110411 */
> >> +/* { dg-options "-O2 -mdejagnu-cpu=power10 -S -mblock-ops-vector-pair" } 
> >> */
> > 
> > -S in testcases is wrong.  Why do you want this?  It is *good* if this
> > is hauled through the assembler as well!  If you *really* want this you
> > use "dg-do assemble", but you shouldn't.
> 
> For test cases checking for ICEs, we don't need to assemble, so I agree,
> we just need to remove the -S option, which is implied by this being a
> dg-do compile test case (the default for this test directory).

We *do* want to assemble.  It is a general principle that we want to
test as much as possible whenever possible.  *Most* problems are found
with the help of testcases that were never designed for the problem
found!

dg-do compile *does* invoke the assembler, btw.  As it should.


Segher

Re: [PATCH] rs6000: Don't ICE when generating vector pair load/store insns [PR110411]

2023-07-06 Thread Segher Boessenkool

Hi!

On Wed, Jul 05, 2023 at 05:21:18PM +0530, P Jeevitha wrote:
> The following patch has been bootstrapped and regtested on powerpc64le-linux.
> 
> while generating vector pairs of load & store instruction, the src address
> was treated as an altivec type and that type of address is invalid for 
> lxvp and stxvp insns. The solution for this is to avoid altivec type address
> for OOmode and XOmode.

The mail message you send should be what will end up in the Git commit
message.  Your lines are too long for that (and the subject is much too
long btw), and the content isn't right either.

Maybe something like

"""
rs6000: Don't allow OOmode or XOmode in AltiVec addresses (PR110411)

There are no instructions that do traditional AltiVec addresses (i.e.
with the low four bits of the address masked off) for OOmode and XOmode
objects.  Don't allow those in rs6000_legitimate_address_p.
"""

> gcc/
>   PR target/110411
>   * config/rs6000/rs6000.cc (rs6000_legitimate_address_p): Avoid altivec
>   address for OOmode and XOmde.

(XOmode, sp.)

Not "avoid", disallow.  If you avoid something you still allow it, you
just prefer to see something else.

> --- a/gcc/config/rs6000/rs6000.cc
> +++ b/gcc/config/rs6000/rs6000.cc
> @@ -9894,6 +9894,8 @@ rs6000_legitimate_address_p (machine_mode mode, rtx x, 
> bool reg_ok_strict)
>  
>/* Handle unaligned altivec lvx/stvx type addresses.  */
>if (VECTOR_MEM_ALTIVEC_OR_VSX_P (mode)
> +  && mode !=  OOmode
> +  && mode !=  XOmode
>&& GET_CODE (x) == AND
>&& CONST_INT_P (XEXP (x, 1))
>&& INTVAL (XEXP (x, 1)) == -16)

Why do we need this for OOmode and XOmode here, but not for the other
modes that are equally not allowed?  That makes no sense.

Should you check for anything that is more than a register, for example?
If so, do *that*?

> --- /dev/null
> +++ b/gcc/testsuite/gcc.target/powerpc/pr110411.c
> @@ -0,0 +1,21 @@
> +/* PR target/110411 */
> +/* { dg-options "-O2 -mdejagnu-cpu=power10 -S -mblock-ops-vector-pair" } */

-S in testcases is wrong.  Why do you want this?  It is *good* if this
is hauled through the assembler as well!  If you *really* want this you
use "dg-do assemble", but you shouldn't.

Segher

Re: [PATCH, V6] Fix power10 fusion and -fstack-protector, PR target/105325

2023-06-20 Thread Segher Boessenkool

Hi!

The patch looks great now, thanks you!

But the commit message needs some work:

First off, the subject, which is a short (50 character max!) summary of
what the patch is about.
Fix power10 fusion and -fstack-protector, PR target/105325

There is absolutely nothing to do with stack protector, it does not
belong in the commit message at all, and certainly not in the subject!

On Tue, Jun 13, 2023 at 10:14:02PM -0400, Michael Meissner wrote:
> This patch fixes an issue where if you use the -fstack-protector and
> -mcpu=power10 options and you have a large stack frame, the GCC compiler will
> generate a LWA instruction with a large offset.

That is not the core issue, it is just one example where things went
wrong.  That prompted this patch, sure, so you can talk about that ten
or so lines down if you think it is important (I don't fwiw), but not at
the start here.  You should just say what was wrong, so people with a
short attention span can just skip this patch when looking through git
log (and even earlier if the subject would be good).

Commit messages are for *future* users.  Not for getting your patch
approved.

> Here is the initial fused initial insn that was created.  It refers to the
> stack location based off of the virtrual frame pointer:

The soft frame pointer, not a virtual one.  For PowerPC this is not a
real register and LRA will eventually replace it, sure.  "Virtual" here
in GCC has a very specific meaning; virtual things are replaced very
soon after expand.

> When the split2 pass is run after reload has finished the ds_form_mem_operand
> predicate that was used for lwa and ld no longer returns true.

Yes.  It is the wrong predicate to use here.  *That* is the problem.

> 2)Delete ds_form_mem_operand since it is no longer used.

... and we don't expect to use it any time soon.

> 3)Use the "YZ" constraints for ld/lwa instead of "m".

Yes, constraints and predicates.

>   * config/rs6000/genfusion.pl (gen_ld_cmpi_p10_one): Fix problems that
>   allowed prefixed lwa to be generated.

You should not say what the *old* code did, in the changelog!

> --- a/gcc/config/rs6000/genfusion.pl
> +++ b/gcc/config/rs6000/genfusion.pl
> @@ -61,20 +61,30 @@ sub gen_ld_cmpi_p10_one
>my $mempred = "non_update_memory_operand";
>my $extend;
>  
> +  # We need to special case lwa.  The prefixed_load_p function in rs6000.cc
> +  # (which determines if a load instruction is prefixed) uses the fact that 
> the
> +  # register mode is different from the memory mode, and that the sign_extend
> +  # attribute is set to use DS-form rules for the address instead of D-form.
> +  # If the register size is the same, prefixed_load_p assumes we are doing a
> +  # lwz.  We change to use an lwz and word compare if we don't need to sign
> +  # extend the SImode value.  Otherwise if we need the value, we need to
> +  # make sure the insn is marked as ds-form.
> +  my $lwa_insn = ($lmode eq "SI" && $ccmode eq "CC");

That is a pretty bad name, the variable does not hold an "insn" in any
way, shape, or form.  It is hardish to give it a good name because it
mixes two questions into one variable?  You can just repeat the tiny
conditions wherever you use them, and the code would be more readable
(and less cryptic!)

> +  if ($lwa_insn && $cmp_size eq "d") {

Name it "cmp_size_char" maybe?  "cmp_size" suggests a number.

> --- /dev/null
> +++ b/gcc/testsuite/g++.target/powerpc/pr105325.C
> @@ -0,0 +1,26 @@
> +/* { dg-do assemble } */
> +/* { dg-require-effective-target lp64 } */
> +/* { dg-require-effective-target power10_ok } */
> +/* { dg-require-effective-target powerpc_prefixed_addr } */
> +/* { dg-options "-O2 -mdejagnu-cpu=power10 -fstack-protector" } */
> +
> +/* Test that power10 fusion does not generate an LWA/CMPDI instruction pair
> +   instead of PLWZ/CMPWI.  Ultimately the code was dying because the fusion
> +   load + compare -1/0/1 patterns did not handle the possibility that the 
> load
> +   might be prefixed.  The -fstack-protector option is needed to show the
> +   bug.  */

Mention the PR number somewhere in the text as well?  For grep etc.

Okay for trunk, with some more reasonable commmit message.  Thank you!
Also okay for all backports.

Segher

Re: [PATCH V3 1/4] rs6000: build constant via li;rotldi

2023-06-16 Thread Segher Boessenkool

Hi!

On Fri, Jun 16, 2023 at 04:34:12PM +0800, Jiufu Guo wrote:
> +/* Check if value C can be built by 2 instructions: one is 'li', another is
> +   rotldi.
> +
> +   If so, *SHIFT is set to the shift operand of rotldi(rldicl), and *MASK
> +   is set to -1, and return true.  Return false otherwise.  */

Don't say "is set to -1", the point of having this is so you say "is set
to the "li" value".  Just like you describe what SHIFT is for.

> +static bool
> +can_be_built_by_li_and_rotldi (HOST_WIDE_INT c, int *shift,
> +HOST_WIDE_INT *mask)
> +{
> +  int n;

Put shis later, like:

> +  /* Check if C can be rotated to a positive or negative value
> +  which 'li' instruction is able to load.  */
  int n;
> +  if (can_be_rotated_to_lowbits (c, 15, )
> +  || can_be_rotated_to_lowbits (~c, 15, ))
> +{
> +  *mask = HOST_WIDE_INT_M1;
> +  *shift = HOST_BITS_PER_WIDE_INT - n;
> +  return true;
> +}

It is tricky to see ~c will always work, since what is really done is -c
instead.  Can you just use that here?

> @@ -10266,15 +10291,14 @@ static void
>  rs6000_emit_set_long_const (rtx dest, HOST_WIDE_INT c)
>  {
>rtx temp;
> +  int shift;
> +  HOST_WIDE_INT mask;
>HOST_WIDE_INT ud1, ud2, ud3, ud4;
>  
>ud1 = c & 0x;
> -  c = c >> 16;
> -  ud2 = c & 0x;
> -  c = c >> 16;
> -  ud3 = c & 0x;
> -  c = c >> 16;
> -  ud4 = c & 0x;
> +  ud2 = (c >> 16) & 0x;
> +  ud3 = (c >> 32) & 0x;
> +  ud4 = (c >> 48) & 0x;
>  
>if ((ud4 == 0x && ud3 == 0x && ud2 == 0x && (ud1 & 0x8000))
>|| (ud4 == 0 && ud3 == 0 && ud2 == 0 && ! (ud1 & 0x8000)))
> @@ -10305,6 +10329,17 @@ rs6000_emit_set_long_const (rtx dest, HOST_WIDE_INT 
> c)
>emit_move_insn (dest, gen_rtx_XOR (DImode, temp,
>GEN_INT ((ud2 ^ 0x) << 16)));
>  }
> +  else if (can_be_built_by_li_and_rotldi (c, , ))
> +{
> +  temp = !can_create_pseudo_p () ? dest : gen_reg_rtx (DImode);
> +  unsigned HOST_WIDE_INT imm = (c | ~mask);
> +  imm = (imm >> shift) | (imm << (HOST_BITS_PER_WIDE_INT - shift));
> +
> +  emit_move_insn (temp, GEN_INT (imm));
> +  if (shift != 0)
> + temp = gen_rtx_ROTATE (DImode, temp, GEN_INT (shift));
> +  emit_move_insn (dest, temp);
> +}

If you would rewrite so it isn't such a run-on thing with "else if",
instead using early outs, or even some factoring, you could declare the
variable used only in a tiny scope in that tiny scope instead.

> --- /dev/null
> +++ b/gcc/testsuite/gcc.target/powerpc/const-build.c
> @@ -0,0 +1,54 @@
> +/* { dg-do run } */
> +/* { dg-options "-O2 -save-temps" } */
> +/* { dg-require-effective-target has_arch_ppc64 } */

Please put a tiny comment here saying what this test is *for*?  The file
name is a bit of hint already, but you can indicate much more in one or
two lines :-)

With those adjustments, okay for trunk.  Thanks!

(If -c doesn't work, it needs more explanation).


Segher

Re: [PATCH] rs6000: replace '(const_int 0)' to 'unspec:BLK [(const_int 0)]' for stack_tie

2023-06-15 Thread Segher Boessenkool

On Thu, Jun 15, 2023 at 03:00:40PM +0800, Jiufu Guo wrote:
> >>   This is the existing pattern.  It may be read as an action
> >>   to clean an unknown-size memory block.
> >
> > Including a size zero memory block, yes.  BLKmode was originally to do
> > things like bcopy (before modern names like memcpy were more usually
> > used), and those very much need size zero as well.h
> 
> The size is possible to be zero.  No asm code needs to
> be generated for "set 'const_int 0' to zero size memory"".
> stack_tie does not generate any real code.  It seems ok :)
> 
> While, it may not be zero size mem.  This may be a concern.
> This is one reason that I would like to have an unspec_tie.

It very much *can* be a zero size mem, that is perfectly find for
mem:BLK.

> Another reason is unspec:blk is used but various ports :) 

unspec:BLK is undefined.  BLKmode is allowed on mem only.

> >> 2. "set (mem/c:BLK (reg/f:DI 1 1) unspec:blk (const_int 0 [0])
> >> UNSPEC_TIE".
> >>   Current patch is using this one.
> >
> > What would be the semantics of that?  Just the same as the current stuff
> > I'd say, or less?  It cannot be more!
> 
> The semantic that I trying to achieve is "this is a special
> insn, not only a normal set to unknown size mem".

What does that *mean*?  "Special instruction"?  What would what code do
for that?  What would the RTL mean?

> As you explained before on 'unspec:DI', the unspec would
> just decorate the set_src part: something DI value with
> machine-specific operation.

An unspec is an operation on its operands, giving some (in this case)
DImode value.  There is nothing special about that operation, it can be
optimised like any other, it's just not specified what exactly that
value is (to the generic compiler, the backend itself can very much
optimise stuff with it).

> But, since 'tie_operand' is checked for this insn.
> If 'tie_operand' checks UNPSEC_TIE, then the insn
> with UNPSEC_TIE is 'a special insn'.  Or interpret
> the semantic of this insn as: this insn stack_ite
> indicates "set/operate a zero size block".

tie_operand is a predicate.  The predicate of an insn has to return 1,
or the insn is not recognised.  You can do the same in insn conditions
always (in principle anyway).

Segher

Re: [PATCH] rs6000: replace '(const_int 0)' to 'unspec:BLK [(const_int 0)]' for stack_tie

2023-06-14 Thread Segher Boessenkool

On Wed, Jun 14, 2023 at 06:25:10PM +0200, Richard Biener wrote:
> > Form rs6000.md:
> > ; This is to explain that changes to the stack pointer should
> > ; not be moved over loads from or stores to stack memory.
> > (define_insn "stack_tie"
> 
> That suggests it’s the hard register value that‘s protected, not the memory 
> pointed to.  I suppose that means an unspec volatile with the reg as input 
> would serve the same?

No?  It says what it says.  That is pretty vague language, of course,
not entirely by accident no doubt.

> Or maybe that’s not the whole story.
> 
> 
> > and from rs6000-logue.cc:
> > /* This ties together stack memory (MEM with an alias set of 
> > frame_alias_set)
> >   and the change to the stack pointer.  */
> > static void
> > rs6000_emit_stack_tie (rtx fp, bool hard_frame_needed)
> 
> I cannot make sense of that comment, but not sure if I really want to know …

It really is the same thing: this is a bloody heavy hammer keeping the
change to the stack pointer (or "hard" frame pointer) in place wrt any
accesses to the stack memory.

If there was a nice portable way to avoid needing this we haven't found
it yet -- or a non-portable way even, and it doesn't have to be all that
nice either come to think of it :-)


Segher

Re: [PATCH] rs6000: replace '(const_int 0)' to 'unspec:BLK [(const_int 0)]' for stack_tie

2023-06-14 Thread Segher Boessenkool

Hi!

On Wed, Jun 14, 2023 at 10:04:20AM +0100, Richard Sandiford wrote:
> I'd also understood it to be either.  As in, it is a may-clobber
> that can be used for must-clobber.  Alternatively: the value stored
> is unpredictable, and can therefore be the same as the current value.

Yes, it is a set with an unspecified RHS.

> I think the main difference between:
> 
>   (clobber (mem:BLK …))
> 
> and
> 
>   (set (mem:BLK …) (unspec:BLK …))
> 
> is that the latter must happen for correctness (unless something
> that understands the unspec proves otherwise) whereas a clobber
> can validly be dropped.  So for something like stack_tie, a set
> seems more correct than a clobber.

No, the latter can be removed as well, under exactly the same
conditions: if no code after it reads what was written.  This happens in
branches marked dead.


Segher

Re: [PATCH] rs6000: replace '(const_int 0)' to 'unspec:BLK [(const_int 0)]' for stack_tie

2023-06-14 Thread Segher Boessenkool

On Wed, Jun 14, 2023 at 09:22:09AM +, Richard Biener wrote:
> How can a clobber be validly dropped?

Same as any other set: if no code executed after it can read whatever is
written.  This typically means a stack frame goes away, or simply no
more code is executed *at all* after this.

> For the case of stack
> memory if there's no stack use after it it could be elided
> and I suppose the clobber itself can be moved.  But then
> the function return is a stack use as well.

A function return does not access the stack at all on most
architectures, including PowerPC.  Some epilogue insns can do, of
course, but we expand to separate insns during expand already.

Segher

Re: [PATCH] rs6000: replace '(const_int 0)' to 'unspec:BLK [(const_int 0)]' for stack_tie

2023-06-14 Thread Segher Boessenkool

Hi!

On Wed, Jun 14, 2023 at 09:52:37AM +, Richard Biener wrote:
> I see.  So
> 
> (parallel
>  (unspec stack_tie)
>  (clobber (mem:BLK ...)))

Written like this, without a "set", *every* unspec has to be an
unspec_volatile, for the same reason as all inline asms without outputs
always are considered volatile asm.  The "unspec" arm of the parallel
can be omitted, and if that is valid RTL (possibly after other changes,
like omitting the parallel,replacing it by its one remaining arm), this
is a prefectly valid transformation.

> I suppose it needs to be an unspec_volatile?  It feels like
> the stack_ties are a delicate hack preventing enough but not too
> much optimization ...

Yes.  So let's please not disturb it :-)

It isn't a "delicate" hack I would say, but its effects are needed in
some places, and messing it up leads to hard to debug problems.  Which
had happened time and time again over the years.

It just is hard to deal with variable sized stack adjustments and the
like.  As long as we only use stack ties in such unusual cases, all is
fine.  There are worse things, like what we have the
frame_pointer_needed_indeed thing for :-)

Segher

Re: [PATCH] rs6000: replace '(const_int 0)' to 'unspec:BLK [(const_int 0)]' for stack_tie

2023-06-14 Thread Segher Boessenkool

Hi!

On Wed, Jun 14, 2023 at 05:26:52PM +0800, Jiufu Guo wrote:
> Richard Biener  writes:
> >> 3. "set (mem/c:DI (reg/f:DI 1 1) unspec:DI (const_int 0 [0])
> >> UNSPEC_TIE".
> >>This avoids using BLK on unspec, but using DI.
> >
> > That gives the MEM a size which means we can interpret the (set ..)
> > as killing a specific area of memory, enabling DSE of earlier
> > stores.
> 
> Oh, thanks!
> While with 'unspec:DI', I'm wondering if it means this 'set' would
> do some special things other than pure 'set' to the memory. 

No, that is not what unspec means.  It just means "some DImode value I'm
not telling you anything about".  If to get that value there is some
important work done (that should not be oprimised away, say) you need
unspec_volatile, which means just that: there is an unspecified side
effect done by that insn, so it has to be done on the real machine
exactly like on the abstract C machine, so the insn has big restrictions
on being moved and removed etc.

We can replace the RHS of (almost) *every* set with an unspec, and the
compiler would still work, just would generate pretty lousy code.  But
at least CSE and DSE (and everything else purely dataflow) would still
work :-)

Segher

Re: [PATCH] rs6000: replace '(const_int 0)' to 'unspec:BLK [(const_int 0)]' for stack_tie

2023-06-14 Thread Segher Boessenkool

Hi!

On Wed, Jun 14, 2023 at 07:59:04AM +, Richard Biener wrote:
> On Wed, 14 Jun 2023, Jiufu Guo wrote:
> > 3. "set (mem/c:DI (reg/f:DI 1 1) unspec:DI (const_int 0 [0])
> > UNSPEC_TIE".
> >This avoids using BLK on unspec, but using DI.
> 
> That gives the MEM a size which means we can interpret the (set ..)
> as killing a specific area of memory, enabling DSE of earlier
> stores.

Or DSE can delete this tie even, if it can see some later store to the
same location without anything in between that can read what the tie
stores.

BLKmode avoids all of this.  You can call that elegant, you can call it
cheating, you can call it many things -- but it *works*.

> AFAIU this special instruction is only supposed to prevent
> code motion (of stack memory accesses?) across this instruction?

Form rs6000.md:
; This is to explain that changes to the stack pointer should
; not be moved over loads from or stores to stack memory.
(define_insn "stack_tie"

and from rs6000-logue.cc:
/* This ties together stack memory (MEM with an alias set of frame_alias_set)
   and the change to the stack pointer.  */
static void
rs6000_emit_stack_tie (rtx fp, bool hard_frame_needed)

A big reason this is needed is because of all the hard frame pointer
stuff, which the generic parts of GCC require, but there is no register
for that in the Power architecture.  Nothing is an issue here in most
cases, but sometimes we need to do unusual things to the stack, say for
alloca.

> I'd say a
> 
>   (may_clobber (mem:BLK (reg:DI 1 1)))

"clobber" always means "may clobber".  (clobber X) means X is written
with some unspecified value, which may well be whatever value it
currently holds.  Via some magical means or whatever, there is no
mechanism specified, just the effects :-)

> might be more to the point?  I've used "may_clobber" which doesn't
> exist since I'm not sure whether a clobber is considered a kill.
> The docs say "Represents the storing or possible storing of an 
> unpredictable..." - what is it?  Storing or possible storing?

It is the same thing.  "clobber" means the same thing as "set", except
the value that is written is not specified.

> I suppose stack_tie should be less strict than the documented
> (clobber (mem:BLK (const_int 0))) (clobber all memory).

"clobber" is nicer than the set to (const_int 0).  Does it work though?
All this code is always fragile :-/  I'm all for this change, don't get
me wrong, but preferably things stay in working order.

We use "stack_tie" as a last resort heavy hammer anyway, in all normal
cases we explain the actual data flow explicitly and correctly, also
between the various registers used in the *logues.

Segher

Re: [PATCH] rs6000: replace '(const_int 0)' to 'unspec:BLK [(const_int 0)]' for stack_tie

2023-06-14 Thread Segher Boessenkool

Hi!

On Wed, Jun 14, 2023 at 12:06:29PM +0800, Jiufu Guo wrote:
> Segher Boessenkool  writes:
> I'm also thinking about other solutions:
> 1. "set (mem/c:BLK (reg/f:DI 1 1) (const_int 0 [0])"
>   This is the existing pattern.  It may be read as an action
>   to clean an unknown-size memory block.

Including a size zero memory block, yes.  BLKmode was originally to do
things like bcopy (before modern names like memcpy were more usually
used), and those very much need size zero as well.

> 2. "set (mem/c:BLK (reg/f:DI 1 1) unspec:blk (const_int 0 [0])
> UNSPEC_TIE".
>   Current patch is using this one.

What would be the semantics of that?  Just the same as the current stuff
I'd say, or less?  It cannot be more!

> 3. "set (mem/c:DI (reg/f:DI 1 1) unspec:DI (const_int 0 [0])
> UNSPEC_TIE".
>This avoids using BLK on unspec, but using DI.

And is incorrect because of that.

> 4. "set (mem/c:BLK (reg/f:DI 1 1) unspec (const_int 0 [0])
> UNSPEC_TIE"
>There is still a mode for the unspec.

It has VOIDmode here, which is incorrect.

> > On Tue, Jun 13, 2023 at 08:23:35PM +0800, Jiufu Guo wrote:
> >> +&& XINT (SET_SRC (set), 1) == UNSPEC_TIE
> >> +&& XVECEXP (SET_SRC (set), 0, 0) == const0_rtx);
> >
> > This makes it required that the operand of an UNSPEC_TIE unspec is a
> > const_int 0.  This should be documented somewhere.  Ideally you would
> > want no operand at all here, but every unspec has an operand.
> 
> Right!  Since checked UNSPEC_TIE arleady, we may not need to check
> the inner operand. Like " && XINT (SET_SRC (set), 1) == UNSPEC_TIE);".

Yes.  But we should write down somewhere (in a comment near the unspec
constant def for example) what the operand is -- so, "operand is usually
(const_int 0) because we have to put *something* there" or such.  The
clumsiness of this is enough for me to prefer some other solution
already ;-)

Segher

Re: [PATCH] rs6000: replace '(const_int 0)' to 'unspec:BLK [(const_int 0)]' for stack_tie

2023-06-14 Thread Segher Boessenkool

Hi!

On Wed, Jun 14, 2023 at 05:18:15PM +0800, Xi Ruoyao wrote:
> The generic issue here is to fix (not "papering over") the signed
> overflow, we need to perform the addition in a target machine mode.  We
> may always use Pmode (IIRC const_anchor was introduced for optimizing
> some constant addresses), but can we do better?

The main issue is that the machine description generated target code to
compute some constants, but the sanitizer treats it as if it was user
code that might do wrong things.

> Should we try addition in both DImode and SImode for a 64-bit capable
> machine?

Why?  At least on PowerPC there is only one insn, and it is 64 bits.
The SImode version just ignores all bits other than the low 32 bits, in
both inputs and output.

> Or should we even try more operations than addition (for eg bit
> operations like xor or shift)?  Doing so will need to create a new
> target hook for const anchoring, this is the "complete rework" I meant.

This might make const anchor useful for way more targets maybe,
including rs6000, yes :-)

Segher

Re: [PATCH] rs6000: replace '(const_int 0)' to 'unspec:BLK [(const_int 0)]' for stack_tie

2023-06-13 Thread Segher Boessenkool

Hi!

As I said in a reply to the original patch: not okay.  Sorry.

But some comments on this patch:

On Tue, Jun 13, 2023 at 08:23:35PM +0800, Jiufu Guo wrote:
> +   && XINT (SET_SRC (set), 1) == UNSPEC_TIE
> +   && XVECEXP (SET_SRC (set), 0, 0) == const0_rtx);

This makes it required that the operand of an UNSPEC_TIE unspec is a
const_int 0.  This should be documented somewhere.  Ideally you would
want no operand at all here, but every unspec has an operand.

> +  RTVEC_ELT (p, i)
> + = gen_rtx_SET (mem, gen_rtx_UNSPEC (BLKmode, gen_rtvec (1, const0_rtx),
> + UNSPEC_TIE));

If it is hard to indent your code, your code is trying to do to much.
Just have an extra temporary?

  rtx un = gen_rtx_UNSPEC (BLKmode, gen_rtvec (1, const0_rtx), UNSPEC_TIE);
  RTVEC_ELT (p, i) = gen_rtx_SET (mem, un);

That is shorter even, and certainly more readable :-)

> @@ -10828,7 +10829,9 @@ (define_expand "restore_stack_block"
>operands[4] = gen_frame_mem (Pmode, operands[1]);
>p = rtvec_alloc (1);
>RTVEC_ELT (p, 0) = gen_rtx_SET (gen_frame_mem (BLKmode, operands[0]),
> -   const0_rtx);
> +   gen_rtx_UNSPEC (BLKmode,
> +   gen_rtvec (1, const0_rtx),
> +   UNSPEC_TIE));
>operands[5] = gen_rtx_PARALLEL (VOIDmode, p);

I have a hard time to see how this could ever be seen as clearer or more
obvious or anything like that :-(


Segher

Re: [PATCH] rs6000: replace '(const_int 0)' to 'unspec:BLK [(const_int 0)]' for stack_tie

2023-06-13 Thread Segher Boessenkool

Hi!

On Tue, Jun 13, 2023 at 10:15:49AM +0800, Jiufu Guo wrote:
> David Edelsohn  writes:
> >
> > This definitely seems to be a better solution.
> >
> > The TARGET_CONST_ANCHOR change should not be part of this patch.  Also
> > there is no ChangeLog for the patch.
> 
> Thanks a lot for your quick review!! And sorry for the sending this patch
> in a hurry.  I would update the patch accordingly.

> > This generally looks correct and consistent with other ports. I want
> > to give Segher a chance to double check it, if he wishes.

The documentation is very clear that the only thing for which you can
have BLKmode is "mem".  Not unspec, only "mem".

Let's not do this.  The existing code has clear and obvious semantics,
which is documented as well -- there is no reason to make it worse in
every respect!

Segher

Re: [PATCH v2] rs6000: fmr gets used instead of faster xxlor [PR93571]

2023-06-12 Thread Segher Boessenkool

Hi!

On Sat, Feb 25, 2023 at 03:20:33PM +0530, Ajit Agarwal wrote:
> Here is the patch that uses xxlor instead of fmr where possible.
> Performance results shows that fmr is better in power9 and 
> power10 architectures whereas xxlor is better in power7 and
> power 8 architectures. fmr is the only option before p7.

>  ;; The ISA we implement.
> -(define_attr "isa" "any,p5,p6,p7,p7v,p8v,p9,p9v,p9kf,p9tf,p10"
> +(define_attr "isa" "any,p5,p6,p7,p7v,p8v,p7p8v,p9,p9v,p9kf,p9tf,p10"
>(const_string "any"))

This isn't really about what insn we *can* use here.

> + (and (eq_attr "isa" "p7p8v")
> +   (match_test "TARGET_VSX && !TARGET_P9_VECTOR"))
> + (const_int 1)

What is needed here is test the *tune* setting.  For example if someone
uses -mcpu=power8 -mtune=power9 (this is a setting that is really used,
or was a few years ago anyway) you *do* want fmr insns generated.

So don't do this via the isa attribute at all, just add some insn
condition (testing the tune setting)?

Segher

Re: [PATCH V2] Optimize '(X - N * M) / N' to 'X / N - M' if valid

2023-06-09 Thread Segher Boessenkool

Hi!

On Wed, Jun 07, 2023 at 04:21:11PM +0800, Jiufu Guo wrote:
> This patch tries to optimize "(X - N * M) / N" to "X / N - M".
> For C code, "/" towards zero (trunc_div), and "X - N * M" maybe
> wrap/overflow/underflow. So, it is valid that "X - N * M" does
> not cross zero and does not wrap/overflow/underflow.

Is it ever valid semi-generally when N does not divide X?

Say X=5, N=2, M=3.  Then the original expression evaluates to 0, but the
new one to -1.  Whenever one of the divisions rounds up and the other
rounds down you have this problem.

Segher

[PATCH 2/2] rs6000: genfusion: Delete dead code

2023-06-06 Thread Segher Boessenkool

2023-06-06  Segher Boessenkool  

* config/rs6000/genfusion.pl: Delete some dead code.

---
 gcc/config/rs6000/genfusion.pl | 3 ---
 1 file changed, 3 deletions(-)

diff --git a/gcc/config/rs6000/genfusion.pl b/gcc/config/rs6000/genfusion.pl
index 2851bb7..82e8f86 100755
--- a/gcc/config/rs6000/genfusion.pl
+++ b/gcc/config/rs6000/genfusion.pl
@@ -347,6 +347,3 @@ EOF
 gen_ld_cmpi_p10();
 gen_logical_addsubf();
 gen_addadd;
-
-exit(0);
-
-- 
1.8.3.1

[PATCH 1/2] rs6000: genfusion: Rewrite load/compare code

2023-06-06 Thread Segher Boessenkool

This makes the code more readable, more digestible, more maintainable,
more extensible.  That kind of thing.  It does that by pulling things
apart a bit, but also making what stays together more cohesive lumps.

The original function was a bunch of loops and early-outs, and then
quite a bit of stuff done per iteration, with the iterations essentially
independent of each other.  This patch moves the stuff done for one
iteration to a new _one function.

The second big thing is the stuff printed to the .md file is done in
"here documents" now, which is a lot more readable than having to quote
and escape and double-escape pieces of text.  Whitespace inside the
here-document is significant (will be printed as-is), which is a bit
awkward sometimes, or might take some getting used to, but it is also
one of the benefits of using them.

Local variables are declared at first use (or close to first use).
There also shouldn't be many at all, often you can write easier to
read and manage code by omitting to name something that is hard to name
in the first place.

Finally some things are done in more typical, more modern, and tighter
Perl style, for example REs in "if"s or "qw" for lists of constants.


2023-06-06  Segher Boessenkool  

* config/rs6000/genfusion.pl (gen_ld_cmpi_p10_one): New, rewritten and
split out from...
(gen_ld_cmpi_p10): ... this.

---
 gcc/config/rs6000/genfusion.pl | 185 +++--
 1 file changed, 103 insertions(+), 82 deletions(-)

diff --git a/gcc/config/rs6000/genfusion.pl b/gcc/config/rs6000/genfusion.pl
index e4db352..2851bb7 100755
--- a/gcc/config/rs6000/genfusion.pl
+++ b/gcc/config/rs6000/genfusion.pl
@@ -53,92 +53,113 @@ sub mode_to_ldst_char
 return '?';
 }
 
+sub gen_ld_cmpi_p10_one
+{
+  my ($lmode, $result, $ccmode) = @_;
+
+  my $np = "NON_PREFIXED_D";
+  my $mempred = "non_update_memory_operand";
+  my $extend;
+
+  if ($ccmode eq "CC") {
+# ld and lwa are both DS-FORM.
+($lmode =~ /^[SD]I$/) and $np = "NON_PREFIXED_DS";
+($lmode =~ /^[SD]I$/) and $mempred = "ds_form_mem_operand";
+  } else {
+if ($lmode eq "DI") {
+  # ld is DS-form, but lwz is not.
+  $np = "NON_PREFIXED_DS";
+  $mempred = "ds_form_mem_operand";
+}
+  }
+
+  my $cmpl = ($ccmode eq "CC") ? "" : "l";
+  my $echr = ($ccmode eq "CC") ? "a" : "z";
+  if ($lmode eq "DI") { $echr = ""; }
+  my $constpred = ($ccmode eq "CC") ? "const_m1_to_1_operand"
+   : "const_0_to_1_operand";
+
+  # For clobber, we need a SI/DI reg in case we
+  # split because we have to sign/zero extend.
+  my $clobbermode = ($lmode =~ /^[QH]I$/) ? "GPR" : $lmode;
+  if ($result =~ /^EXT/ || $result eq "GPR" || $clobbermode eq "GPR") {
+# We always need extension if result > lmode.
+$extend = ($ccmode eq "CC") ? "sign" : "zero";
+  } else {
+# Result of SI/DI does not need sign extension.
+$extend = "none";
+  }
+
+  my $ldst = mode_to_ldst_char($lmode);
+  print < lmode.
- if ( $ccmode eq 'CC' ) {
- $extend = "sign";
- } else {
- $extend = "zero";
- }
- } else {
- # Result of SI/DI does not need sign extension.
- $extend = "none";
- }
- print ";; load-cmpi fusion pattern generated by gen_ld_cmpi_p10\n";
- print ";; load mode is $lmode result mode is $result compare mode is 
$ccmode extend is $extend\n";
+  foreach my $lmode (qw/DI SI HI QI/) {
+foreach my $result ("clobber", $lmode,  "EXT$lmode") {
+  # EXTDI does not exist, and we cannot directly produce HI/QI results.
+  next if $result =~ /^(QI|HI|EXTDI)$/;
 
- print "(define_insn_and_split 
\"*l${ldst}${echr}_cmp${cmpl}di_cr0_${lmode}_${result}_${ccmode}_${extend}\"\n";
- print "  [(set (match_operand:${ccmode} 2 \"cc_reg_operand\" 
\"=x\")\n";
- print "(compare:${ccmode} (match_operand:${lmode} 1 
\"${mempred}\" \"m\")\n";
- if ($ccmode eq 'CCUNS') { print "   "; }
- print "(match_operand:${lmode} 3 \"${constpred}\" 
\"n\")))\n";
- if ($result eq 'clobber') {
- print "   (clobber (match_scratch:${clobbermode} 0 \"=r\"))]\n";
- } elsif ($result eq $lmode) {
- print "   (set (match_operand:${result} 0 \"gpc_reg_operand\" 
\"=r\") (match_dup 1))]\n";
- } else {
- p

Re: [PATCH] rs6000: Remove duplicate expression [PR106907]

2023-06-05 Thread Segher Boessenkool

Hi!

On Mon, Jun 05, 2023 at 12:11:42PM +0530, P Jeevitha wrote:
> PR106907 has few warnings spotted from cppcheck. In that addressing duplicate
> expression issue here. Here the same expression is used twice in logical
> AND(&&) operation which result in same result so removing that.
> 
> 2023-06-05  Jeevitha Palanisamy  
> 
> gcc/
>   PR target/106907
>   * config/rs6000/rs6000.cc (vec_const_128bit_to_bytes): Remove
>   duplicate expression.
> 
> 
> diff --git a/gcc/config/rs6000/rs6000.cc b/gcc/config/rs6000/rs6000.cc
> index 42f49e4a56b..d197c3f3289 100644
> --- a/gcc/config/rs6000/rs6000.cc
> +++ b/gcc/config/rs6000/rs6000.cc
> @@ -28784,7 +28784,6 @@ vec_const_128bit_to_bytes (rtx op,
>  
>info->all_words_same
>  = (info->words[0] == info->words[1]
> -   && info->words[0] == info->words[1]
> && info->words[0] == info->words[2]
> && info->words[0] == info->words[3]);

Thanks!  Okay for trunk.  Also okay for all backports, no need to wait
if unexpected problems in trunk show up.  But still, backport to 13
first, then 12, then 11, only stop when it stops applying (or there are
no open release branches left) :-)


Segher

Re: [PATCH V5, 2/2] PR target/105325: Fix memory constraints for power10 fusion.

2023-05-26 Thread Segher Boessenkool

On Wed, May 10, 2023 at 11:40:00AM -0400, Michael Meissner wrote:
> This patch applies stricter predicates and constraints for LD and LWA
> instructions with power10 fusion.  These instructions are DS-form 
> instructions,
> which means that the bottom 2 bits of the address must be 0.

The low two bits of the offset, yes.

> --- a/gcc/config/rs6000/genfusion.pl
> +++ b/gcc/config/rs6000/genfusion.pl
> @@ -129,6 +129,12 @@ sub print_ld_cmpi_p10
>print "  \"\"\n";
>print "  [(set_attr \"type\" \"fused_load_cmpi\")\n";
>print "   (set_attr \"cost\" \"8\")\n";
> +
> +  if ($extend eq "sign")
> +{
> +  print "   (set_attr \"sign_extend\" \"yes\")\n";
> +}

You never ever need backslashes like this in Perl code, btw.  For
example:
  print qq{   (set_attr "sign_extend" "yes")\n};
or
  print qq/   (set_attr "sign_extend" "yes")\n/;
or
print <<"HERE"
   (set_attr "sign_extend" "yes")
HERE
or millions of other ways, all of which are much nicer than cramped code
that tries to look like C (but has very different semantics in all ways
that matter).  (Also zillions of ways that are worse still, but that is
the price of freedom maybe :-) )

> -  # Memory predicate to use.
> +  # Memory predicate to use.  For LWA, use the special LWA_OPERAND.

Explain *why*?  It is obvious *what*!

Maybe just split the series into more patches?
> @@ -0,0 +1,26 @@
> +/* { dg-do assemble } */
> +/* { dg-require-effective-target lp64 } */
> +/* { dg-require-effective-target power10_ok } */

power10_ok should no longer exist, btw.  Technical debt has to be
repaid :-/

This patch is readable btw.  Thanks :-)

Segher

Re: [PATCH V5, 1/2] PR target/105325: Rewrite genfusion.pl's gen_ld_cmpi_p10 function.

2023-05-26 Thread Segher Boessenkool

Hi Mike,

On Wed, May 10, 2023 at 11:38:55AM -0400, Michael Meissner wrote:
> This patch rewrites the gen_ld_cmpi_p10 function in genfusion.pl to be 
> clearer.

That is not at all what I asked for, even if I would agree the code is
nicer to read now (I don't).

What I asked for, what is needed, is for your patches to be readable.
This is a prerequisite for them to be reviewable, which is a
prerequisite for them to be approvable.  One way to do that is to split
out refactorings (which I asked for) and rewrites (which you did) to
earlier patches in the series.  Pure refactoring are easy to review:
they change exactly nothing in what code is executed.  Rewrites are much
harder to review.  But even then we can hope you didn't slip up once in
a hundred lines of code, sure.

The later patches can then be much more readable because there isn't so
much noise mixed in.

> Assuming I can check this in, I will
> also commit to the active GCC branches after a burn-in period.

No, you will never do that.  You always need approval for that.  We have
these procedures for a reason.  We do not want other things than what
was approved committed, doubly so if *nothing* was approved.

>   * config/rs6000/genfusion.pl (mode_to_ldst_char): Delete.

This is a regression.

> +# Print the insns for load and compare with -1/0/1.
> +# Arguments:
> +# lmode  -- Integer mode ("DI", "SI", "HI", or "QI").
> +# result -- "clobber", "GPR", or $lmode
> +# ccmode -- Sign vs. unsigned ("CC" or "CCUNS").
> +# mem_format -- Memory format ("d" or "ds").
> +# cmpl   -- Suffix for compare ("l" or "")
> +# const_pred -- Predicate for constant (i.e. -1/0/1 or 0/1).
> +# extend -- "sign", "zero", or "none".
> +# echr   -- Suffix for load ("a", "z", or "").
> +# load   -- Load instruction (i.e. "ld", "lwa", "lwz", etc.)
> +# np -- enum non_prefixed_form for memory type
> +# constraint -- constraint to use
> +# mem_pred   -- predicate for the memory operation

If you need a huge block comment for your sub argument, that is a
not-so-subtle hint that you need to refactor.  Or if this was supposed
to be a refactoring, that something went terribly wrong :-(

Segher

Re: [PATCH] Only use NO_REGS in cost calculation when !hard_regno_mode_ok for GENERAL_REGS and mode.

2023-05-25 Thread Segher Boessenkool

On Thu, May 25, 2023 at 10:29:47AM -0400, Vladimir Makarov wrote:
> 
> On 5/17/23 02:57, liuhongt wrote:
> >r14-172-g0368d169492017 replaces GENERAL_REGS with NO_REGS in cost
> >calculation when the preferred register class are not known yet.
> >It regressed powerpc PR109610 and PR109858, it looks too aggressive to use
> >NO_REGS when mode can be allocated with GENERAL_REGS.
> >The patch takes a step back, still use GENERAL_REGS when
> >hard_regno_mode_ok for mode and GENERAL_REGS, otherwise uses NO_REGS.
> >Kewen confirmed the patch fixed PR109858, I vefiried it also fixed 
> >PR109610.
> >
> >Bootstrapped and regtested on x86_64-pc-linux-gnu{-m32,}.
> >No big performance impact for SPEC2017 on icelake server.
> >Ok for trunk?
> >
> >gcc/ChangeLog:
> >
> > * ira-costs.cc (scan_one_insn): Only use NO_REGS in cost
> > calculation when !hard_regno_mode_ok for GENERAL_REGS and
> > mode, otherwise still use GENERAL_REGS.
> 
> Thank you for the patch.  It looks good for me.  It is ok to commit it 
> into the trunk.

Thanks everyone involved for fixing this nasty regression!  Much
appreciated.


Segher

Re: [PATCH] [testsuite] [powerpc] adjust -m32 counts for fold-vec-extract*

2023-05-25 Thread Segher Boessenkool

Hi Alex,

On Thu, May 25, 2023 at 10:55:37AM -0300, Alexandre Oliva wrote:
> On May 25, 2023, Segher Boessenkool  wrote:
> > Fwiw, updating the insn counts blindly like this
> 
> ... is a claim that carries a wildly incorrect and insulting underlying
> assumption:

Sorry you feel that way.  I'm not even assuming anything :-(

> I've actually identified the corresponding change to the
> lp64 tests, compared the effects of the codegen changes, and concluded
> the tests needed this changing for ilp32 to keep on testing for the same
> thing after code changes brought about by changes that AFAICT had been
> well understood when making the lp64 adjustments.

But you didn't explain any of that (saying it is so is not the same
thing at all as explaining it!)

> > If it is not possible to keep these tests up-to-date easily
> 
> The counts have been stable for a couple of release cycles already.
> 
> The change that caused the codegen differences is identified and
> understood; the PR confirmed my findings, naming the root cause and the
> incomplete testsuite adjustment.

Oh, was this discussed in some PR?  The patch submission should have
carried the conclusions from the discussions there then :-)

> I suspect there may also be ABI-related assumptions implied by the 'add'
> counts, but I don't know enough about all the ppc variants to be sure.

The compiler can and will create all kinds of code for wildly unexpected
reasons.  "add" is dangerous to count already, but it is not as bad as
"addi" :-)

> Now, if your implied claim is correct that counting 'add/addi'
> instructions in these tests is fragile, dropping the checks for those
> would probably be best.

The same is true for almost all instructions.  You can only sanely count
instructions if either you count only unusual insns, or if you test only
*tiny* functions (say five insns, including the blr at the end!)

> But if ppc maintainers seem to have different
> opinions as to how to deal with the fallout of that one-time codegen
> change, it would be foolish for me to get pulled into the cross fire.

There is no crossfire.  I did not dis-approve the patch, just said this
is a high maintenance direction to proceed in.  There has been a lot of
that the last few years, we should improve on that.  It is not about
this patch (only).

> Here's the patch that corrects the long-broken counts, with the
> requested adjustments, retested with ppc- and ppc64-vx7r2.  Ok?

> Codegen changes caused add instruction count mismatches on
> ppc-*-linux-gnu and other 32-bit ppc targets.  At some point the
> expected counts were adjusted for lp64, but ilp32 differences
> remained, and published test results confirm it.

... and this is not something that can be confimed like this.  Just
spend a few minutes more to put *actual numbers* here, with some
indication this is good and correct codegen, so that it is bloody easy
for a reviewer to review and for a maintainer to approve!

>  /* -m32 target has an 'add' in place of one of the 'addi'. */
> -/* { dg-final { scan-assembler-times {\maddi\M|\madd\M} 2 { target lp64 } } 
> } */
> -/* { dg-final { scan-assembler-times {\maddi\M|\madd\M} 3 { target ilp32 } } 
> } */
> +/* { dg-final { scan-assembler-times {\maddi\M|\madd\M} 2 } } */

Just {\madd} or more conservative {\maddi?\M} then?

Segher

Re: [PATCH] [testsuite] [powerpc] adjust -m32 counts for fold-vec-extract*

2023-05-25 Thread Segher Boessenkool

Hi!

On Thu, May 25, 2023 at 07:05:55AM -0300, Alexandre Oliva wrote:
> On May 25, 2023, "Kewen.Lin"  wrote:
> > So both lp64 and ilp32 have the same count, could we merge it and
> > remove the selectors?
> 
> We could, but...  I thought I wouldn't, since they were different
> before, and they're likely to diverge again in the future.  I thought
> that combining them might suggest that they ought to be the same, when
> we already know that this is not the case.
> 
> I'll prepare an alternate patch that combines them.

Fwiw, updating the insn counts blindly like this has very small value on
the one hand, and negative value on the other.  In total, negative
value.

If it is not possible to keep these tests up-to-date easily the test
should be improved.  If tests regressed otoh we should ***not*** paper
over that with patches like this, but investigate what happened instead:
such regressions are *real*.

So which is it here?  I am assuming it is a not-to-well written testcase
without all the necessary noipa attrs, and/or putting more than one
thing to test per function directly.  Insn counts then shift easily if
the compiler decides to factor (CSE etc.) your code differently, but
that is a testcase artifact then, not something we want to adjust counts
for all of the time.

It is feasible to do these insn count things only for trivial tiny
snippets.  Everything bigger will regress all of the time, no one will
look at it properly, and instead people will just do blind "update
counts" patches like this :-/  *Good* insn count tests are quite
valuable, but harder to write.  But maintenance costs noticably bigger
than zero for a testcase are not good, how many testcases do we run in
the testsuite?

So, can we fix the underlying problem here please?

Thanks,

Segher

Re: [PATCH v1] tree-ssa-sink: Improve code sinking pass.

2023-05-18 Thread Segher Boessenkool

Hi!

On Thu, May 18, 2023 at 12:44:28PM +0530, Ajit Agarwal wrote:
> This patch improves code sinking pass to sink statements before call to reduce
> register pressure.

An example would be useful :-)

>   * tree-ssa-sink.cc (statement_sink_location): Modifed to
>   move statements before calls.

Spello ("modified").  But, you should write in the imperative mood
anyway, so "modify".  But, every change is a modification, so do without
the fluff altogether?  "Move statements before calls."

>   (block_call_p): New function.
>   (def_use_same_block): New function.
>   (select_best_block): Add heuristics to select the best
>   blocks in the immediate post dominator.

Please don't break lines
early
it makes things
harder to
read.

:-)

> --- /dev/null
> +++ b/gcc/testsuite/gcc.dg/tree-ssa/ssa-sink-20.c
> @@ -0,0 +1,16 @@
> +/* { dg-do compile } */

This is the default, you can just leave it out.

> +/* { dg-options "-O2 -fdump-tree-sink -fdump-tree-optimized 
> -fdump-tree-sink-stats" } */

You don't need -fdump-tree-sink without options since you have
-fdump-tree-sink-stats as well.

> --- /dev/null
> +++ b/gcc/testsuite/gcc.dg/tree-ssa/ssa-sink-21.c
> @@ -0,0 +1,20 @@
> +/* { dg-do compile } */ 
> +/* { dg-options "-O2 -fdump-tree-sink-stats -fdump-tree-sink-stats" } */

You don't need to say it twice either :-)

> +/* Return TRUE if immediate uses of the defs in
> +   USE occur in the same block as USE, FALSE otherwise.  */
> +
> +bool
> +def_use_same_block (gimple *stmt)
> +{

There is no function parameter "use" here?  STMT instead?

> +  use_operand_p use_p;
> +  def_operand_p def_p;

Neither of these is a predicate.  Lose the _p please?

> +   if (use_p
> +   && (gimple_bb (USE_STMT (use_p)) == gimple_bb (stmt)))

Please fit this on one line.  And no parens around random things please.

> +/* Return TRUE if the block has only calls, FALSE otherwise. */
> +
> +bool
> +block_call_p (basic_block bb)

> +/* We have already seen a call.  */
> +if (is_call)
> +  return false;
> +
> +if (is_gimple_call (stmt))
> +  is_call = true;
> +else
> +  return false;

> +  if (is_call && i == 1)
> +return true;
> +
> +  return false;

This doesn't do what the function comment says?  It is very important
that function comments say exactly what a function does.  It can perhaps
leave out some details, but it should be correct by and large.

> + /* Update sinking point as stmt before call if the sinking block
> +has only calls. Otherwise update sinking point as the use
> +stmt. */

(two spaces after full stop, twice)

> + if (gsi_stmt (gsi) == use
> + && !is_gimple_call (last_stmt)
> + && (gimple_code (last_stmt) != GIMPLE_SWITCH)
> + && (gimple_code (last_stmt) != GIMPLE_COND)
> + && (gimple_code (last_stmt) != GIMPLE_GOTO)
> + && (!gimple_vdef (use) || !def_use_same_block (def_stmt)))

Please no unnecessary parens.  At first I didn't notice the last line
here *does* need it!


Segher

Re: [PATCH v5 1/4] rs6000: Enable REE pass by default

2023-05-16 Thread Segher Boessenkool

Hi!

On Tue, May 16, 2023 at 11:45:28AM +0530, Ajit Agarwal wrote:
> --- a/gcc/doc/invoke.texi
> +++ b/gcc/doc/invoke.texi
> @@ -12455,8 +12455,8 @@ Attempt to remove redundant extension instructions.  
> This is especially
>  helpful for the x86-64 architecture, which implicitly zero-extends in 64-bit
>  registers after writing to their lower 32-bit half.
>  
> -Enabled for Alpha, AArch64 and x86 at levels @option{-O2},
> -@option{-O3}, @option{-Os}.
> +Enabled for Alpha, AArch64, RS/6000, RISC-V, SPARC, h83000 and x86 at levels 
> +@option{-O2}, @option{-O3}, @option{-Os}.

Please don't mention RS/6000, we don't support that anymore.  The
architecture we do support is called Power or PowerPC; the target
triplets are powerpc*-*-*.  rs6000-*-* might still somewhat work, but
no one should use it anymore, and we probably should delete it.

Please say PowerPC here.

With that the patch is okay for trunk.  Thank you!

Segher

Re: [PATCH] [powerpc] Add a peephole2 to eliminate redundant move from VSX_REGS to GENERAL_REGS when it's from memory.

2023-05-15 Thread Segher Boessenkool

On Thu, May 04, 2023 at 01:54:46PM +0800, liuhongt wrote:
> r14-172-g0368d169492017 use NO_REGS instead of GENERAL_REGS in memory cost
> calculation when preferred register class is unkown.
> +  /* Costs for NO_REGS are used in cost calculation on the
> +1st pass when the preferred register classes are not
> +known yet.  In this case we take the best scenario.  */
> 
> It regressed gcc.target/powerpc/dform-3.c which has inline asm explicitly
> put a vector mode into a general register, then create an extra move.
> RA doesn't allocate GENERAL_REGS for it because the backend pattern
> explicitly disparage the alternative (, r), (??r, Y) which moves
> from GENERAL_REGS/MEM to GENERAL_REGS.

And that is correct and wanted.

> Normally the extra move can be eliminated by pass_reload

Which is completely beside the point: reload is not in the business of
making good code.  Instead, it should make reasonable code when the
good code we attempted to make did not work out.  Think of it like a
last resort register allocation fixup.

> The patch adds a peephole2 to eliminate the extra move.

Nope.  Not okay.  This is not what peepholes are for *at all*, and this
path leads to insanity and millionfold maintenance cost.

Peepholes are *only*, I repeat *only*, for situations where we did a
*correct* cost estimation but due to some target detail we want to
fine-tune the generated code a bit.

If you want a peephole you most likely have a fundamental cost function
problem elsewhere.  Fix *that*, that is actually useful, and won't get
you into hotter water.

> Ok for trunk?

Not okay, no.  Please fix the actual bug?  Revert the previous patch,
for example :-(

> +;; Peephole to catch memory loads to VSX_REG and then moves to GENERAL_REGS.
> +(define_peephole2
> +  [(set (match_operand:VSX_M 0 "vsx_register_operand")
> + (match_operand:VSX_M 1 "memory_operand"))
> +   (set (match_operand:VSX_M 2 "int_reg_operand")
> + (match_dup 0))]
> +  "TARGET_POWERPC64 && VECTOR_MEM_VSX_P (mode)
> +  && peep2_reg_dead_p (2, operands[0])"
> +  [(set (match_dup 2) (match_dup 1))])

The condition does not make sense, even assuming the peephole does (it
does not).  Why would you care if the compiler is allowed to generate
64-bit insns here?

The formatting is messed up as well.

Segher

Re: [PATCH, V4] PR target/105325, Make load/cmp fusion know about prefixed loads.

2023-05-02 Thread Segher Boessenkool

On Wed, Apr 26, 2023 at 12:18:36PM -0400, Michael Meissner wrote:
>   * gcc/config/rs6000/genfusion.pl (gen_ld_cmpi_p10): Improve generation
>   of the ld and lwa instructions which use the DS encoding instead of D.
>   Use the YZ constraint for these loads.  Handle prefixed loads better.

Don't use tabs in the middle of a line.

"Handle prefixed loads better" is not what the patch does, and/or is so
vague as to be useless.

> --- a/gcc/config/rs6000/genfusion.pl
> +++ b/gcc/config/rs6000/genfusion.pl
> @@ -56,7 +56,7 @@ sub mode_to_ldst_char
>  sub gen_ld_cmpi_p10
>  {
>  my ($lmode, $ldst, $clobbermode, $result, $cmpl, $echr, $constpred,
> - $mempred, $ccmode, $np, $extend, $resultmode);
> + $mempred, $ccmode, $np, $extend, $resultmode, $constraint);
>LMODE: foreach $lmode ('DI','SI','HI','QI') {
>$ldst = mode_to_ldst_char($lmode);
>$clobbermode = $lmode;
> @@ -71,21 +71,34 @@ sub gen_ld_cmpi_p10
>CCMODE: foreach $ccmode ('CC','CCUNS') {
> $np = "NON_PREFIXED_D";
> $mempred = "non_update_memory_operand";
> +   $constraint = "m";
> if ( $ccmode eq 'CC' ) {
> next CCMODE if $lmode eq 'QI';
> -   if ( $lmode eq 'DI' || $lmode eq 'SI' ) {
> +   if ( $lmode eq 'HI' ) {
> +   $np = "NON_PREFIXED_D";
> +   $mempred = "non_update_memory_operand";
> +   $echr = "a";
> +   } elsif ( $lmode eq 'SI' ) {
> +   # ld and lwa are both DS-FORM.
> +   $np = "NON_PREFIXED_DS";
> +   $mempred = "lwa_operand";
> +   $echr = "a";
> +   $constraint = "YZ";
> +   } elsif ( $lmode eq 'DI' ) {
> # ld and lwa are both DS-FORM.
> $np = "NON_PREFIXED_DS";
> $mempred = "ds_form_mem_operand";
> +   $echr = "";
> +   $constraint = "YZ";
> }
> $cmpl = "";
> -   $echr = "a";
> $constpred = "const_m1_to_1_operand";
> } else {
> if ( $lmode eq 'DI' ) {
> # ld is DS-form, but lwz is not.
> $np = "NON_PREFIXED_DS";
> $mempred = "ds_form_mem_operand";
> +   $constraint = "YZ";
> }
> $cmpl = "l";
> $echr = "z";
> @@ -108,7 +121,7 @@ sub gen_ld_cmpi_p10
>  
> print "(define_insn_and_split 
> \"*l${ldst}${echr}_cmp${cmpl}di_cr0_${lmode}_${result}_${ccmode}_${extend}\"\n";
> print "  [(set (match_operand:${ccmode} 2 \"cc_reg_operand\" 
> \"=x\")\n";
> -   print "(compare:${ccmode} (match_operand:${lmode} 1 
> \"${mempred}\" \"m\")\n";
> +   print "(compare:${ccmode} (match_operand:${lmode} 1 
> \"${mempred}\" \"${constraint}\")\n";
> if ($ccmode eq 'CCUNS') { print "   "; }
> print "(match_operand:${lmode} 3 \"${constpred}\" 
> \"n\")))\n";
> if ($result eq 'clobber') {
> @@ -137,6 +150,11 @@ sub gen_ld_cmpi_p10
> print "  \"\"\n";
> print "  [(set_attr \"type\" \"fused_load_cmpi\")\n";
> print "   (set_attr \"cost\" \"8\")\n";
> +
> +   if ($extend eq "sign") {
> +   print "   (set_attr \"sign_extend\" \"yes\")\n";
> +   }
> +
> print "   (set_attr \"length\" \"8\")])\n";
> print "\n";
>}

This already was a 90-line function that did too many things.  Now it is
bigger and does more things, and the patch is unintelligible.

Please first factor things.  There are many more things terrible Perl
code style here (like all of the quoting), but where to start :-/

I once again spent many hours trying to review this, and once again
failed.  Please write better code, and please make better patches.

> index ec783803820..7d6c94aee5b 100644
> --- a/gcc/config/rs6000/rs6000.md
> +++ b/gcc/config/rs6000/rs6000.md
> @@ -302,7 +302,7 @@ (define_attr "prefixed" "no,yes"
> (eq_attr "maybe_prefixed" "no"))
>(const_string "no")
>  
> -  (eq_attr "type" "load,fpload,vecload")
> +  (eq_attr "type" "load,fpload,vecload,vecload,fused_load_cmpi")

Don't duplicate vecload.

> --- /dev/null
> +++ b/gcc/testsuite/g++.target/powerpc/pr105325.C
> @@ -0,0 +1,25 @@
> +/* { dg-do assemble } */
> +/* { dg-require-effective-target lp64 } */
> +/* { dg-require-effective-target power10_ok } */
> +/* { dg-require-effective-target powerpc_prefixed_addr } */
> +/* { dg-options "-O2 -mdejagnu-cpu=power10 -fstack-protector" } */

The power10_ok selector still is terribly broken (it allows only some
variants of 64-bit Linux and nothing more, to start with).  Do we still
need it in any case?

Same for powerpc_prefixed_addr.  Is there any supported target that does
not have a working assembler?

What is -fstack-protector here for?  That should be documented, or
better, it should just be removed if possible.

> -/* { dg-final { scan-assembler-times

1 2 3 4 5 6 7 8 9 10 >

1 - 100 of 6040 matches

Mail list logo