Re: [PATCH] testsuite: Fix pr66144-3.c test to accept multiple equivalent insns. [PR115262]
On Wed, Jun 12, 2024 at 07:02:31PM -0500, Peter Bergner wrote: > On 6/12/24 3:00 PM, Segher Boessenkool wrote: > >> /* { dg-do compile { target { powerpc64*-*-* } } } */ > > > > Probably should be an "lp64" instead? > > Actually, there is nothing inherently 64-bit about the test case. > I removed the target test altogether and it executes just fine on > our BE system in both 32-bit and 64-bit modes, so I'll just drop > the target test as part of the patch. Ha, even better! > >> /* { dg-require-effective-target powerpc_vsx } */ > > > > This isn't needed either. > > Maybe not strictly needed, but it shields us from users who force > some options to be used via RUNTESTFLAGS env var that can cause the > test case to FAIL. I'm going to leave this for someone else to > clean up. Users can make most tests fail in interesting and exciting ways like that, heh. In general, only realistic settings are supported: things for which hardware exists, an OS exists for, etc. With any other settings many things can fail, and that is Just Fine. Thanks again, Segher
Re: [PATCH] testsuite: Fix pr66144-3.c test to accept multiple equivalent insns. [PR115262]
Hi! On Wed, Jun 12, 2024 at 02:49:40PM -0500, Peter Bergner wrote: > testsuite: Fix pr66144-3.c test to accept multiple equivalent insns. > [PR115262] ("rs6000:", not "testsuite") > Jeff's commit r15-831-g05daf617ea22e1 changed the instruction we expected > for this test case into an equivalent instruction. Modify the test case > so it will accept any of three equivalent instructions we could get depending > on the options used. They aren't equivalent insns, but they are all simple insns, and all just as cheap :-) > I've verified this test case PASSes on all scenarios where the three possible > instructions are generated. Ok for trunk? > --- a/gcc/testsuite/gcc.target/powerpc/pr66144-3.c > +++ b/gcc/testsuite/gcc.target/powerpc/pr66144-3.c > @@ -1,5 +1,5 @@ > /* { dg-do compile { target { powerpc64*-*-* } } } */ Probably should be an "lp64" instead? > -/* { dg-options "-mdejagnu-cpu=power8 -mvsx -O2 -ftree-vectorize" } */ > +/* { dg-options "-mdejagnu-cpu=power8 -mvsx -O2 -ftree-vectorize > -fno-unroll-loops" } */ The "-mvsx" is superfluous btw (implied by -mcpu=power8). > /* { dg-require-effective-target powerpc_vsx } */ This isn't needed either. > -/* { dg-final { scan-assembler {\mvcmpequw\M} } } */ > -/* { dg-final { scan-assembler {\mxxsel\M}} } */ > +/* { dg-final { scan-assembler-times {\mvcmpequw\M} 1 } } */ > +/* { dg-final { scan-assembler-times {\m(?:xxsel|xxlor|vor)\M} 1 } } */ > /* { dg-final { scan-assembler-not {\mvspltisw\M} } } */ > /* { dg-final { scan-assembler-not {\mxxlorc\M} } } */ Okido, thanks! The three trivial tweaks I suggest here are okay as well if you want, after testing of course ;-) Segher
Re: [PATCH v2] fix PowerPC < 7 w/ Altivec not to default to power7
Hi! What does "powerpc < 7" mean? Something before POWER ISA 2.06? On Tue, Jun 11, 2024 at 04:22:54PM +0200, Rene Rebe wrote: > Glibc uses .machine to determine assembler optimizations to use. What does this mean? .machine is an *output* for glibc; nothing in glibc reads source code. Nothing the ".machine" directive does has anything to do with optimisations. Instead, it simply changes what architecture level is used for the following code. what specific instructions are supported mainly. > --- a/gcc/testsuite/gcc.target/powerpc/pr97367.c.vanilla 2024-05-30 > 18:26:29.839784279 +0200 > +++ b/gcc/testsuite/gcc.target/powerpc/pr97367.c 2024-10-06 > 18:20:34.873818482 +0200 > @@ -0,0 +1,9 @@ > +/* { dg-do compile } */ > +/* { dg-options "-mdejagnu-cpu=G5" } */ > + > +int dummy () > +{ > + return 0; > +} > + > +/* { dg-final { scan-assembler "power4" } } */ Please explain (in the testcase, not here!) what this is meant to test! You probably want to say {\mpower4\M} instead, btw. Unless you want to match ".machine spower436" as well? Segher
Re: [PATCH V4 1/2] split complicate 64bit constant to memory
Hi! On Tue, Jun 11, 2024 at 04:37:25PM +0800, Jiufu Guo wrote: > Sometimes, a complicated constant is built via 3(or more) > instructions. Generally speaking, it would not be as fast > as loading it from the constant pool (as the discussions in > PR63281): > "ld" is one instruction. If consider "address/toc" adjust, > we may count it as 2 instructions. And "pld" may need fewer > cycles. Yeah, three insns to build up a constant always was about as fast/slow as loading a constant from memory. When you need two related constants loading from memory is slower, so we preferred building them up. > As testing(SPEC2017), it could get better/stable runtime > if set the threshold as "> 2" (compare with "> 3"). Interesting! About how much speedup did you see? 0.1%? > As known, because the constant is load from memory by this > patch, so this functionality may affect the cache missing. > Also there may be possible side effects on icach. While, > IMHO, this patch would be still do the right thing. Yeah, but almost every codegen patch has *some* icache effect. Your benchmarks show a net positive effect, that is all that matters in the end. > gcc/ChangeLog: > > * config/rs6000/rs6000.cc (rs6000_emit_set_const): Update to split > complicate constant to memory. (complicated) And don't write "update", every patch is an update :-) There are more conditions here, mention those as well? But, why does base_reg_operand matter? And, why for TARGET_ELF only? That is the only place you tested I'm sure, but make an educated guess for the rest, why would it not be useful for other binary formats? > diff --git a/gcc/config/rs6000/rs6000.cc b/gcc/config/rs6000/rs6000.cc > index e4dc629ddcc..f448df289a0 100644 > --- a/gcc/config/rs6000/rs6000.cc > +++ b/gcc/config/rs6000/rs6000.cc > @@ -10240,6 +10240,21 @@ rs6000_emit_set_const (rtx dest, rtx source) > c = sext_hwi (c, 32); > emit_move_insn (lo, GEN_INT (c)); > } > + > + else if (base_reg_operand (dest, mode) && TARGET_64BIT > +&& TARGET_ELF && num_insns_constant (source, mode) > 2) > + { > + rtx sym = force_const_mem (mode, source); > + if (TARGET_TOC && SYMBOL_REF_P (XEXP (sym, 0)) > + && use_toc_relative_ref (XEXP (sym, 0), mode)) > + { > + rtx toc = create_TOC_reference (XEXP (sym, 0), copy_rtx (dest)); > + sym = gen_const_mem (mode, toc); > + set_mem_alias_set (sym, get_TOC_alias_set ()); > + } > + > + emit_move_insn (dest, sym); > + } >else > rs6000_emit_set_long_const (dest, c); >break; Is there no utility function to put constants like this in memory? Like, "output_toc" for example? > diff --git a/gcc/testsuite/gcc.target/powerpc/const_anchors.c > b/gcc/testsuite/gcc.target/powerpc/const_anchors.c > index 542e2674b12..f33c9a83f5e 100644 > --- a/gcc/testsuite/gcc.target/powerpc/const_anchors.c > +++ b/gcc/testsuite/gcc.target/powerpc/const_anchors.c > @@ -1,5 +1,5 @@ > /* { dg-do compile { target has_arch_ppc64 } } */ > -/* { dg-options "-O2" } */ > +/* { dg-options "-O2 -fdump-rtl-final" } */ > > #define C1 0x2351847027482577ULL > #define C2 0x2351847027482578ULL > @@ -17,4 +17,5 @@ void __attribute__ ((noinline)) foo1 (long long *a, long > long b) > *a++ = C2; > } > > -/* { dg-final { scan-assembler-times {\maddi\M} 2 } } */ > +/* Checking "final" instead checking "asm" output to avoid noise. */ > +/* { dg-final { scan-rtl-dump-times {\madddi3\M} 2 "final" } } */ So, in the RTL dump it finds a named pattern adddi3, while in the asm you find random other addi insns? Hardly worth a comment in the testcase itself, but heh. The point is that you check for the RTL pattern name instead of the machine insn. The compiler pass you check in isn't changed even! > --- a/gcc/testsuite/gcc.target/powerpc/parall_5insn_const.c > +++ b/gcc/testsuite/gcc.target/powerpc/parall_5insn_const.c > @@ -6,11 +6,18 @@ > /* { dg-final { scan-assembler-times {\mori\M} 4 } } */ > /* { dg-final { scan-assembler-times {\mrldimi\M} 2 } } */ > > +/* The below macro helps to avoid loading constant from memory. */ > +#define CONST_AVOID_BASE_REG(DEST, CST) > \ > + { > \ > +register long long d asm ("r0") = CST; > \ > +asm volatile ("std %1, %0" : : "m"(DEST), "r"(d)); > \ > + } This needs an actual changelog, not just "update". Something as simple as "New macro to force constant into a reg" is fine, but more than "update" :-) > +/* The below macro helps to avoid loading constant from memory. */ > +#define CONST_AVOID_BASE_REG(DEST, CST) > \ > + { > \ > +register long long d asm ("r0") = CST;
Re: [PATCH] rs6000: Fix up PCH in --enable-host-pie builds [PR115324]
Hi! On Mon, Jun 03, 2024 at 04:55:05PM +0200, Jakub Jelinek wrote: > PCH doesn't work properly in --enable-host-pie configurations on > powerpc*-linux*. PCH and PIE, two of my favourites ;-) > For PCH though it actually results in saving those huge arrays (one is > 130832 bytes, another 81568 bytes) into the .gch files and loading them back > in full. While the bifname and attr_string and next pointers are marked as > GTY((skip)), they are actually saved to point to the .rodata and .data > sections of the process which writes the PCH, but because cc1/cc1plus etc. > are position independent executables with --enable-host-pie, when it is > loaded from the PCH file, it can point in a completely different addresses > where nothing is mapped at all or some random different thing appears at. Yuck. > So, either we'd need to add some further GTY extensions, or the following > patch instead reworks it such that the fntype members which were the only > reason for PCH in those arrays are moved to separate arrays. And that just sidesteps the limitation in PCH? > void > rs6000_init_generated_builtins () > { > + bifdata *rs6000_builtin_info_p; > + tree *rs6000_builtin_info_fntype_p; > + ovlddata *rs6000_instance_info_p; > + tree *rs6000_instance_info_fntype_p; > + ovldrecord *rs6000_overload_info_p; > + __asm ("" : "=r" (rs6000_builtin_info_p) : "0" (rs6000_builtin_info)); Bah. It should not be called _p of course, it is not a predicate. And relying on the operand tie to not have to do a much more obvious assignment, please don't. Just *do* write assignments, and then use a simple "+r"? But you call this a hack anyway, you wouldn't propose to actually include this patch :-) > Bootstrapped/regtested on powerpc64le-linux and powerpc64-linux (-m32/-m64 > testing there), ok for trunk and after a while for release branches? Yes please. What nastiness. Thanks for dealing with it! Segher > 2024-06-03 Jakub Jelinek > > PR target/115324 > * config/rs6000/rs6000-gen-builtins.cc (write_decls): Remove > GTY markup from struct bifdata and struct ovlddata and remove their > fntype members. Change next member in struct ovlddata and > first_instance member of struct ovldrecord to have int type rather > than struct ovlddata *. Remove GTY markup from rs6000_builtin_info > and rs6000_instance_info arrays, declare new > rs6000_builtin_info_fntype and rs6000_instance_info_fntype arrays, > which have GTY markup. > (write_bif_static_init): Adjust for the above changes. > (write_ovld_static_init): Likewise. > (write_init_bif_table): Likewise. > (write_init_ovld_table): Likewise. > * config/rs6000/rs6000-builtin.cc (rs6000_init_builtins): Likewise. > * config/rs6000/rs6000-c.cc (find_instance): Likewise. Make static. > (altivec_resolve_overloaded_builtin): Adjust for the above changes.
Re: [Patch, rs6000, aarch64, middle-end] Add implementation for different targets for pair mem fusion
On Fri, May 31, 2024 at 09:14:21AM +0100, Richard Sandiford wrote: > Segher Boessenkool writes: > > Hi! > > > > On Fri, May 31, 2024 at 01:21:44AM +0530, Ajit Agarwal wrote: > >> Code is implemented with pure virtual functions to interface with target > >> code. > > > > It's not a pure function. A pure function -- by definition -- has no > > side effects. These things have side effects. > > > > What you mean is this is *an implementation* for C++ functions without > > a generic implementation. An obfuscation some people (like me) would > > say. But please call things what they are! So not "pure function". > > That has a meaning, and this isn't it. > > "pure virtual function" is an established term. The "pure" modifies > "virtual", not "function". > > The description is correct because the patch adds pure virtual functions > to the base class and expects the derived class to override and implement > them. But this code -- the architecture implementation! -- certainly does *not* add abstract functions: it should provide an implementation for them, instead. So the commit message is completely misleading :-( And no, it is not an established term, not outside of the C++ world. Which GCC agreed *not* to dive too much into. You can only sanely write most of compilers -- just like anything where algorithmics matter -- in an imperative, procedural language. Obfuscation (like action at a distance like this, where the meaning of a program depends hugely on knowledge of other parts of the program, far away!) is the devil. Reading the program is at least 1000x more important than writing it. Writing is easy. Reading and understanding, not so much. > >>* config/aarch64/aarch64-ldp-fusion.cc: Add target specific > >>implementation of additional virtual functions added in pair_fusion > >>struct. > > > > This does not belong in this patch. Do not send "rs6000" patches that > > touch anything outside of config/rs6000/ and similar, certainly not in > > config/something-else/! > > > > This would be WAY easier to review (read: AT ALL POSSIBLE) if you > > included some detailed rationale and design document. > > Please don't shout. > > I don't think this kind of aggressive review is helpful to the project. And I don't think wasting people's time is helpful. I don't shout. If you read that as shouting, that's not my problem. It is EMPHASIS (there are no small caps in email). Do you prefer *fat print* for that? Or /slanted/? Or **italics**? _Underlined_ perhaps? Segher
Re: [Patch, rs6000, aarch64, middle-end] Add implementation for different targets for pair mem fusion
Hi! On Fri, May 31, 2024 at 01:21:44AM +0530, Ajit Agarwal wrote: > Code is implemented with pure virtual functions to interface with target > code. It's not a pure function. A pure function -- by definition -- has no side effects. These things have side effects. What you mean is this is *an implementation* for C++ functions without a generic implementation. An obfuscation some people (like me) would say. But please call things what they are! So not "pure function". That has a meaning, and this isn't it. > * config/aarch64/aarch64-ldp-fusion.cc: Add target specific > implementation of additional virtual functions added in pair_fusion > struct. This does not belong in this patch. Do not send "rs6000" patches that touch anything outside of config/rs6000/ and similar, certainly not in config/something-else/! This would be WAY easier to review (read: AT ALL POSSIBLE) if you included some detailed rationale and design document. Segher
Re: [PATCH v3 #2/2] [rs6000] adjust return_pc debug attrs
Hi Alex, On Thu, May 30, 2024 at 01:40:27PM -0300, Alexandre Oliva wrote: > Sorry, I misnumbered this patch as #1/2 when first posting v3. I see at least five completely different patches in this email thread, with different subjects and all. > On May 28, 2024, Segher Boessenkool wrote: > > > Please don't (incorrectly!) line-wrap changelogs. Lines are 80 > > characters wide, not 60 or 72 or whatever. 80. Indents are tabs that > > take 8 columns. > > ACK. When was it bumped up to 80, BTW? It wasn't always like that, was > it? It always was like that. Some people say 79, that is fine too. It mostly irks me because lines that end in : and then a lot of empty space look like peoople used one of those awful "write the changelog for me" helper things, that *at best* *slow you down*, and always (always!) cause worse results. > I've noticed that something seems to change my line width settings > in Emacs, but I must have missed that memo. Line length in source code is 79 or 80. In email it is 72 or so. This has not changed since the dawn of time :-) > >> +/* Return the offset to be added to the label output after CALL_INSN > >> + to compute the address to be placed in DW_AT_call_return_pc. */ > >> + > >> +static int > >> +rs6000_call_offset_return_label (rtx_insn *call_insn) > >> +{ > >> + /* All rs6000 CALL_INSN output patterns start with a b or bl, always > > > This isn't true. There is a crlogical insn before the bcl for sysv for > > example. > > Hmm, if that's so, this function will have to look at the insn and > recognize those cases and use a different way to compute the offset. > > However, I don't see any relevant uses of bcl in output patterns for > insns containing a call rtx. bl, bcl, what's the difference (bit 4 is, heh, 16 vs. 18). Read bl if you want -- the point is that there are crlogical insns before the branch-and-link. > >> +we compute the offset > >> + back to the address of the call opcode proper, then add the > >> + constant 4 bytes, to get the address after that opcode. */ > >> + return 4 - get_attr_length (call_insn); > > > Please explain this magic, too -- in code preferably (so with a ? : > > maybe, but don't try to "optimise" that expression, let the compiler > > do that, it is much better at it anyway :-) ) > > There's neither optimization nor magic, it's literally what's written in > the comment quoted above: starting from the label at the end of the call > insn (that's what the caller offsets from, as in the documentation in > the actual #1/2), subtract the length (to get to the address of the call > opcode), and add 4 (to get past the call opcode). Ok, I've reordered > the two addends for an IMHO more readable expression, but that was all. 4 - length does not make any sense /an sich/, it *is* magic. It is not clear it is correct at all, either. > > Is that correct for all ABIs we support? (Context missing! What did I ask?) > Back when I wrote this patchset, I went through all call opcodes I could > find in the md and .c files within rs6000/, and I was satisfied that it > covered what we had then, but I won't pretend to know all about all of > the ppc ABIs. I may have missed disguised call insns, too. I was > hoping some ppc maintainer, more familiar with the port than I am, would > catch any trouble on review and let me know about pitfalls and surprises > to watch out for. Yeah, things don't work that way. If you need help, *ask* for that. Don't pretend a patch is good if you have doubts yourself! > > Even if so, it needs a lot more documentation than this. > > I can write more documentation, but I'm at a loss as to what you're > hoping for. If you set clearer expectations, I'll be glad to oblige. I want a patch submission that is a) understandable and b) a good thing to have. If a patch leaves me wondering what is going on at all, that is not ideal ;-) Segher
Re: [PATCH v3 #1/2] [rs6000] adjust return_pc debug attrs
On Wed, May 29, 2024 at 03:52:15AM -0300, Alexandre Oliva wrote: > On May 27, 2024, "Kewen.Lin" wrote: > > > I wonder if it's possible to have a test case for this? > > gcc.dg/guality/pr54519-[34].c at -O[1g] are fixed by this patch on > ppc64le-linux-gnu. Are these the sort of test case you're interested > in, or are you looking for something that tests the offsets in debug > info, rather than the end-to-end debugging feature? A testcase specifically for this would be good, yes. Where you can say at the top of the file "This tests that ... [X and Y]" :-) Segher
Re: [PATCHv3] Optab: add isfinite_optab for __builtin_isfinite
Hi! On Mon, May 27, 2024 at 05:37:23PM +0800, HAO CHEN GUI wrote: > --- a/gcc/builtins.cc > +++ b/gcc/builtins.cc > @@ -2459,8 +2459,9 @@ interclass_mathfn_icode (tree arg, tree fndecl) >errno_set = true; builtin_optab = ilogb_optab; break; > CASE_FLT_FN (BUILT_IN_ISINF): >builtin_optab = isinf_optab; break; > -case BUILT_IN_ISNORMAL: > case BUILT_IN_ISFINITE: > + builtin_optab = isfinite_optab; break; This needs a line break after the first ; (like after *any* semicolon in C). It is rather important that every "break;" stands out :-) > +@cindex @code{isfinite@var{m}2} instruction pattern > +@item @samp{isfinite@var{m}2} > +Set operand 0 to nonzero if operand 1 is a finite @code{SFmode}, > +@code{DFmode}, or @code{TFmode} floating point number and to 0 > +otherwise. operand 0 is the output of the builtin, right? So write that instead? "Return 1 if the operand (a scalar floating poiint number) is finite", or such? Segher
Re: [PATCHv3] Optab: add isfinite_optab for __builtin_isfinite
On Tue, May 28, 2024 at 02:09:50PM +0200, Richard Biener wrote: > On Tue, May 28, 2024 at 9:09 AM Kewen.Lin wrote: > > As Haochen's previous reply, I think there are three cases: > > 1) no optab defined, fold in a generic way; > > 2) optab defined, SUCC, expand as what it defines; > > 3) optab defined, FAIL, generate a library call; > > > > From above, I had the concern that ports may assume FAILing can > > fall back with the generic folding, but it's not actually. > > Hmm, but it should. Can you make that work? That certainly would be the least surprising! Segher
Re: [PATCH v3 #1/2] [rs6000] adjust return_pc debug attrs
On Sat, May 25, 2024 at 09:13:12AM -0300, Alexandre Oliva wrote: > Some of the rs6000 call patterns, on some ABIs, issue multiple opcodes > out of a single call insn, but the call (bl) or jump (b) is not always > the last opcode in the sequence. > This does not seem to be a problem for exception handling tables, but > the return_pc attribute in the call graph output in dwarf2+ debug > information, that takes the address of a label output right after the > call, does not match the value of the link register even for non-tail > calls. E.g., with ABI_AIX or ABI_ELFv2, such code as: > > foo (); > > outputs: > > bl foo > nop > LVL#: > [...] > .8byte .LVL# # DW_AT_call_return_pc > > but debug info consumers may rely on the return_pc address, and draw > incorrect conclusions from its off-by-4 value. > > This patch uses the infrastructure for targets to add an offset to the > label issued after the call_insn to set the call_return_pc attribute, > on rs6000, to account for opcodes issued after actual call opcode as > part of call insns output patterns. > for gcc/ChangeLog > > * config/rs6000/rs6000.cc (TARGET_CALL_OFFSET_RETURN_LABEL): > Override. Please don't (incorrectly!) line-wrap changelogs. Lines are 80 characters wide, not 60 or 72 or whatever. 80. Indents are tabs that take 8 columns. > +/* Return the offset to be added to the label output after CALL_INSN > + to compute the address to be placed in DW_AT_call_return_pc. */ > + > +static int > +rs6000_call_offset_return_label (rtx_insn *call_insn) > +{ > + /* All rs6000 CALL_INSN output patterns start with a b or bl, always This isn't true. There is a crlogical insn before the bcl for sysv for example. > + a 4-byte instruction, but some output patterns issue other > + opcodes afterwards. The return label is issued after the entire > + call insn, including any such post-call opcodes. Instead of > + figuring out which cases need adjustments, we compute the offset > + back to the address of the call opcode proper, then add the > + constant 4 bytes, to get the address after that opcode. */ > + return 4 - get_attr_length (call_insn); Please explain this magic, too -- in code preferably (so with a ? : maybe, but don't try to "optimise" that expression, let the compiler do that, it is much better at it anyway :-) ) > +} Is that correct for all ABIs we support? Even if so, it needs a lot more documentation than this. Segher
Re: [PATCH v3 #1/2] enable adjustment of return_pc debug attrs
On Sat, May 25, 2024 at 09:12:05AM -0300, Alexandre Oliva wrote: You sent multiple patch series in one thread, and multiple versions of the same series even. This is very hard to even *read*, let alone work with. Please don't. Segher
Re: [PATCH] rs6000: Don't pass -many to the assembler [PR112868]
Hi! On Wed, May 22, 2024 at 09:29:13AM -0500, Peter Bergner wrote: > On 5/21/24 8:27 AM, jeevitha wrote: > > The following patch has been bootstrapped and regtested with default > > configuration > > [--enable-checking=yes] and with --enable-checking=release on > > powerpc64le-linux. > > > > This patch removes passing the -many assembler option for release builds. > > Now, > > GCC no longer passes -many under any conditions to the assembler. Why do we want that? I cannot read minds. > You are missing a ChangeLog entry for the target-supports.exp change plus > there is no mention of why it's needed in the git log entry. In the commit message you mean? Yeah. This info belongs in the commit message. Is the target-supports thing that Cell thing? That does not belong here at all. If it wasn't simply a mistake, it should be a separate commit, with a lot of explanation. Segher
Re: [PATCH-4, rs6000] Implement optab_isnormal for SFmode, DFmode and TFmode [PR97786]
On Fri, May 17, 2024 at 10:38:54AM +0800, HAO CHEN GUI wrote: > This expand calls gen_xststdcp which is a P9 vector instruction and > relies on "TARGET_P9_VECTOR". So I set the condition. Why? It needs P9, sure, and MSR[VSX] set, but the operands being VSX registers takes care of that, heh. But it's fine, the insn patterns it uses use the same conditions already. Segher
Re: [PATCH-4, rs6000] Implement optab_isnormal for SFmode, DFmode and TFmode [PR97786]
Hi! On Fri, Apr 12, 2024 at 04:24:23PM +0800, HAO CHEN GUI wrote: > This patch implemented optab_isnormal for SF/DF/TFmode by rs6000 test > data class instructions. > > This patch relies on former patch which adds optab_isnormal. > https://gcc.gnu.org/pipermail/gcc-patches/2024-April/649366.html > gcc/ > PR target/97786 > * config/rs6000/vsx.md (isnormal2): New expand for SFmode and > DFmode. * config/rs6000/vsx.md (isnormal2 for SFDF): New expand. (isnormal2 for IEEE128): New expand. > --- a/gcc/config/rs6000/vsx.md > +++ b/gcc/config/rs6000/vsx.md > @@ -5357,6 +5357,30 @@ (define_expand "isfinite2" >DONE; > }) > > +(define_expand "isnormal2" > + [(use (match_operand:SI 0 "gpc_reg_operand")) > + (use (match_operand:SFDF 1 "gpc_reg_operand"))] > + "TARGET_HARD_FLOAT > + && TARGET_P9_VECTOR" Please put the condition on just one line if it is as simple and short as this. Why is TARGET_P9_VECTOR the correct condition? > +{ > + rtx tmp = can_create_pseudo_p () ? gen_reg_rtx (SImode) : operands[0]; This is an expander. can_create_pseudo_p always return true. Please simplify the code, keeping that in mind :-) > +(define_expand "isnormal2" > + [(use (match_operand:SI 0 "gpc_reg_operand")) > + (use (match_operand:IEEE128 1 "gpc_reg_operand"))] > + "TARGET_HARD_FLOAT > + && TARGET_P9_VECTOR" > +{ > + rtx tmp = can_create_pseudo_p () ? gen_reg_rtx (SImode) : operands[0]; > + emit_insn (gen_xststdcqp_ (tmp, operands[1], GEN_INT (0x7f))); > + emit_insn (gen_xorsi3 (operands[0], tmp, const1_rtx)); > + DONE; > +}) Same issues here, of course. > + Why add radom white lines? Pleaase don't. > --- /dev/null > +++ b/gcc/testsuite/gcc.target/powerpc/pr97786-7.c > @@ -0,0 +1,16 @@ > +/* { dg-do compile } */ > +/* { dg-require-effective-target powerpc_vsx_ok } */ > +/* { dg-options "-O2 -mdejagnu-cpu=power9 -mvsx" } */ If you use a -mcpu=, don't use vsx_ok. If you use a -mcpu=, don't use -mvsx. > +int test1 (double x) > +{ > + return __builtin_isnormal (x); > +} > + > +int test2 (float x) > +{ > + return __builtin_isnormal (x); > +} > + > +/* { dg-final { scan-assembler-not {\mfcmpu\M} } } */ Just \mfcmp please (so that it also catches fcmpo, if we ever generate that). > +/* { dg-final { scan-assembler-times {\mxststdc[sd]p\M} 2 } } */ Maybe you should test for one each of the s and d version? So just /* { dg-final { scan-assembler-times {\mxststdcsp\M} 1 } } */ /* { dg-final { scan-assembler-times {\mxststdcdp\M} 1 } } */ > --- /dev/null > +++ b/gcc/testsuite/gcc.target/powerpc/pr97786-8.c > @@ -0,0 +1,13 @@ > +/* { dg-do compile { target lp64 } } */ Why run this on 64-bit systems only? If there is a reason, document that here (but is there a reason?) > +/* { dg-require-effective-target ppc_float128_sw } */ > +/* { dg-require-effective-target powerpc_vsx_ok } */ > +/* { dg-options "-O2 -mdejagnu-cpu=power9 -mvsx -mabi=ieeelongdouble > -Wno-psabi" } */ Same comments here: If you have a -mcpu you do not want vsx_ok or -mvsx. Please fix these things and resend. Thanks! Segher
Re: [PATCH] report message for operator %a on unaddressible exp
Hi! On Thu, May 16, 2024 at 02:56:49PM +0800, Jiufu Guo wrote: > Jiufu Guo writes: > > Segher Boessenkool writes: > >> On Tue, May 14, 2024 at 05:53:56PM +0800, Jiufu Guo wrote: > >>> Thanks so much for your great review! > >>> Reference other messages, I'm wondering "invalid %%a value" may be > >>> acceptable, or "invalid %%a address expression in TOC" maybe better. > >> > >> "%%a requires a memory operand"? Maybe even print out the actual > >> operand given, too. > > > > Thanks! I updated the code using: > > "%%a requires a memory reference operand", since the actual operand > > is treated as the address. > > I suspect one thing here: if "%%a requires memory" is accurate vs. > "%%a requires a memory reference". > > Reference the words from doc: > https://gcc.gnu.org/onlinedocs/gcc/Extended-Asm.html#Generic-Operand-Modifiers > a: Substitute a memory reference, with the actual operand treated as the > address. > > And for below code: > '("#%a0" : :"m"(x))' is not accepted. Yeah, it always confuses me. Sorry. The operand is the actual address. > While '("#%a0" : :"r"())' is ok. > > So, it may be more accurate that: "%%a" as requirement of address of > memory. That sounds good yes. Segher
Re: [PATCH] report message for operator %a on unaddressible exp
On Tue, May 14, 2024 at 05:53:56PM +0800, Jiufu Guo wrote: > Thanks so much for your great review! > Reference other messages, I'm wondering "invalid %%a value" may be > acceptable, or "invalid %%a address expression in TOC" maybe better. "%%a requires a memory operand"? Maybe even print out the actual operand given, too. Segher
Re: [PATCH] report message for operator %a on unaddressible exp
Oh, btw: On Tue, May 14, 2024 at 11:00:38AM +0800, Jiufu Guo wrote: > >> --- a/gcc/config/rs6000/rs6000.cc > >> +++ b/gcc/config/rs6000/rs6000.cc > >> @@ -14659,6 +14659,12 @@ print_operand_address (FILE *file, rtx x) > >>else if (SYMBOL_REF_P (x) || GET_CODE (x) == CONST > >> || GET_CODE (x) == LABEL_REF) > >> { > >> + if (this_is_asm_operands && !address_operand (x, VOIDmode)) > >> + { > >> +output_operand_lossage ("invalid expression as operand"); > >> +return; > >> + } That error message is not so good. Firstly, it typically *is* a valid expression here, just not a correct expression to have for an address. But, more generally and usefully, the error message should say *what* is wrong about the expression (namely, it is not an address). Most of the time you can use the same error message for asm and other expressions, and you get a great message in all contexts. operand_lossage already takes care of telling the user "you did something foolish" for inline asm, or "ICE" if it is a compiler problem instead. In error messages you do not often know what caused the problem, so just report on the facts you *do* know (and moreso with warnings, there you typically only know something looks unusual). Segher
Re: [PATCH] report message for operator %a on unaddressible exp
On Tue, May 14, 2024 at 11:00:38AM +0800, Jiufu Guo wrote: > >> diff --git a/gcc/config/rs6000/rs6000.cc b/gcc/config/rs6000/rs6000.cc > >> index 117999613d8..50943d76f79 100644 > >> --- a/gcc/config/rs6000/rs6000.cc > >> +++ b/gcc/config/rs6000/rs6000.cc > >> @@ -14659,6 +14659,12 @@ print_operand_address (FILE *file, rtx x) > >>else if (SYMBOL_REF_P (x) || GET_CODE (x) == CONST > >> || GET_CODE (x) == LABEL_REF) > >> { > >> + if (this_is_asm_operands && !address_operand (x, VOIDmode)) > > > > Do we really need this_is_asm_operands here? > I understand your point: > since in function 'print_operand_address' which supports not only user > asm code. So, it maybe incorrect if 'x' is not an 'address_operand', > no matter this_is_asm_operands. > > Here, 'this_is_asm_operands' is needed because it would be treated as an > user fault in asm-code (otherwise, internal_error in the compiler). You almost never want to test for asm, and just give the same error you would give in non-asm. It is the same problem after all, and giving the user the same error message is the most helpful thing to do! It can be useful to not say "ICE", but it already is prevented from doing that here. Segher
Re: [PATCH] report message for operator %a on unaddressible exp
Hi! On Mon, May 13, 2024 at 10:57:12AM +0800, Jiufu Guo wrote: > For PR96866, when gcc print asm code for modifier "%a" which requires > an address operand, It requires a *memory* operand, and it outputs its address. This is a generic modifier btw (not rs6000). > while the operand is with the constraint "X" which > allow non-address form. An error message would be reported to indicate > the invalid asm operands. "non-address form"? Every mem has an address. But 'X' is not memory. What is it at all? Why do we use that when you *have to* have mem here? The code you add that tests for address_operand looks wrong. I would expect it to test the operand is memory, instead :-) Segher
Re: [PATCH] rtlanal: Correct cost regularization in pattern_cost
On Fri, May 10, 2024 at 02:50:56PM +0200, Richard Biener wrote: > But for example a reg-reg move when optimizing for speed could have > a zero associated cost. Sure, but this is the way things always have been. I'm sure there are ways to change things so they become slightly easier to use, but that is not what we have now (or what we ever have had). > You might argue that's a bug since there's > an actual instruction and thus at least a size cost (and decode cost) > but then I've seen too much zero cost stuff in backends (like that > combine PR causing s390 backend making address-cost zero even > though it's just "same cost"). address_cost is something else entirely. Many backends have that set to 0 btw, and that means 0, not "unknown". It means all forms of address in valid machine instructions have the same execution (or size) costs. > IMO give we're dispatching to the rtx_cost hook eventually it needs > documenting there or alternatively catching zero and adjusting its > result there. Of course cost == 0 ? 1 : cost is wrong as it makes > zero vs. one the same cost - using cost + 1 when from rtx_cost > might be more correct, at least preserving relative costs. Most things should not use rtx_cost at all, only insn_cost. Taking the "cost" of any random RTL expression makes no sense at all. Neither conceptually, nor practically: it causes many problems, and solves none. Most things already use only insn_cost, if you have the appropriate hooks implemented in your backend (all one of them even). This is so much easier to use (you only need to recognise some big categories of instructions, for a typical target core, instead of eighty different RTX codes, and the combination of them), and gives way better results. Segher
Re: [PATCH] rtlanal: Correct cost regularization in pattern_cost
On Fri, May 10, 2024 at 12:19:35PM +0200, Richard Biener wrote: > On Fri, May 10, 2024 at 11:06 AM Segher Boessenkool > wrote: > > *All* code using a cost will have to be inspected and possibly adjusted > > if you decide to use a different value for "unknown" than what we have > > had for ages. All other cost functions interacting with this one, too. > > Btw, looking around pattern_cost is the only API documenting this special > value and the function after it using this function, insn_cost does the same > but > > int > insn_cost (rtx_insn *insn, bool speed) > { > if (targetm.insn_cost) > return targetm.insn_cost (insn, speed); > > and the target hook doesn't document this special value. set_src_cost > doesn't either, btw (that just uses rtx_cost). So I don't think how > pattern_cost handles the set_src_cost result is warranted. There's > simply no way to detect whether set_src_cost returns an actual > value - on the contrary, it always does. I introduced insn_cost. I didn't think about documenting that 0 means unknown, precisely because that is so pervasive! Segher
Re: [PATCH] rtlanal: Correct cost regularization in pattern_cost
On Fri, May 10, 2024 at 04:50:10PM +0800, HAO CHEN GUI wrote: > Hi Richard, > Thanks for your comments. > > 在 2024/5/10 15:16, Richard Biener 写道: > > But if targets return sth < COSTS_N_INSNS (1) but > 0 this is now no > > longer meaningful. So shouldn't it instead be > > > > return cost > 0 ? cost : 1; > Yes, it's better. > > > > > ? Alternatively returning fractions of COSTS_N_INSNS (1) from set_src_cost > > is invalid and thus the target is at fault (I do think that making zero the > > unknown value is quite bad since that makes it impossible to have zero > > as cost represented). > > > > It seems the check is to aovid pattern_cost return zero (unknown), so the > > comment holds to pattern_cost the same (it returns an 'int' so the better > > exceptional value would have been -1, avoiding the compare). > But sometime it adds an insn cost. If the unknown cost is -1, the total cost > might be distorted. *All* code using a cost will have to be inspected and possibly adjusted if you decide to use a different value for "unknown" than what we have had for ages. All other cost functions interacting with this one, too. Segher
Re: [PATCH] rtlanal: Correct cost regularization in pattern_cost
Hi! On Fri, May 10, 2024 at 09:16:26AM +0200, Richard Biener wrote: > On Fri, May 10, 2024 at 4:25 AM HAO CHEN GUI wrote: > >But if set_src_cost returns a value less than COSTS_N_INSNS (1), it's > > untouched and just returned by pattern_cost. Thus "zero" from set_src_cost > > is higher than "one" from set_src_cost. > > > > For instance, i386 returns cost "one" for zero_extend op. > > //ix86_rtx_costs > > case ZERO_EXTEND: > > /* The zero extensions is often completely free on x86_64, so make > > it as cheap as possible. */ > > if (TARGET_64BIT && mode == DImode > > && GET_MODE (XEXP (x, 0)) == SImode) > > *total = 1; > > > > This patch fixes the problem by converting all costs which are less than > > COSTS_N_INSNS (1) to COSTS_N_INSNS (1). > > > > Bootstrapped and tested on x86 and powerpc64-linux BE and LE with no > > regressions. Is it OK for the trunk? > > But if targets return sth < COSTS_N_INSNS (1) but > 0 this is now no > longer meaningful. So shouldn't it instead be > > return cost > 0 ? cost : 1; I don't think either is very good. Why does it want to convert "unknown cost" to some known cost at all? Fixing *that* would really improve things! > ? Alternatively returning fractions of COSTS_N_INSNS (1) from set_src_cost > is invalid and thus the target is at fault (I do think that making zero the > unknown value is quite bad since that makes it impossible to have zero > as cost represented). No, we have COST_N_INSNS exactly because fractions of integer costs are useful to have. COST_N_INSNS (1) really stands for "the cost of a cheap integer isns with latency 1", and many targets want to express some insns are better than others, even if they have the same latency. Perhaps some insns use less of some resource, etc. > It seems the check is to aovid pattern_cost return zero (unknown), so the > comment holds to pattern_cost the same (it returns an 'int' so the better > exceptional value would have been -1, avoiding the compare). Cost 0 for unkown is used in (almost) *all* cost things in RTL. Pretty much everything can deal with it just fine. What is different here? The abstraction "pattern_cost" is a lousy abstraction of course, but where is this used? Cost "unknown" can be passed through everywhere, in principle. Segher
Re: [PATCH 3/3] combine: initialize a local var
On Thu, May 02, 2024 at 02:38:12PM -0600, Jeff Law wrote: > > > On 5/2/24 12:59 PM, Vineet Gupta wrote: > >This is no logic change (but technically still a functional change). > > > >Ran into this when stepping thru combine code. > >@newpat has some random garbage for a bit until it is actually set. > >With the fix it remains 0 until actually set. > > > >gcc/ChangeLog: > > * combine.cc (try_combine): Initialize newpat. > Isn't the same true even after this change if you turn on the optimizer? > And isn't the same true for many other objects that are initialized > lazily? For all, even. Without this change the compiler can (in theory, anyway) diagnose the uninitialised use. After, there *is* no uninitialised use anymore. Please don't do this. It is not an improvement, it is several steps back, to satisfy a misguides sense of "security". Segher
Re: [PATCH 3/3] combine: initialize a local var
On Thu, May 02, 2024 at 11:59:24AM -0700, Vineet Gupta wrote: > This is no logic change (but technically still a functional change). Where are 1/3 and 2/3? Or are those unrelated? Please don't make series like that. > Ran into this when stepping thru combine code. > @newpat has some random garbage for a bit until it is actually set. > With the fix it remains 0 until actually set. The same is true for all uninitialised variables. Setting everything to zero explicitly is a) quite a bit slower, and b) just as wrong! For example, here, newpat should never be zero. Never. It does not make any sense. Is there any place where newpat is used uninitialised? Segher
Re: [PATCH, rs6000] Use bcdsub. instead of bcdadd. for bcd invalid number checking
On Thu, Apr 18, 2024 at 11:14:42AM +0800, Kewen.Lin wrote: > on 2024/4/18 10:01, HAO CHEN GUI wrote: > > This patch replace bcdadd. with bcdsub. for bcd invalid number checking. > > bcdadd on two same numbers might cause overflow which also set > > overflow/invalid bit so that we can't distinguish it's invalid or overflow. > > The bcdsub doesn't have the problem as subtracting on two same number never > > causes overflow. > > > > Bootstrapped and tested on powerpc64-linux BE and LE with no > > regressions. Is it OK for the trunk? > > Considering that this issue affects some basic functionality of bcd bifs > and the fix itself is simple and very safe, OK for trunk, thanks for fixing! Yup. If a number X is invalid the X-X calculation might set the overflow flag as well, but we cannot see that difference at all anyway, it always has set the invalid flag after all. Thanks! Segher
Re: [PATCH] rs6000: Add OPTION_MASK_POWER8 [PR101865]
Hi! On Thu, Apr 11, 2024 at 11:23:02PM -0500, Peter Bergner wrote: > On 4/11/24 10:31 PM, Kewen.Lin wrote: > >> +;; This option exists only to create its MASK. It is not intended for > >> users. > >> +mdo-not-use-this-option > >> +Target RejectNegative Mask(POWER8) Var(rs6000_isa_flags) WarnRemoved > >> + > > > > I can understand the given name is to avoid users to use it, but it looks > > odd, personally > > I'm inclined to mpower8 (or even mpower8-internal) even if it's more likely > > to be used but > > it's a bit more meaningful (especially we already have mpower10), > > theoretically speaking > > it's undocumented users shouldn't use it at all. > > Sorry, I should have mentioned this, but I originally had it -mpower8, but > given > it was an option we don't want users to use, Segher mentioned offline to give > it > a name something like the above and not -mpower8. I kind of like > -mpower8-internal > now that you mention it, but I'd like Segher's input here whether he prefers > -mdo-not-use-this-option or -mpower8-internal or something else??? -mpower8-internal is fine. Anyone who thinks this would be a good thing to us, well, we cannot stop them from hurting themselves I guess. Esp. with a nice help text it is fine. Going forward we need something like this for most ISA levels (we currently often use the existence of some more-or-less random insn for this, but we need the same test to actually test for *that* insn, not a good thing at all). But we do not want the user to be able to use such options at all, so we really shouln't make command line options for it. So it should not use an option flag at all for this, but something else. Maybe something new even, we have had problems around this forever, that suggests we have insufficient abstractions around this :-) > I'll make the changes above, modulo leaving the option name unchanged until > we hear from Segher on that and report back on the LE and BE testing. -mpower8-internal should dissuade users from using it, certainly people who actually read the documentation as well. It is unfortunate we need to tell people to not use tools we provide ourselves, but this is temporary, right :-) (Right?!) Thanks guys, Segher
Re: Combine patch ping
On Wed, Apr 10, 2024 at 08:32:39PM +0200, Uros Bizjak wrote: > On Wed, Apr 10, 2024 at 7:56 PM Segher Boessenkool > wrote: > > This is never okay. You cannot commit a patch without approval, *ever*. This is the biggest issue, to start with. It is fundamental. > > That patch is also obvious -- obviously *wrong*, that is. There are > > big assumptions everywhere in the compiler how a CC reg can be used. > > This violates that, as explained elsewhere. > > Can you please elaborate what is wrong with this concrete patch. The explanation of the patch is contradictory to how RTL works at all, so it is just wrong. It might even do something sane, but I didn't get that far at all! Write good email explanations, and a good proposed commit message. Please. It is the only one people can judge a patch. Well, apart from doing everything myself from first principles, ignoring everything you said, just looking at the patch itself, but that is a hundred times more work. I don't do that. > The > part that the patch touches has several wrong assumptions, and the > fixed "???" comment just emphasizes that. I don't see what is wrong > with: > > (define_insn "@pushfl2" > [(set (match_operand:W 0 "push_operand" "=<") > (unspec:W [(match_operand 1 "flags_reg_operand")] > UNSPEC_PUSHFL))] > "GET_MODE_CLASS (GET_MODE (operands[1])) == MODE_CC" > "pushf{}" > [(set_attr "type" "push") >(set_attr "mode" "")]) What does it even mean? What is a flags:CC? You always always always need to say what is *in* the flags, if you want to use it as input (which is what unspec does). CC is weird like this. Most targets do not have distinct physical flags for every condition, only a few conditions are "alive" at any point in the program! > it is just a push of the flags reg to the stack. If the push can't be > described in this way, then it is the middle end at fault, we can't > just change modes at will. But that is not what this describes: it operates on the flags register in some unspecified way, and pushes the result of *that* to the stack. (Stack pointer modification is not described here btw, should it be? Is that magically implemented by the backend some way, via type=push perhaps?) Segher
Re: Combine patch ping
On Sun, Apr 07, 2024 at 08:31:38AM +0200, Uros Bizjak wrote: > If there are no further comments, I plan to commit the referred patch > to the mainline on Wednesday. The latest version can be considered an > obvious patch that solves certain oversight in the original > implementation. This is never okay. You cannot commit a patch without approval, *ever*. That patch is also obvious -- obviously *wrong*, that is. There are big assumptions everywhere in the compiler how a CC reg can be used. This violates that, as explained elsewhere. Segher
Re: [COMMITTED] testsuite/gcc.target/cris/pr93372-2.c: Handle xpass from combine improvement
Hi! On Fri, Apr 05, 2024 at 04:06:01AM +0200, Hans-Peter Nilsson wrote: > The xpassing change in generated code was as follows, at > r14-9788-gb7bd2ec73d66f7 (where I locally applied a revert > to verify that this suspect was the cause). That was so > much of an improvement that I had to share it! Worth the > testsuite churn anyway. :) > > Segher, if you end up reverting r14-9692-g839bc42772ba7a (as > unfortunately seems not unlikely), then please also revert this > commit: r14-9799-g4c8b3600c4856f7915281ae3ff4d97271c83a540. I won't revert it, it fixes an actual bug. Not a regression no, but a very serious longstanding problem. We have accidentally done a limited version of a feature requested for more than 20 years now, "UNCSE". I'll do this for just combine (instead of as a separate pass, lots of issues there with pass ordering, results could be better though) in GCC 15. This really is a stage 1 thing though! Any testcase that relies on something that combine does not promise and that can not reasonably be expected to always hold is *buggy*. The combine-2-2 testcase (that I wrote myself) isn't very good, and should be replaced by something that is much more clearly a 2->2 combination, instead of 1->1 with context. All (target-specific) new testsuite failures are just like that: bad testcases! So no, no reversion. > That's the only test that's improved to the point of > affecting test-patterns. E.g. pr93372-5.c (which references > pr93372-2.c) is also improved, though it retains a redundant > compare insn. (PR 93372 was about regressions from the cc0 > representation; not further improvement like here, thus it's > not tagged. Though, I did not double-check whether this > actually *was* a regression from cc0.) Interesting that this improved tests for you. Huh. Do you have an explanation how this happened? I suspect that as uaual it is just a side effect of random factors: combine is opportunistic, always does the first change it thinks good, not considering what this then does for other possible combinations; it is greedy. It would be nice to see written out what happens in this example though :-) Segher
Re: [PATCH] rtl-optimization/101523 - avoid re-combine after noop 2->2 combination
Hi! On Wed, Apr 03, 2024 at 01:07:41PM +0200, Richard Biener wrote: > The following avoids re-walking and re-combining the instructions > between i2 and i3 when the pattern of i2 doesn't change. > > Bootstrap and regtest running ontop of a reversal of > r14-9692-g839bc42772ba7a. Please include that in the patch (or series, preferably). > It brings down memory use frmo 9GB to 400MB and compile-time from > 80s to 3.5s. r14-9692-g839bc42772ba7a does better in both metrics > but has shown code generation regressions across acrchitectures. > > OK to revert r14-9692-g839bc42772ba7a? No. The patch solved a very real problem. How does your replacement handle that? You don't say. It looks like it only battles symptoms a bit, instead :-( We had this before: 3->2 combinations that leave an instruction identical to what was there before. This was just a combination with context as well. The only reason this wasn't a huge problem then already was because this is a 3->2 combination, even if it really is a 2->1 one it still is beneficial in all the same cases. But in the new case it can iterate indefinitely -- well not quite, but some polynomial number of times, for a polynomial at least of degree three, possibly more :-( With this patch you need to show combine still is linear. I don't think it is, but some deeper analysis might show it still is. ~ - ~ - ~ What should *really* be done is something that has been on the wish list for decades: an uncse pass. The things that combine no longer works on after my patch are actually 1->1 combinations (which we never do currently, although we probably should); or alternatively, an un-CSE followed by a 2->1 combination. We can do the latter of course, but we need to do an actual uncse first! Somewhere before combine, and then redo a CSE after it. An actual CSE, not doing ten gazillion other things. Segher
[PATCH] combine: Don't combine if I2 does not change
In some cases combine will "combine" an I2 and I3, but end up putting exactly the same thing back as I2 as was there before. This is never progress, so we shouldn't do it, it will lead to oscillating behaviour and the like. If we want to canonicalise things, that's fine, but this is not the way to do it. Committed to trunk. Segher 2024-03-27 Segher Boessenkool PR rtl-optimization/101523 * combine.cc (try_combine): Don't do a 2-insn combination if it does not in fact change I2. --- gcc/combine.cc | 11 +++ 1 file changed, 11 insertions(+) diff --git a/gcc/combine.cc b/gcc/combine.cc index a4479f8d8364..745391016d04 100644 --- a/gcc/combine.cc +++ b/gcc/combine.cc @@ -4186,6 +4186,17 @@ try_combine (rtx_insn *i3, rtx_insn *i2, rtx_insn *i1, rtx_insn *i0, adjust_for_new_dest (i3); } + /* If I2 didn't change, this is not a combination (but a simplification or + canonicalisation with context), which should not be done here. Doing + it here explodes the algorithm. Don't. */ + if (rtx_equal_p (newi2pat, PATTERN (i2))) +{ + if (dump_file) + fprintf (dump_file, "i2 didn't change, not doing this\n"); + undo_all (); + return 0; +} + /* We now know that we can do this combination. Merge the insns and update the status of registers and LOG_LINKS. */ -- 1.8.3.1
Re: [PATCH v2] combine: Fix ICE in try_combine on pr112494.c [PR112560]
On Thu, Mar 07, 2024 at 11:46:54PM +0100, Uros Bizjak wrote: > > Can't you just describe the dataflow then, without an unspec? An unspec > > by definition does some (unspecified) operation on the data. > > Previously, it was defined as: > > (define_insn "*pushfl2" >[(set (match_operand:W 0 "push_operand" "=<") > (match_operand:W 1 "flags_reg_operand"))] > > But Wmode, AKA SI/DImode is not CCmode. And as said in my last > message, nothing prevents current source code to try to update the CC > register here. So you can use an unspec just to convert the flags reg to an integer? To make an integer representation of flags reg contents. Or is that what we started with here? Segher
Re: [PATCH v2] combine: Fix ICE in try_combine on pr112494.c [PR112560]
On Thu, Mar 07, 2024 at 11:27:28PM +0100, Uros Bizjak wrote: > On Thu, Mar 7, 2024 at 11:07 PM Uros Bizjak wrote: > > > > (unspec:DI [ > > > > (reg:CC 17 flags) > > > > ] UNSPEC_PUSHFL) > > > > > > But that is invalid RTL? The only valid use of a CC is written as > > > cc-compared-to-0. It means "the result of something compared to 0, > > > stored in that cc reg". > > > > > > (And you can copy a CC reg around, but that is not a use ;-) ) > > Hm... under this premise, we can also say that any form that is not > cc-compared-to-0 is not a use. Well, no. All uses of CC are written as comparisons to 0, or are pure dataflow. Anything else is not "not a use" but just invalid. > Consequently, if it is not a use, then > the CC reg should not be updated at its use location, so my v1 patch, > where we simply skip the update (but retain the combined insn), > actually makes sense. With more asserts, perhaps. > In this concrete situation, we don't care about CC register mode in > the PUSHFL insn. And we should not change CC reg mode of the use, > because any other mode than the generic CCmode won't be recognized by > the insn pattern. You always care about the CC mode, that is why you always write it as comparison, so the backend can choose a mode based on what the flag bits mean in this context. For an extreme example look at 390, but on pretty much any target both signed and unsigned comparisons use the same flag bits, and maybe fp comparisons as well. But pushfl does sound like pure dataflow. Why is this a builtin, is it ever a good idea if the user writes stuff the compiler can do a better job doing itself, not to mention it is way easier for the compiler anyway? :-) Segher
Re: [PATCH] rs6000: Fix issue in specifying PTImode as an attribute [PR106895]
Hi! On Fri, Feb 23, 2024 at 03:04:13PM +0530, jeevitha wrote: > PTImode attribute assists in generating even/odd register pairs on 128 bits. It is a mode, not an attribute. Attributes are on declarations, while modes are on a much more fundamental level (every value has a mode, in GCC!) > When the user specifies PTImode as an attribute, it breaks because there is no > internal type to handle this mode . We have created a tree node with dummy > type > to handle PTImode. You are talking about the mode attribute. Aha. This attribute says the mode of the datum is something specific; it is the only way a user can specify the mode directly. Not something you want to use normally, but it's a nice escape hatch (like here). > We are not documenting this dummy type since users are not > allowed to use this type externally. Not sure what this is meant to mean? What does "allowed to" mean, even? We do not forbid people from doing anything (we can discourage them though). Or dso you mean something just doesn't work? > gcc/ > PR target/106895 > * config/rs6000/rs6000.h (enum rs6000_builtin_type_index): Add fields > to hold PTImode type. An enum does not have fields. What do you actually mean? > --- a/gcc/config/rs6000/rs6000.h > +++ b/gcc/config/rs6000/rs6000.h > @@ -2304,6 +2304,7 @@ enum rs6000_builtin_type_index >RS6000_BTI_ptr_vector_quad, >RS6000_BTI_ptr_long_long, >RS6000_BTI_ptr_long_long_unsigned, > + RS6000_BTI_PTI, Please call it RS6000_BTI_INTPTI, to be more in line with the naming of other things. With that fixed it should be good. Please repost with a good commit comment and changelog :-) Thanks, Segher
Re: [PATCH v2] combine: Fix ICE in try_combine on pr112494.c [PR112560]
On Thu, Mar 07, 2024 at 11:07:18PM +0100, Uros Bizjak wrote: > On Thu, Mar 7, 2024 at 10:37 PM Segher Boessenkool > wrote: > > > but can be something else, such as the above noted > > > > > > (unspec:DI [ > > > (reg:CC 17 flags) > > > ] UNSPEC_PUSHFL) > > > > But that is invalid RTL? The only valid use of a CC is written as > > cc-compared-to-0. It means "the result of something compared to 0, > > stored in that cc reg". > > > > (And you can copy a CC reg around, but that is not a use ;-) ) > > How can we describe a pushfl then? (unspec:DI [ (compare:CC) ((reg:CC 17 flags) (const_int 0)) ] UNSPEC_PUSHFL) or something like that? > It was changed to its current form > in [1], but I suspect that the code will try to update it even when > pushfl is implemented as a direct move from a register (as was defined > previously). > > BTW: Unspecs are used in a couple of other places for other targets [2]. > > [1] https://gcc.gnu.org/bugzilla/show_bug.cgi?id=112494#c5 > [2] https://gcc.gnu.org/pipermail/gcc-patches/2023-December/639743.html There is nothing wront with unspecs. You cannot use a CCmode value without comparing it to 0, though. The exact kind of comparison determines what bits are valid (and have what meaning) in your CC reg, even! > > > The source code that deals with the *user* of the CC register assumes > > > the former form, so it blindly tries to update the mode of the CC > > > register inside LT comparison RTX > > > > Would you like it better if there was an assert for this? There are > > very many RTL requirements that aren't chacked for currently :-/ > > In this case - yes. Assert signals that something is unsupported (or > invalid), way better than silently corrupting some memory, reporting > the corruption only with checking enabled. Yeah. The way RTL checking works makes this hard to do in most cases. Hrm. (It cannot easily look at context, only inside of the current RTX). > > The unspec should have the CC compared with 0 as argument. > > But this does not do what pushfl does... It pushes the register to the stack. Can't you just describe the dataflow then, without an unspec? An unspec by definition does some (unspecified) operation on the data. Segher
Re: [PATCH V3 2/2] rs6000: Load store fusion for rs6000 target using common infrastructure
On Fri, Mar 08, 2024 at 03:01:04AM +0530, Ajit Agarwal wrote: > > >> + Copyright (C) 2020-2023 Free Software Foundation, Inc. > > > > What in here is from 2020? > > > > Most things will be from 2024, too. First publication date is what > > counts. > > Please let me know the second publication date. Huh? This code won't be published before 2024 (and it does not derive from older code), so the only valid date in the copyright message is 2024. > >> + bool pair_check_register_operand (bool load_p, rtx reg_op, > >> + machine_mode mem_mode) > >> + { > >> +if (load_p || reg_op || mem_mode) > >> + return false; > >> +else > >> + return false; > >> + } > > > > The compiler will have warned for this. Please look at all compiler > > (and other) warnings that you introduce. > > > > As far as my understanding I didn't see any extra warnings, > but I will surely cross check and solve that. Hrm, apparently there is no -Wall -W warnign for this? But your code is essentially bool pair_check_register_operand (bool, rtx, machine_mode) { return false; } > >> +// alias_walker that iterates over stores. > >> +template > >> +class store_walker : public def_walker > > > > That is not a good comment. You should describe parameters and return > > values and that kind of thing. That it walks over things is bloody > > obvious from the name already :-) > > > > This part of code is taken from aarch64 load store fusion > pass. I have made the aarch64-ldp-fusion.cc into target independent code and > target dependent code. Target independent code is shared > across all the architecture, In this case its rs6000 and aarch64. > Target dependent code is implemented through pure virtual functions. It is required to decribe what a function is for, and all its arguments and return values. If the aarch64 code doesn't, it should be fixed. Not only reviewers need this, anyone trying to use the code needs it, too. > >> +find_trailing_add (insn_info *insns[2], > >> + const insn_range_info _range, > >> + int initial_writeback, > >> + rtx *writeback_effect, > >> + def_info **add_def, > >> + def_info *base_def, > >> + poly_int64 initial_offset, > >> + unsigned access_size); > > > > That is way, way, way too many parameters. > > > > This code I have taken from aarch64-ldp-fusion.cc. > I have not changed anything here. Don't copy not-so-good stuff unmodified? It is unreviewable, to start with, but probably not very usable later either. > Could you please elaborate on how you want me > to structure the patches. *You* should know the code already, so you surely can figure out a nice way to present it, so that it takes me LESS work to review this than it took you to write it? Making a patch series reviewable is part of the development effort. It is way less work if you start with this as the goal in mind. It is less work than writing (and debugging etc.) an omnibus patch, in the first place! Your goal is to make a patch series that will be reviewed and then seen to be great stuff. So it of course needs to *be* great stuff, but it also needs to be presented in such a way that that is obvious. Segher
Re: [PATCH v2] combine: Fix ICE in try_combine on pr112494.c [PR112560]
On Thu, Mar 07, 2024 at 10:04:32PM +0100, Uros Bizjak wrote: [snip] > The part we want to fix deals with the *user* of the CC register. It > is not true that this is always COMPARISON_P, so EQ, NE, GE, LT, ... > in the form of > > (LT:CCGC (reg:CCGC 17 flags) (const_int 0)) > > but can be something else, such as the above noted > > (unspec:DI [ > (reg:CC 17 flags) > ] UNSPEC_PUSHFL) But that is invalid RTL? The only valid use of a CC is written as cc-compared-to-0. It means "the result of something compared to 0, stored in that cc reg". (And you can copy a CC reg around, but that is not a use ;-) ) > The source code that deals with the *user* of the CC register assumes > the former form, so it blindly tries to update the mode of the CC > register inside LT comparison RTX Would you like it better if there was an assert for this? There are very many RTL requirements that aren't chacked for currently :-/ > (some other nearby source code even > checks for (const_int 0) RTX). Obviously, this is not the case with > the former form, where the update tries to: > > SUBST (XEXP (*cc_use_loc, 0), ...) > > on unspec, which has no XEXP (..., 0). > > And *this* is what triggers RTX checking assert. The unspec should have the CC compared with 0 as argument. Segher
Re: [PATCH v2] combine: Fix ICE in try_combine on pr112494.c [PR112560]
On Thu, Mar 07, 2024 at 12:22:04PM +0100, Uros Bizjak wrote: > As I understood find_single_use, it is returning RTX iff DEST is used > only a single time in an insn sequence following INSN. Connected by a log_link even, yeah. > We can reject the combination without worries of multiple uses. Yup. That is the whole point of find_single_use: if that test fails, combine knows to get its paws off :-) Segher
Re: [PATCH v2] combine: Fix ICE in try_combine on pr112494.c [PR112560]
On Thu, Mar 07, 2024 at 11:45:45AM +0100, Richard Biener wrote: > The question is, whether a NULL cc_use_loc (find_single_use returning > NULL) means "there is no use" or it can mean "huh, don't know, maybe > more than one, maybe I was too stupid to indentify the single use". > The implementation suggests it's all broken ;) It specifically means there is not a *single* use (or it could not find what it was, perhaps). It does not mean there is no use. There is not much in combine that cares about dead code anyway, earier passes should have taken care of that ;-) All as documented. Segher
Re: [PATCH v2] combine: Fix ICE in try_combine on pr112494.c [PR112560]
On Thu, Mar 07, 2024 at 11:22:12AM +0100, Richard Biener wrote: > > > > Undo the combination if *cc_use_loc is not COMPARISON_P. Why, anyway? COMPARISON_P means things like LE. It does not even include actual RTX COMPARE. Segher
Re: [PATCH v2] combine: Fix ICE in try_combine on pr112494.c [PR112560]
On Thu, Mar 07, 2024 at 10:55:12AM +0100, Richard Biener wrote: > On Thu, 7 Mar 2024, Uros Bizjak wrote: > > This is > > > > 3236 /* Just replace the CC reg with a new mode. */ > > 3237 SUBST (XEXP (*cc_use_loc, 0), newpat_dest); > > 3238 undobuf.other_insn = cc_use_insn; > > > > in combine.cc, where *cc_use_loc is > > > > (unspec:DI [ > > (reg:CC 17 flags) > > ] UNSPEC_PUSHFL) > > > > combine assumes CC must be used inside of a comparison and uses XEXP (..., > > 0) No. It has established *this is the case* some time earlier. Lines\ 3155 and on, what begins with /* Many machines have insns that can both perform an arithmetic operation and set the condition code. > > OK for trunk? > > Since you CCed me - looking at the code I wonder why we fatally fail. I did not get this email btw. Some blip in email (on the sender's side) I guess? > The following might also fix the issue and preserve more of the > rest of the flow of the function. > --- a/gcc/combine.cc > +++ b/gcc/combine.cc > @@ -3182,7 +3182,8 @@ try_combine (rtx_insn *i3, rtx_insn *i2, rtx_insn > *i1, rtx_insn *i0, > >if (undobuf.other_insn == 0 > && (cc_use_loc = find_single_use (SET_DEST (newpat), i3, > - _use_insn))) > + _use_insn)) > + && COMPARISON_P (*cc_use_loc)) Line 3167 already is && GET_CODE (SET_SRC (PATTERN (i3))) == COMPARE so what in your backend is unusual? Segher
Re: [PATCH V3 2/2] rs6000: Load store fusion for rs6000 target using common infrastructure
Hi! On Mon, Feb 19, 2024 at 04:24:37PM +0530, Ajit Agarwal wrote: > --- a/gcc/config.gcc > +++ b/gcc/config.gcc > @@ -518,7 +518,7 @@ or1k*-*-*) > ;; > powerpc*-*-*) > cpu_type=rs6000 > - extra_objs="rs6000-string.o rs6000-p8swap.o rs6000-logue.o" > + extra_objs="rs6000-string.o rs6000-p8swap.o rs6000-logue.o > rs6000-vecload-fusion.o" Line too long. > + /* Pass to replace adjacent memory addresses lxv instruction with lxvp > + instruction. */ > + INSERT_PASS_BEFORE (pass_early_remat, 1, pass_analyze_vecload); That is not such a great name. Any pss name with "analyze" is not so good -- the pass does much more than just "analyze" things! > --- /dev/null > +++ b/gcc/config/rs6000/rs6000-vecload-fusion.cc > @@ -0,0 +1,701 @@ > +/* Subroutines used to replace lxv with lxvp > + for TARGET_POWER10 and TARGET_VSX, The pass filename is not good then, either. > + Copyright (C) 2020-2023 Free Software Foundation, Inc. What in here is from 2020? Most things will be from 2024, too. First publication date is what counts. > + Contributed by Ajit Kumar Agarwal . We don't say such things in the files normally. > +class rs6000_pair_fusion : public pair_fusion > +{ > +public: > + rs6000_pair_fusion (bb_info *bb) : pair_fusion (bb) {reg_ops = NULL;}; > + bool is_fpsimd_op_p (rtx reg_op, machine_mode mem_mode, bool load_p); > + bool pair_mem_ok_policy (rtx first_mem, bool load_p, machine_mode mode) > + { > +return !(first_mem || load_p || mode); > + } It is much more natural to write this as retuurn !first_mem && !load && !mode; (_p is wrong, this is not a predicate, it is not a function at all!) What is "!mode" for here? How can VOIDmode happen here? What does it mean? This needs to be documented. > + bool pair_check_register_operand (bool load_p, rtx reg_op, > + machine_mode mem_mode) > + { > +if (load_p || reg_op || mem_mode) > + return false; > +else > + return false; > + } The compiler will have warned for this. Please look at all compiler (and other) warnings that you introduce. > +rs6000_pair_fusion::is_fpsimd_op_p (rtx reg_op, machine_mode mem_mode, bool > load_p) > +{ > + return !((reg_op && mem_mode) || load_p); > +} For more complex logic, split it up into two or more conditional returns. > +// alias_walker that iterates over stores. > +template > +class store_walker : public def_walker That is not a good comment. You should describe parameters and return values and that kind of thing. That it walks over things is bloody obvious from the name already :-) > +extern insn_info * > +find_trailing_add (insn_info *insns[2], > +const insn_range_info _range, > +int initial_writeback, > +rtx *writeback_effect, > +def_info **add_def, > +def_info *base_def, > +poly_int64 initial_offset, > +unsigned access_size); That is way, way, way too many parameters. So: * Better names please. * Better documentation, too, including documentations in the code. Don't describe *what*, anyone can see that anyway, but describe *why*. * This is way too much for one patch. Split this into many patches, properly structured in a patch series. The design will need some explanation, but none of the code should need that, ever! Segher
Re: [PATCH V2] rs6000: Don't allow immediate value in the vsx_splat pattern [PR113950]
On Wed, Feb 28, 2024 at 11:58:15AM -0600, Peter Bergner wrote: > On 2/28/24 8:31 AM, Segher Boessenkool wrote: > > On Tue, Feb 27, 2024 at 04:50:02PM -0600, Peter Bergner wrote: > >> So it seems you're not NAKing the use of splat_input_operand, but > >> just that it needs more explanation in the git log entry, correct? > > > > I NAK the patch. _Of course_ there needs to be *something* done, there > > is a bug after all, it needs to be fixed. > > > > But no, there are big questions about if splat_input_operand is correct > > as well. This needs to be justified in the patch submission. > > Ok, then Jeevitha, repost the patch with the s/op1/operands[1]/ only change. > Jeevitha has already bootstrapped and regtested that change and it does > fix the bug. > > Clearly, the splat_input_operand change needs more discussion and would > be a follow-on patch...if we decide to do it at all. It is clear that input_operand is wrong. It isn't clear that splat_input_operand is correct though :-( Segher
Re: [PATCH V2] rs6000: Don't allow immediate value in the vsx_splat pattern [PR113950]
On Tue, Feb 27, 2024 at 04:50:02PM -0600, Peter Bergner wrote: > On 2/27/24 6:40 AM, Segher Boessenkool wrote: > > On Tue, Feb 27, 2024 at 02:02:38AM +0530, jeevitha wrote: > > input_operand allows a lot of things that splat_input_operand does not, > > not just immediate operands. NAK. > > > > (For example, *all* memory is okay for input_operand, always). > > > > I'm not saying we do not want to restrict these things, but a commit > > that doesn't discuss this at all is not okay. Sorry. > > So it seems you're not NAKing the use of splat_input_operand, but > just that it needs more explanation in the git log entry, correct? I NAK the patch. _Of course_ there needs to be *something* done, there is a bug after all, it needs to be fixed. But no, there are big questions about if splat_input_operand is correct as well. This needs to be justified in the patch submission. > Yes, input_operand accepts a lot more things than splat_input_operand > does, but the multiple define_insns this define_expand feeds, uses > gpc_reg_operand, memory_operand and splat_input_operand for their > operands[1] operand (splat_input_operand accepts reg and mem too), > so it seems to match better what the patterns will be accepting and > I always thought that using predicates that more accurately reflect > what the define_insns expect/accept lead to better code gen. Still, it needs explanation why we allowed it before, but that was a mistake, or for some reason we do not need it. Sell your patch! :-) > Mike, was it just an oversight to not use splat_input_operand for the > vsx_splat_ expander or was input_operand a conscious decision? > > If input_operand was used purposely, then we can just fall back to > the s/op1/operands[1]/ change which we already know fixes the bug. input_operand allows all inputs for mov insns. It isn't suitable for any other instructions. Segher
Re: [PATCH V2] rs6000: Don't allow immediate value in the vsx_splat pattern [PR113950]
Hi! On Tue, Feb 27, 2024 at 02:02:38AM +0530, jeevitha wrote: > There is no immediate value splatting instruction in Power. Currently, those > values need to be stored in a register or memory. To address this issue, I > have updated the predicate for the second operand in vsx_splat to > splat_input_operand and corrected the assignment of op1 to operands[1]. > These changes ensure that operand1 is stored in a register. input_operand allows a lot of things that splat_input_operand does not, not just immediate operands. NAK. (For example, *all* memory is okay for input_operand, always). I'm not saying we do not want to restrict these things, but a commit that doesn't discuss this at all is not okay. Sorry. Segher
Re: Repost [PATCH 1/6] Add -mcpu=future
On Tue, Feb 20, 2024 at 06:35:34PM +0800, Kewen.Lin wrote: > on 2024/2/8 03:58, Michael Meissner wrote: > $ grep -r "define PROCESSOR_DEFAULT" gcc/config/rs6000/ > gcc/config/rs6000/aix71.h:#define PROCESSOR_DEFAULT PROCESSOR_POWER7 > gcc/config/rs6000/aix71.h:#define PROCESSOR_DEFAULT64 PROCESSOR_POWER7 > gcc/config/rs6000/aix72.h:#define PROCESSOR_DEFAULT PROCESSOR_POWER7 > gcc/config/rs6000/aix72.h:#define PROCESSOR_DEFAULT64 PROCESSOR_POWER7 > gcc/config/rs6000/aix73.h:#define PROCESSOR_DEFAULT PROCESSOR_POWER8 > gcc/config/rs6000/aix73.h:#define PROCESSOR_DEFAULT64 PROCESSOR_POWER8 > gcc/config/rs6000/darwin.h:#define PROCESSOR_DEFAULT PROCESSOR_PPC7400 > gcc/config/rs6000/darwin.h:#define PROCESSOR_DEFAULT64 PROCESSOR_POWER4 > gcc/config/rs6000/freebsd64.h:#define PROCESSOR_DEFAULT PROCESSOR_PPC7450 > gcc/config/rs6000/freebsd64.h:#define PROCESSOR_DEFAULT64 PROCESSOR_POWER8 > gcc/config/rs6000/linux64.h:#define PROCESSOR_DEFAULT PROCESSOR_POWER7 > gcc/config/rs6000/linux64.h:#define PROCESSOR_DEFAULT64 PROCESSOR_POWER8 > gcc/config/rs6000/rs6000.h:#define PROCESSOR_DEFAULT PROCESSOR_PPC603 > gcc/config/rs6000/rs6000.h:#define PROCESSOR_DEFAULT64 PROCESSOR_RS64A > gcc/config/rs6000/vxworks.h:#define PROCESSOR_DEFAULT PROCESSOR_PPC604 > > , and they are unlikely to be updated later, no? In most cases did would be an ABI change. Almost never an acceptable thing to do. Segher
Re: [PATCH 0/2 V2] aarch64: Place target independent and dependent code in one file.
On Thu, Feb 22, 2024 at 07:49:20PM +, Richard Sandiford wrote: > Thanks for the update. This is still quite hard to review though. > Sorry to ask for another round, but could you split it up further? > The ideal thing would be if patches that move code do nothing other > than move code, and if patches that change code do those changes > in-place. In general, if there is a (big) part to the patch that does not change behaviour at all, it should be a separate patch. Such a patch is then easy to review (write down in the commit message that it does not change behaviour though, it helps reviewers). It also makes the remaining tiny patches much easier to review then. Very generally, any patch that makes interesting changes should not have more than a few lines semantic content. That can be repeated of course, and have fall-out mechanical follow-up changes, but that's the essence of good patchsets: one change per patch. And then the commit message can be simple as well, and the chanegelog will be easy to write. That is the litmus test for good patch series :-) Segher
Re: [PATCH] rs6000: Update instruction counts due to combine changes [PR112103]
On Tue, Feb 20, 2024 at 01:49:30PM -0600, Peter Bergner wrote: > I think this will become less fragile after we fix PR114004 which is You call it "fragile". I call it the testcase found the exact kind of bug this testcase was meant to find! Yes, the test should become quieter when the compiler has fewer bugs :-) Segher
Re: [PATCH] rs6000: Neuter option -mpower{8,9}-vector [PR109987]
On Tue, Feb 20, 2024 at 05:27:07PM +0800, Kewen.Lin wrote: > > -mcpu=power8 implies -mvsx already. > > Yes, but users can specify -mno-vsx in RUNTESTFLAGS, dejagnu > framework can have different behaviors (options order) for > different versions, this explicit -mvsx is mainly for the > consistency between the checking and the actual testing. It is not supported at all. It is better to assume users do not try to hang themselves. > > It is mostly a testsuite patch, and testcase patches are fine (and much > > wanted!) in stage 4. The actual compiler options remain, and behaviour > > does not change for anyone who used the option as intended, > > Yes, excepting for one unexpected use that users having one cpu type which > doesn't support power8/power9 capability but meanwhile specifies option > -mpower{8,9}-vector to gain power8/power9 capability (as currently these > options can enable the corresponding flags). But I don't think it's an > expected use case. Yeah, it is why we do not want such options at all :-) > >>* config/rs6000/rs6000.opt: Make option power{8,9}-vector as > >>WarnRemoved. > > > > Do we want this, or do we want it silent? Should we remove the options > > later, if we now warn for it? > > Good question, it mainly follows the practice of option direct-move here. > IMHO at least for power8-vector we want WarnRemoved for now as it's > documented before, and we can probably make it (or them) removed later on > trunk once all active branch releases don't support it any more. > > What's your opinion on this? Originally I did Warn(%qs is deprecated) which already was a mistake. It then changed to Deprecated and then to WarnRemoved which make it clearer that it is a bad plan. If it is okay to remove an option, we should not talk about it at all anymore. Well maybe warn about it for another release or so, but not longer. > >> (define_register_constraint "we" > >> "rs6000_constraints[RS6000_CONSTRAINT_we]" > >> - "@internal Like @code{wa}, if @option{-mpower9-vector} and > >> @option{-m64} are > >> - used; otherwise, @code{NO_REGS}.") > >> + "@internal Like @code{wa}, if the cpu type is power9 or up, meanwhile > >> + @option{-mvsx} and @option{-m64} are used; otherwise, @code{NO_REGS}.") > > > > "if this is a POWER9 or later and @option{-mvsx} and @option{-m64} are > > used". How clumsy. Maybe we should make the patterns that use "we" > > work without mtvsrdd as well? Hrm, they will still require 64-bit GPRs > > of course, unless we can do something tricky. > > > > We do not need the special constraint at all of course (we can add these > > conditions to all patterns that use it: all *two* patterns). So maybe > > that's what we should do :-) > > Not sure the original intention introducing it (Mike might know it best), but > removing it sounds doable. It is for mtvsrdd. > btw, it seems more than two patterns using it? > like (if I didn't miss something): > - vsx_concat_ > - vsx_splat__reg > - vsx_splat_v4si_di > - vsx_mov_64bit Yes, it isn't clear we should use this contraint in those last two. It looks like those do not even need the restriction to 64 bit systems. Well the last one obviously has that already, but then it could just use "wa", no? > > -mcpu=power8 implies -mvsx (power7 already). You can disable VSX, or > > VMX as well, but by default it is enabled. > > Yes, it's meant to consider the explicitly -mno-vsx, which suffers the option > order issue. But considering we raise error for -mno-vsx -mpower{8,9}-vector > before, without specifying -mvsx is closer to the previous. > > I'll adjust it and the below similar ones, thanks! It is never supported to do unsupported things :-) We need to be able to rely on defaults. Otherwise, we will have to implement all of GCC recursively, in itself, in the testsuite, and in individual tests. Let's not :-) Cheers, Segher
Re: [PATCH] rs6000: Neuter option -mpower{8,9}-vector [PR109987]
Hi! On Tue, Jan 16, 2024 at 10:50:01AM +0800, Kewen.Lin wrote: > As PR109987 and its duplicated bugs show, -mno-power8-vector > (and -mno-power9-vector) cause some problems and as Segher > pointed out in [1] they are workaround options, so this patch > is to remove -m{no,}-power{8,9}-options. Excellent :-) > Like what we did > for option -mdirect-move before, this patch still keep the > corresponding internal flags and they are automatically set > based on -mcpu. Yup. That makes the code nicer, and it what we already have anyway! > The test suite update takes some efforts, Yeah :-/ > it consists of some aspects: > - effective target powerpc_p{8,9}vector_ok are removed > and replaced with powerpc_vsx_ok. So all such testcases already arrange to have p8 or p9 some other way? > - Some cases having -mpower{8,9}-vector are updated with > -mvsx, some of them already have -mdejagnu-cpu. For > those that don't have -mdejagnu-cpu, if -mdejagnu-cpu > is needed for the test point, then it's appended; > otherwise, add additional-options -mdejagnu-cpu=power{8,9} > if has_arch_pwr{8,9} isn't satisfied. Yeah it's a judgement call every time. > - Some test cases are updated with explicit -mvsx. > - Some test cases with those two option mixed are adjusted > to keep the test points, like -mpower8-vector > -mno-power9-vector are updated with -mdejagnu-cpu=power8 > -mvsx etc. -mcpu=power8 implies -mvsx already. > - Some test cases with -mno-power{8,9}-vector are updated > by replacing -mno-power{8,9}-vector with -mno-vsx, or > just removing it. Okay. > - For some cases, we don't always specify -mdejagnu-cpu to > avoid to restrict the testing coverage, it would check > has_arch_pwr{8,9} and appended that as need. That is in general how all tests should be. Very sometimes we want to test for a specific CPU, for a regression test that exhibited just on a certain CPU for example. But we should never have a -mcpu= (or a -mpowerN-vector nastiness thing) to test things on a new CPU! Just do a testsuite ruyn with *that* CPU. Not many years from now, *all* CPUs will have those new instructions anyway, so let's not put noise in the testcases that will be irrelevant soon. > - For vect test cases run, it doesn't specify -mcpu=power9 > for power10 and up. > > Bootstrapped and regtested on: > - powerpc64-linux-gnu P7/P8/P9 {-m32,-m64} > - powerpc64le-linux-gnu P8/P9/P10 In general it is nice to test 970 as the lowest vector thing we have, abnd/or p4 as a target without anything vector, as well. But I expect thoise will just work for this patch :-) > Although it's stage4 now, as the discussion in PR113115 we > are still eager to neuter these two options. It is mostly a testsuite patch, and testcase patches are fine (and much wanted!) in stage 4. The actual compiler options remain, and behaviour does not change for anyone who used the option as intended, Okay for trunk. Thanks! Comments below: > * config/rs6000/rs6000.opt: Make option power{8,9}-vector as > WarnRemoved. Do we want this, or do we want it silent? Should we remove the options later, if we now warn for it? > (define_register_constraint "we" "rs6000_constraints[RS6000_CONSTRAINT_we]" > - "@internal Like @code{wa}, if @option{-mpower9-vector} and @option{-m64} > are > - used; otherwise, @code{NO_REGS}.") > + "@internal Like @code{wa}, if the cpu type is power9 or up, meanwhile > + @option{-mvsx} and @option{-m64} are used; otherwise, @code{NO_REGS}.") "if this is a POWER9 or later and @option{-mvsx} and @option{-m64} are used". How clumsy. Maybe we should make the patterns that use "we" work without mtvsrdd as well? Hrm, they will still require 64-bit GPRs of course, unless we can do something tricky. We do not need the special constraint at all of course (we can add these conditions to all patterns that use it: all *two* patterns). So maybe that's what we should do :-) > -If you use the ISA 3.0 instruction set (@option{-mpower9-vector} or > -@option{-mcpu=power9}) on a 64-bit system, the IEEE 128-bit floating > -point support will also enable the generation of ISA 3.0 IEEE 128-bit > -floating point instructions. Otherwise, if you do not specify to > -generate ISA 3.0 instructions or you are targeting a 32-bit big endian > -system, IEEE 128-bit floating point will be done with software > -emulation. > +If you use the ISA 3.0 instruction set (@option{-mcpu=power9}) on a > +64-bit system, the IEEE 128-bit floating point support will also enable > +the generation of ISA 3.0 IEEE 128-bit floating point instructions. > +Otherwise, if you do not specify to generate ISA 3.0 instructions or you > +are targeting a 32-bit big endian system, IEEE 128-bit floating point > +will be done with software emulation. You do not need to reformat documentation source code: it is automatically formatted in all output formats and all viewers :-) > diff --git
Re: [PATCH] Turn on LRA on all targets
On Fri, Feb 16, 2024 at 11:34:55AM +, Maciej W. Rozycki wrote: > Not really, in particular because EH unwinding has to be reliable and > heuristics inherently is not. Yup. Which is why I did 0359465c703a for rs6000 six years ago (how time flies!) The commit message for that includes To find out where on-entry register values live at any point in a program, GDB currently tries to parse to parse the executable code. This does not work very well, for example it gets confused if some accesses to the stack use the frame pointer (r31) and some use the stack pointer (r1). A symptom is that backtraces can be cut short. and the patch does + /* By default, always emit DWARF-2 unwind info. This allows debugging + without maintaining a stack frame back-chain. It also allows the + debugger to find out where on-entry register values are stored at any + point in a function, without having to analyze the executable code (which + isn't even possible to do in the general case). */ +#ifdef OBJECT_FORMAT_ELF + opts->x_flag_asynchronous_unwind_tables = 1; +#endif We went through very many refinements of the heuristics through the years, but at some point you just have to give up: heuristics never can make up for missing information. > Consequently the more aggressive the compiler has become to schedule > function body instructions within a function's prologue the more lost the > machine code interpreter has become. Ultimately it would have to become a > full-fledged CPU simulator to do its heuristics. Yup, exactly. > In reality it means the > unwinder may fail to produce acceptable results, which will happen at any > frequency between hardly ever to most often, depending on the exact > circumstances. Yes. If the compiler optimises the *logue code well, there is no way heuristics can follow that. > Conversely no heuristics is required to unwind VAX frames, because they > are fixed in layout by hardware, fully self-described, and with the > hardware frame pointer always available. The downside of the VAX situation of course is that the compiler has no freedom to optimise the frame and *logue code at all, let alone well. This may not matter so much on narrow ucoded in-order machines, there are different balances there :-) Segher
Re: [PATCH] Turn on LRA on all targets
On Thu, Feb 15, 2024 at 08:41:42PM -0500, Paul Koning wrote: > > On Feb 15, 2024, at 5:56 PM, Segher Boessenkool > > wrote: > > > > On Thu, Feb 15, 2024 at 07:34:32PM +, Sam James wrote: > >> I have now started doing this in PR113932. > > > > Thank you! > > > > Segher > > Presumably this isn't for version 14 since it's in a late stage, right? I > have my bits about ready to go in but I'll wait for State 1 to open. Correct? Absolutely. It was decided early in stage 1 that this wasn't going to happen for 14. It appears most of the anywhere near hard targets have not done anything though. I'll just push very hard for 15. But you will be fine :-) Segher
Re: [PATCH] Turn on LRA on all targets
On Thu, Feb 15, 2024 at 07:34:32PM +, Sam James wrote: > I have now started doing this in PR113932. Thank you! Segher
Re: Repost [PATCH 1/6] Add -mcpu=future
On Fri, Jan 05, 2024 at 06:35:37PM -0500, Michael Meissner wrote: > * config/rs6000/rs6000.opt (-mfuture): New undocumented debug switch. No. Never ever use a flag that does what -mcpu= should do. We're still trying to recover from previous such mistakes. Don't add more please. > +++ b/gcc/config/rs6000/rs6000-c.cc > @@ -447,6 +447,8 @@ rs6000_target_modify_macros (bool define_p, HOST_WIDE_INT > flags) > rs6000_define_or_undefine_macro (define_p, "_ARCH_PWR9"); >if ((flags & OPTION_MASK_POWER10) != 0) > rs6000_define_or_undefine_macro (define_p, "_ARCH_PWR10"); > + if ((flags & OPTION_MASK_FUTURE) != 0) > +rs6000_define_or_undefine_macro (define_p, "_ARCH_PWR_FUTURE"); if a & B) != 0) != 0) != 0) ? You can do just if (a & B) Yes, existing code already does the silly thing, but just fix it then, don't add more :-) (And no if ((a & B)) either please). > +static int > +rs600_cpu_index_lookup (enum processor_type processor) > +{ > + for (size_t i = 0; i < ARRAY_SIZE (processor_target_table); i++) > +if (processor_target_table[i].processor == processor) > + return i; > + > + return -1; > +} "int i" please, not "size_t". This has nothing to do with object sizes. The loop counter will always be a small number. > + /* At the moment, we don't have explict -mtune=future support. If the user "At the moment" is out of date almost as soon as you write it. It is better to avoid such terms ;-) > + explicitly tried to use -mtune=future, give a warning. If not, use the > + power10 tuning until future tuning is added. */ There should be Power11 tuning now, please use that? So please post this -- as a separate series, and not as a single patch -- after fixing the things Ke Wen pointed out. Thanks! Segher
Re: Repost [PATCH 1/6] Add -mcpu=future
On Wed, Feb 07, 2024 at 05:21:10PM +0800, Kewen.Lin wrote: > on 2024/2/6 14:01, Michael Meissner wrote: > > It was more as a separation. The MPCCORE, CELL, PPCA2, and TITAN are rather > > old processors. I'll probably remove Titan soonish, btw. We have adjusted code around it for what, fifteen years? But the hardware never materialized. There are more silly things in our backend, but this one takes the prize. > OK, considering we only get this warning once for a simple case, I'm inclined > not to keep a static variable for it, it's the same as what we do currently > for option conflict errors emission. But I'm fine for either. Whatever is easiest. Segher
Re: Repost [PATCH 1/6] Add -mcpu=future
On Tue, Feb 06, 2024 at 01:01:52AM -0500, Michael Meissner wrote: > > Nit: Named as "ISA_FUTURE_MASKS_SERVER" seems more accurate as it's > > constituted > > with ISA_3_1_MASKS_**SERVER** ... > > Well the _SERVER stuff was due to the power7 days when we still had to support > the E500 in the main rs6000 tree. But I will change it to be more consistant > in the future patches. "_SERVER" still is a good shortish name for the server systems ;-) > > > @@ -67,7 +67,9 @@ enum processor_type > > > PROCESSOR_MPCCORE, > > > PROCESSOR_CELL, > > > PROCESSOR_PPCA2, > > > - PROCESSOR_TITAN > > > + PROCESSOR_TITAN, > > > + > > > > Nit: unintentional empty line? > > > > > + PROCESSOR_FUTURE > > > }; > > It was more as a separation. The MPCCORE, CELL, PPCA2, and TITAN are rather > old processors. I don't recall why we kept them after the POWER. Please don't add random separations. > Logically we should re-order the list and move MPCCORE, etc. earlier, but I > will delete the blank line in future patches. Don't randomly reorder, either. _FUTURE should be added after POWER11. > > I think we should also update asm_names in driver-rs6000.cc. > > Ok. Though the driver-rs6000.cc stuff won't kick in until we have a real > system that matches "future". Or when during development you have that faked. You did test it, right? :-) Segher
Re: Repost [PATCH 0/6] PowerPC Future patches
Hi! On Fri, Jan 05, 2024 at 06:27:05PM -0500, Michael Meissner wrote: > In the current MMA subsystem for Power10, there are 8 512-bit accumulator > registers. These accumulators are each tied to sets of 4 FPR registers. When Four VSX registers -- the FP registers are only a 64 bit part of each of those. Please do not call those VSX registers "FPRs". They are not. > These patches add support for the 512-bit accumulators within the dense math > system, and for allocation of the 1,024-bit DMRs. At this time, no additional > built-in functions will be done to support any dense math features other than > doing data movement between the DMRs and the VSX registers. Before we can > look > at adding any new dense math support other than data movement, we need the GCC > compiler to be able to allocate and use these DMRs. Okido. > If you compile with -mcpu=power10, the wD constraint will match the equivalent > FPR register that overlaps with the accumulator. If you compile with > -mcpu=future, the wD constraint will match the DMR register and not the FPR > register. > > These patches also modifies the print_operand %A output modifier to print out > DMR register numbers if -mcpu=future, and continue to print out the FPR > register number divided by 4 for -mcpu=power10. Yup. Unfortunately that is the best we can do probably. It _feels_ fragile, but it wil probably be okay in practice. > Going forward, hopefully if you modify your code to use the wD constraint and > %A output modifier, you can write code that switches more easily between the > two systems. But it will never become completely transparent. Luckily the old thing will over time fade into the background. So, please post the -mcpu=future patches in a separate series, first. I'll comment on that patch in a minute, you'll probably want to take those comments into consideration before posting that series ;-) Segher
Re: [PATCH] Add a late-combine pass [PR106594]
Hi! On Tue, Oct 24, 2023 at 07:49:10PM +0100, Richard Sandiford wrote: > This patch adds a combine pass that runs late in the pipeline. But it is not. It is a completely new thing, and much closer to fwprop than to combine, too. Could you rename it to something else, please? Something less confusing to both users and maintainers :-) > There are two instances: one between combine and split1, So, what kind of things does this do that the real combine does not? And, same question but for fwprop. That would be the crucial motivation for why we want to have this new pass at all :-) > The pass currently has a single objective: remove definitions by > substituting into all uses. The easy case ;-) Segher
Re: [PATCH 1/2] RTX_COST: Count instructions
On Fri, Dec 29, 2023 at 09:14:52PM -0700, Jeff Law wrote: > On 12/29/23 10:46, YunQiang Su wrote: > >When we try to combine RTLs, the result may be very complex, > >and `rtx_cost` may think that it need lots of costs. But in > >fact, it may match a pattern in machine descriptions, which > >may emit only 1 or 2 hardware instructions. This combination > >may be refused due to cost comparison failure. > Then that's a problem with the backend's implementation of RTX_COST. > > >Since the high cost may be due to a more expsensive operation. > >To get real reason, we also need information about instruction > >count. > Then cost the *operations*, not the number of instructions. Also note > that a single insn may generate multiple assembler instructions. > > Even with all its warts, the real solution here is to fix the port's RTX > costs. Or implement the insn_cost hook instead, it will be used preferably over rtx_costs in most places then. Including in the combiner. insn_cost is much easier to implement, and even possible to make good cost estimates with :-) Segher
Re: [PATCH] combine: Fix ICE in try_combine on pr112494.c [PR112560]
Hi! On Wed, Nov 29, 2023 at 02:20:03PM +0100, Uros Bizjak wrote: > On Wed, Nov 29, 2023 at 1:25 PM Richard Biener > wrote: > > On Wed, Nov 29, 2023 at 10:35 AM Uros Bizjak wrote: > I was assuming that if the CC reg is not used inside the comparison, > then the mode of CC reg is irrelevant. We can still combine the > instructions into new insn, without updating the use of CC reg. It should never happen that the CC reg is not used, so what does this mean? Where it is used might not be a comparison of course, as in your example. > && (cc_use_loc = find_single_use (SET_DEST (newpat), i3, > _use_insn))) > { > - compare_code = orig_compare_code = GET_CODE (*cc_use_loc); > - if (is_a (GET_MODE (i2dest), )) > - compare_code = simplify_compare_const (compare_code, mode, > -, ); > - target_canonicalize_comparison (_code, , , 1); > + if (COMPARISON_P (*cc_use_loc)) > + { > + compare_code = orig_compare_code = GET_CODE (*cc_use_loc); > + if (is_a (GET_MODE (i2dest), )) > + compare_code = simplify_compare_const (compare_code, mode, > +, ); > + target_canonicalize_comparison (_code, , , 1); > + } > + else > + { > + if (dump_file && (dump_flags & TDF_DETAILS)) > + fprintf (dump_file, "CC register not used in comparison.\n"); "Where the CCmode register is used is not a comparison". But more compact if you can manage (I cannot). Segher
Re: [PATCH] rs6000, Add missing overloaded bcd builtin tests
On Tue, Oct 31, 2023 at 08:31:25AM -0700, Carl Love wrote: > > I just found that actually they have the test coverage, because we > > have > > > > #define __builtin_bcdcmpeq(a,b) __builtin_vec_bcdsub_eq(a,b,0) > > #define __builtin_bcdcmpgt(a,b) __builtin_vec_bcdsub_gt(a,b,0) > > #define __builtin_bcdcmplt(a,b) __builtin_vec_bcdsub_lt(a,b,0) > > #define __builtin_bcdcmpge(a,b) __builtin_vec_bcdsub_ge(a,b,0) > > #define __builtin_bcdcmple(a,b) __builtin_vec_bcdsub_le(a,b,0) > > > > in altivec.h and gcc/testsuite/gcc.target/powerpc/bcd-4.c tests all > > these > > OK, my simple scripts are not going to pickup the stuff in altivec.h. > They were just grepping for the built-in name in the test file > directory. You could use gcov to see which rs6000 builtins are not exercised by anything in the testsuite, maybe. This probably can be automated pretty nicely. Segher
Re: [PATCH v2] swap: Fix incorrect lane extraction by vec_extract() [PR106770]
Hi! Please say "rs6000/p8swap:" in the subject, not "swap:" :-) On Sun, Sep 10, 2023 at 10:58:32PM +0530, Surya Kumari Jangala wrote: > Another issue with always handling swappable instructions is that it is > incorrect to do so in webs where loads/stores on quad word aligned > addresses are changed to lvx/stvx. Why? Please say why in the commit message (the message you send with your patch should be the exact eventual commit message!) > gcc/ > PR rtl-optimization/PR106770 > * config/rs6000/rs6000-p8swap.cc (non_permuting_mem_insn): New > function. Please don't break commit message / changelog lines early unnecessarily. Lines are 80 chars, the leading tab counts as 8. > + /* Set if the swappable insns in the web represented by this entry > + have to be fixed. Swappable insns have to be fixed in : (no space before colon) > +static bool > +non_permuting_mem_insn (swap_web_entry *insn_entry, unsigned int i) > +{ > + return (insn_entry[i].special_handling == SH_NOSWAP_LD || > + insn_entry[i].special_handling == SH_NOSWAP_ST); > +} "return" is not a function, you don't need parens here. > +/* Convert a non-permuting load/store insn to a permuting one. */ > +static void > +handle_non_permuting_mem_insn (swap_web_entry *insn_entry, unsigned int i) A better name would be good, "handle" is a weaselword :-) It is a static, so a shorter name is completely acceptable (say, one that wouldn't be acceptable with bigger than file scope). > + rtx_insn *insn = insn_entry[i].insn; > + if (insn_entry[i].special_handling == SH_NOSWAP_LD) > +permute_load (insn); > + else if (insn_entry[i].special_handling == SH_NOSWAP_ST) > +permute_store (insn); Lose the "else"? The compiler can do micro-optimisations a million times better than any user could. Simpler, more readable (better understandable!) code is much preferred. > + /* Perform special handling for swappable insns that require it. That is a completely contentless sentence :-( > + Note that special handling should be done only for those > + swappable insns that are present in webs marked as requiring > + special handling. */ This one isn't much better. > --- /dev/null > +++ b/gcc/testsuite/gcc.target/powerpc/pr106770.c > @@ -0,0 +1,21 @@ > +/* { dg-do compile } */ This is the default, you do not need this. > +/* { dg-require-effective-target powerpc_p8vector_ok } */ > +/* { dg-options "-mdejagnu-cpu=power8 -O2 " } */ > +/* The 2 xxpermdi instructions are generated by the two > + calls to vec_promote() */ > +/* { dg-final { scan-assembler-times "xxpermdi" 2 } } */ Please enclose in {}. Use double quotes in Tcl only when tou want the interpolation they cause. Default to using {} instead. So please fix those things, and write a better commit message. Ideally the commit messsage will tell everything needed to understand the patch (so also to review the patch). Maybe add examples where needed. So reviewing the code in the patch should be an easy thing to do, after reading the commit message :-) Segher
Re: [PATCH] Cleanup: Replace UNSPEC_COPYSIGN with copysign RTL
On Fri, Sep 29, 2023 at 02:09:12PM -0400, Michael Meissner wrote: > * config/rs6000/rs6000.md (UNSPEC_COPYSIGN): Delete. > (copysign3_fcpsg): Use copysign RTL instead of UNSPEC. (typo, it is _fcpsgn) Nice to see unnecessary unspecs going away :-) Segher
Re: [PATCH V4, rs6000] Disable generation of scalar modulo instructions
Hi! On Fri, Jun 30, 2023 at 02:26:35PM -0500, Pat Haugen wrote: > gcc/ > * config/rs6000/rs6000.cc (rs6000_rtx_costs): Check if disabling > scalar modulo. "Check whether the modulo instruction is disabled?" > * config/rs6000/rs6000.md (mod3, *mod3): Disable. > (define_expand umod3): New. > (define_insn umod3): Rename to *umod3 and disable. > (umodti3, modti3): Disable. None of these patterns are disabled! Instead, the new TARGET_* thing is used. > +/* Disable generation of scalar modulo instructions due to performance issues > + with certain input values. This can be removed in the future when the > + issues have been resolved. */ > +#define RS6000_DISABLE_SCALAR_MODULO 1 I think that is a bit optimistic -- in the future we will still want to support older cores ;-) > - "TARGET_POWER10 && TARGET_POWERPC64" > + "TARGET_POWER10 && TARGET_POWERPC64 && !RS6000_DISABLE_SCALAR_MODULO" >"vmoduq %0,%1,%2" Did we ever test if this insn in fact is slower as well? I don't mean either way, orthogonality is good, but just for my enlightenment. > +++ b/gcc/testsuite/gcc.target/powerpc/clone1.c > +/* { Fail due to RS6000_DISABLE_SCALAR_MODULO. */ Xfail, but heh. No need to change that. > +/* { dg-final { scan-assembler-times {\mdivd\M} 1 { xfail *-*-* } } } */ > +/* { Fail due to RS6000_DISABLE_SCALAR_MODULO. */ > +/* { dg-final { scan-assembler-times {\mmodsw\M} 1 { xfail *-*-* } } } */ Thanks for the \m \M, it is much harder to determine if the tests actually work, without that :-) With improved changelog: okay for trunk. Okay for all backports as well (after some soak time). Thank you! Segher
Re: [MAINTAINERS/KERNEL SUMMIT] Trust and maintenance of file systems
On Thu, Sep 07, 2023 at 02:23:00PM +0300, Dan Carpenter wrote: > On Thu, Sep 07, 2023 at 06:04:09AM -0500, Segher Boessenkool wrote: > > On Thu, Sep 07, 2023 at 12:48:25PM +0300, Dan Carpenter via Gcc-patches > > wrote: > > > I started to hunt > > > down all the Makefile which add a -Werror but there are a lot and > > > eventually I got bored and gave up. > > > > I have a patch stack for that, since 2014 or so. I build Linux with > > unreleased GCC versions all the time, so pretty much any new warning is > > fatal if you unwisely use -Werror. > > > > > Someone should patch GCC so there it checks an environment variable to > > > ignore -Werror. Somethine like this? > > > > No. You should patch your program, instead. > > There are 2930 Makefiles in the kernel source. Yes. And you need patches to about thirty. Or a bit more, if you want to do it more cleanly. This isn't a guess. > > One easy way is to add a > > -Wno-error at the end of your command lines. Or even just -w if you > > want or need a bigger hammer. > > I tried that. Some of the Makefiles check an environemnt variable as > well if you want to turn off -Werror. It's not a complete solution at > all. I have no idea what a complete solution looks like because I gave > up. A solution can not involve changing the compiler. That is just saying the kernel doesn't know how to fix its own problems, so let's give the compiler some more unnecessary problems. > > Or nicer, put it all in Kconfig, like powerpc already has for example. > > There is a CONFIG_WERROR as well, so maybe use that in all places? > > That's a good idea but I'm trying to compile old kernels and not the > current kernel. You can patch older kernels, too, you know :-) If you need to not make any changes to your source code for some crazy reason (political perhaps?), just use a shell script or shell function instead of invoking the compiler driver directly? Segher Segher
Re: [MAINTAINERS/KERNEL SUMMIT] Trust and maintenance of file systems
On Thu, Sep 07, 2023 at 07:22:45AM -0400, Steven Rostedt wrote: > On Thu, 7 Sep 2023 06:04:09 -0500 > Segher Boessenkool wrote: > > On Thu, Sep 07, 2023 at 12:48:25PM +0300, Dan Carpenter via Gcc-patches > > wrote: > > No. You should patch your program, instead. One easy way is to add a > > -Wno-error at the end of your command lines. Or even just -w if you > > want or need a bigger hammer. > > That's not really possible when bisecting a kernel bug into older kernels. > The build system is highly complex and requires hundreds of changes to do > what you suggested. As it is for a bisection that takes a minimum of 13 > iterations, your approach just isn't feasible. Isn't this exactly what KCFLAGS is for? But, I meant to edit the build system. It isn't so hard to bisect with patch stacks on top. Just a bit annoying. Segher
Re: [MAINTAINERS/KERNEL SUMMIT] Trust and maintenance of file systems
On Thu, Sep 07, 2023 at 12:48:25PM +0300, Dan Carpenter via Gcc-patches wrote: > I started to hunt > down all the Makefile which add a -Werror but there are a lot and > eventually I got bored and gave up. I have a patch stack for that, since 2014 or so. I build Linux with unreleased GCC versions all the time, so pretty much any new warning is fatal if you unwisely use -Werror. > Someone should patch GCC so there it checks an environment variable to > ignore -Werror. Somethine like this? No. You should patch your program, instead. One easy way is to add a -Wno-error at the end of your command lines. Or even just -w if you want or need a bigger hammer. Or nicer, put it all in Kconfig, like powerpc already has for example. There is a CONFIG_WERROR as well, so maybe use that in all places? > +static bool > +ignore_w_error(void) > +{ > + char *str; > + > + str = getenv("IGNORE_WERROR"); > + if (str && strcmp(str, "1") == 0) space before ( > case OPT_Werror: > + if (ignore_w_error()) > + break; >dc->warning_as_error_requested = value; >break; > > case OPT_Werror_: > - if (lang_mask == CL_DRIVER) > + if (ignore_w_error()) > + break; > + if (lang_mask == CL_DRIVER) > break; The new indentation is messed up. And please don't move the existing early-out to later, it make more sense earlier, the way it was. Segher
Re: [PATCH] Fix typo in insn name.
Hi! On Mon, Jul 10, 2023 at 03:59:44PM -0400, Michael Meissner wrote: > In doing other work, I noticed that there was an insn: > > vsx_extract_v4sf__load > > Which did not have an iterator. I removed the useless . This patch does that, you mean. > --- a/gcc/config/rs6000/vsx.md > +++ b/gcc/config/rs6000/vsx.md > @@ -3576,7 +3576,7 @@ (define_insn_and_split "vsx_extract_v4sf" >[(set_attr "length" "8") > (set_attr "type" "fp")]) > > -(define_insn_and_split "*vsx_extract_v4sf__load" > +(define_insn_and_split "*vsx_extract_v4sf_load" >[(set (match_operand:SF 0 "register_operand" "=f,v,v,?r") > (vec_select:SF >(match_operand:V4SF 1 "memory_operand" "m,Z,m,m") Does this fix any ICEs? Or do you have some example that makes better machine code after this change? Or would a better change perhaps be to just remove this pattern completely, if it doesn't do anything useful? I.e., please include a new testcase. Segher
Re: [PATCH] rs6000: Don't ICE when generating vector pair load/store insns [PR110411]
On Thu, Jul 06, 2023 at 02:48:19PM -0500, Peter Bergner wrote: > On 7/6/23 12:33 PM, Segher Boessenkool wrote: > > On Wed, Jul 05, 2023 at 05:21:18PM +0530, P Jeevitha wrote: > >> --- a/gcc/config/rs6000/rs6000.cc > >> +++ b/gcc/config/rs6000/rs6000.cc > >> @@ -9894,6 +9894,8 @@ rs6000_legitimate_address_p (machine_mode mode, rtx > >> x, bool reg_ok_strict) > >> > >>/* Handle unaligned altivec lvx/stvx type addresses. */ > >>if (VECTOR_MEM_ALTIVEC_OR_VSX_P (mode) > >> + && mode != OOmode > >> + && mode != XOmode > >>&& GET_CODE (x) == AND > >>&& CONST_INT_P (XEXP (x, 1)) > >>&& INTVAL (XEXP (x, 1)) == -16) > > > > Why do we need this for OOmode and XOmode here, but not for the other > > modes that are equally not allowed? That makes no sense. > > VECTOR_MEM_ALTIVEC_OR_VSX_P (mode) already filters those modes out > (eg, SImode, DFmode, etc.), just not OOmode and XOmode, since those both > are modes used in/with VSX registers. It does not filter anything out, no. That simply checks if a datum of that mode can be loaded into vector registers or not. For example SImode could very well be loaded into vector registers! (It just is not such a great idea). And for some reason there is VECTOR_P8_VECTOR as well, which is mixing multiple concepts already. Let's not add more, _please_. > > Should you check for anything that is more than a register, for example? > > If so, do *that*? > > Well rs6000_legitimate_address_p() is only passed the MEM rtx, so we have > no idea if this is a load or store, so we're clueless on number of regs > needed to hold this mode. The best we could do is something like That is *bigger than* a register. It's the same in Dutch, sorry, I am tired :-( > GET_MODE_SIZE (mode) == GET_MODE_SIZE (V16QImode) > > or some such thing. Would you prefer something like that? That is even worse :-( > >> --- /dev/null > >> +++ b/gcc/testsuite/gcc.target/powerpc/pr110411.c > >> @@ -0,0 +1,21 @@ > >> +/* PR target/110411 */ > >> +/* { dg-options "-O2 -mdejagnu-cpu=power10 -S -mblock-ops-vector-pair" } > >> */ > > > > -S in testcases is wrong. Why do you want this? It is *good* if this > > is hauled through the assembler as well! If you *really* want this you > > use "dg-do assemble", but you shouldn't. > > For test cases checking for ICEs, we don't need to assemble, so I agree, > we just need to remove the -S option, which is implied by this being a > dg-do compile test case (the default for this test directory). We *do* want to assemble. It is a general principle that we want to test as much as possible whenever possible. *Most* problems are found with the help of testcases that were never designed for the problem found! dg-do compile *does* invoke the assembler, btw. As it should. Segher
Re: [PATCH] rs6000: Don't ICE when generating vector pair load/store insns [PR110411]
Hi! On Wed, Jul 05, 2023 at 05:21:18PM +0530, P Jeevitha wrote: > The following patch has been bootstrapped and regtested on powerpc64le-linux. > > while generating vector pairs of load & store instruction, the src address > was treated as an altivec type and that type of address is invalid for > lxvp and stxvp insns. The solution for this is to avoid altivec type address > for OOmode and XOmode. The mail message you send should be what will end up in the Git commit message. Your lines are too long for that (and the subject is much too long btw), and the content isn't right either. Maybe something like """ rs6000: Don't allow OOmode or XOmode in AltiVec addresses (PR110411) There are no instructions that do traditional AltiVec addresses (i.e. with the low four bits of the address masked off) for OOmode and XOmode objects. Don't allow those in rs6000_legitimate_address_p. """ > gcc/ > PR target/110411 > * config/rs6000/rs6000.cc (rs6000_legitimate_address_p): Avoid altivec > address for OOmode and XOmde. (XOmode, sp.) Not "avoid", disallow. If you avoid something you still allow it, you just prefer to see something else. > --- a/gcc/config/rs6000/rs6000.cc > +++ b/gcc/config/rs6000/rs6000.cc > @@ -9894,6 +9894,8 @@ rs6000_legitimate_address_p (machine_mode mode, rtx x, > bool reg_ok_strict) > >/* Handle unaligned altivec lvx/stvx type addresses. */ >if (VECTOR_MEM_ALTIVEC_OR_VSX_P (mode) > + && mode != OOmode > + && mode != XOmode >&& GET_CODE (x) == AND >&& CONST_INT_P (XEXP (x, 1)) >&& INTVAL (XEXP (x, 1)) == -16) Why do we need this for OOmode and XOmode here, but not for the other modes that are equally not allowed? That makes no sense. Should you check for anything that is more than a register, for example? If so, do *that*? > --- /dev/null > +++ b/gcc/testsuite/gcc.target/powerpc/pr110411.c > @@ -0,0 +1,21 @@ > +/* PR target/110411 */ > +/* { dg-options "-O2 -mdejagnu-cpu=power10 -S -mblock-ops-vector-pair" } */ -S in testcases is wrong. Why do you want this? It is *good* if this is hauled through the assembler as well! If you *really* want this you use "dg-do assemble", but you shouldn't. Segher
Re: [PATCH, V6] Fix power10 fusion and -fstack-protector, PR target/105325
Hi! The patch looks great now, thanks you! But the commit message needs some work: First off, the subject, which is a short (50 character max!) summary of what the patch is about. Fix power10 fusion and -fstack-protector, PR target/105325 There is absolutely nothing to do with stack protector, it does not belong in the commit message at all, and certainly not in the subject! On Tue, Jun 13, 2023 at 10:14:02PM -0400, Michael Meissner wrote: > This patch fixes an issue where if you use the -fstack-protector and > -mcpu=power10 options and you have a large stack frame, the GCC compiler will > generate a LWA instruction with a large offset. That is not the core issue, it is just one example where things went wrong. That prompted this patch, sure, so you can talk about that ten or so lines down if you think it is important (I don't fwiw), but not at the start here. You should just say what was wrong, so people with a short attention span can just skip this patch when looking through git log (and even earlier if the subject would be good). Commit messages are for *future* users. Not for getting your patch approved. > Here is the initial fused initial insn that was created. It refers to the > stack location based off of the virtrual frame pointer: The soft frame pointer, not a virtual one. For PowerPC this is not a real register and LRA will eventually replace it, sure. "Virtual" here in GCC has a very specific meaning; virtual things are replaced very soon after expand. > When the split2 pass is run after reload has finished the ds_form_mem_operand > predicate that was used for lwa and ld no longer returns true. Yes. It is the wrong predicate to use here. *That* is the problem. > 2)Delete ds_form_mem_operand since it is no longer used. ... and we don't expect to use it any time soon. > 3)Use the "YZ" constraints for ld/lwa instead of "m". Yes, constraints and predicates. > * config/rs6000/genfusion.pl (gen_ld_cmpi_p10_one): Fix problems that > allowed prefixed lwa to be generated. You should not say what the *old* code did, in the changelog! > --- a/gcc/config/rs6000/genfusion.pl > +++ b/gcc/config/rs6000/genfusion.pl > @@ -61,20 +61,30 @@ sub gen_ld_cmpi_p10_one >my $mempred = "non_update_memory_operand"; >my $extend; > > + # We need to special case lwa. The prefixed_load_p function in rs6000.cc > + # (which determines if a load instruction is prefixed) uses the fact that > the > + # register mode is different from the memory mode, and that the sign_extend > + # attribute is set to use DS-form rules for the address instead of D-form. > + # If the register size is the same, prefixed_load_p assumes we are doing a > + # lwz. We change to use an lwz and word compare if we don't need to sign > + # extend the SImode value. Otherwise if we need the value, we need to > + # make sure the insn is marked as ds-form. > + my $lwa_insn = ($lmode eq "SI" && $ccmode eq "CC"); That is a pretty bad name, the variable does not hold an "insn" in any way, shape, or form. It is hardish to give it a good name because it mixes two questions into one variable? You can just repeat the tiny conditions wherever you use them, and the code would be more readable (and less cryptic!) > + if ($lwa_insn && $cmp_size eq "d") { Name it "cmp_size_char" maybe? "cmp_size" suggests a number. > --- /dev/null > +++ b/gcc/testsuite/g++.target/powerpc/pr105325.C > @@ -0,0 +1,26 @@ > +/* { dg-do assemble } */ > +/* { dg-require-effective-target lp64 } */ > +/* { dg-require-effective-target power10_ok } */ > +/* { dg-require-effective-target powerpc_prefixed_addr } */ > +/* { dg-options "-O2 -mdejagnu-cpu=power10 -fstack-protector" } */ > + > +/* Test that power10 fusion does not generate an LWA/CMPDI instruction pair > + instead of PLWZ/CMPWI. Ultimately the code was dying because the fusion > + load + compare -1/0/1 patterns did not handle the possibility that the > load > + might be prefixed. The -fstack-protector option is needed to show the > + bug. */ Mention the PR number somewhere in the text as well? For grep etc. Okay for trunk, with some more reasonable commmit message. Thank you! Also okay for all backports. Segher
Re: [PATCH V3 1/4] rs6000: build constant via li;rotldi
Hi! On Fri, Jun 16, 2023 at 04:34:12PM +0800, Jiufu Guo wrote: > +/* Check if value C can be built by 2 instructions: one is 'li', another is > + rotldi. > + > + If so, *SHIFT is set to the shift operand of rotldi(rldicl), and *MASK > + is set to -1, and return true. Return false otherwise. */ Don't say "is set to -1", the point of having this is so you say "is set to the "li" value". Just like you describe what SHIFT is for. > +static bool > +can_be_built_by_li_and_rotldi (HOST_WIDE_INT c, int *shift, > +HOST_WIDE_INT *mask) > +{ > + int n; Put shis later, like: > + /* Check if C can be rotated to a positive or negative value > + which 'li' instruction is able to load. */ int n; > + if (can_be_rotated_to_lowbits (c, 15, ) > + || can_be_rotated_to_lowbits (~c, 15, )) > +{ > + *mask = HOST_WIDE_INT_M1; > + *shift = HOST_BITS_PER_WIDE_INT - n; > + return true; > +} It is tricky to see ~c will always work, since what is really done is -c instead. Can you just use that here? > @@ -10266,15 +10291,14 @@ static void > rs6000_emit_set_long_const (rtx dest, HOST_WIDE_INT c) > { >rtx temp; > + int shift; > + HOST_WIDE_INT mask; >HOST_WIDE_INT ud1, ud2, ud3, ud4; > >ud1 = c & 0x; > - c = c >> 16; > - ud2 = c & 0x; > - c = c >> 16; > - ud3 = c & 0x; > - c = c >> 16; > - ud4 = c & 0x; > + ud2 = (c >> 16) & 0x; > + ud3 = (c >> 32) & 0x; > + ud4 = (c >> 48) & 0x; > >if ((ud4 == 0x && ud3 == 0x && ud2 == 0x && (ud1 & 0x8000)) >|| (ud4 == 0 && ud3 == 0 && ud2 == 0 && ! (ud1 & 0x8000))) > @@ -10305,6 +10329,17 @@ rs6000_emit_set_long_const (rtx dest, HOST_WIDE_INT > c) >emit_move_insn (dest, gen_rtx_XOR (DImode, temp, >GEN_INT ((ud2 ^ 0x) << 16))); > } > + else if (can_be_built_by_li_and_rotldi (c, , )) > +{ > + temp = !can_create_pseudo_p () ? dest : gen_reg_rtx (DImode); > + unsigned HOST_WIDE_INT imm = (c | ~mask); > + imm = (imm >> shift) | (imm << (HOST_BITS_PER_WIDE_INT - shift)); > + > + emit_move_insn (temp, GEN_INT (imm)); > + if (shift != 0) > + temp = gen_rtx_ROTATE (DImode, temp, GEN_INT (shift)); > + emit_move_insn (dest, temp); > +} If you would rewrite so it isn't such a run-on thing with "else if", instead using early outs, or even some factoring, you could declare the variable used only in a tiny scope in that tiny scope instead. > --- /dev/null > +++ b/gcc/testsuite/gcc.target/powerpc/const-build.c > @@ -0,0 +1,54 @@ > +/* { dg-do run } */ > +/* { dg-options "-O2 -save-temps" } */ > +/* { dg-require-effective-target has_arch_ppc64 } */ Please put a tiny comment here saying what this test is *for*? The file name is a bit of hint already, but you can indicate much more in one or two lines :-) With those adjustments, okay for trunk. Thanks! (If -c doesn't work, it needs more explanation). Segher
Re: [PATCH] rs6000: replace '(const_int 0)' to 'unspec:BLK [(const_int 0)]' for stack_tie
On Thu, Jun 15, 2023 at 03:00:40PM +0800, Jiufu Guo wrote: > >> This is the existing pattern. It may be read as an action > >> to clean an unknown-size memory block. > > > > Including a size zero memory block, yes. BLKmode was originally to do > > things like bcopy (before modern names like memcpy were more usually > > used), and those very much need size zero as well.h > > The size is possible to be zero. No asm code needs to > be generated for "set 'const_int 0' to zero size memory"". > stack_tie does not generate any real code. It seems ok :) > > While, it may not be zero size mem. This may be a concern. > This is one reason that I would like to have an unspec_tie. It very much *can* be a zero size mem, that is perfectly find for mem:BLK. > Another reason is unspec:blk is used but various ports :) unspec:BLK is undefined. BLKmode is allowed on mem only. > >> 2. "set (mem/c:BLK (reg/f:DI 1 1) unspec:blk (const_int 0 [0]) > >> UNSPEC_TIE". > >> Current patch is using this one. > > > > What would be the semantics of that? Just the same as the current stuff > > I'd say, or less? It cannot be more! > > The semantic that I trying to achieve is "this is a special > insn, not only a normal set to unknown size mem". What does that *mean*? "Special instruction"? What would what code do for that? What would the RTL mean? > As you explained before on 'unspec:DI', the unspec would > just decorate the set_src part: something DI value with > machine-specific operation. An unspec is an operation on its operands, giving some (in this case) DImode value. There is nothing special about that operation, it can be optimised like any other, it's just not specified what exactly that value is (to the generic compiler, the backend itself can very much optimise stuff with it). > But, since 'tie_operand' is checked for this insn. > If 'tie_operand' checks UNPSEC_TIE, then the insn > with UNPSEC_TIE is 'a special insn'. Or interpret > the semantic of this insn as: this insn stack_ite > indicates "set/operate a zero size block". tie_operand is a predicate. The predicate of an insn has to return 1, or the insn is not recognised. You can do the same in insn conditions always (in principle anyway). Segher
Re: [PATCH] rs6000: replace '(const_int 0)' to 'unspec:BLK [(const_int 0)]' for stack_tie
On Wed, Jun 14, 2023 at 06:25:10PM +0200, Richard Biener wrote: > > Form rs6000.md: > > ; This is to explain that changes to the stack pointer should > > ; not be moved over loads from or stores to stack memory. > > (define_insn "stack_tie" > > That suggests it’s the hard register value that‘s protected, not the memory > pointed to. I suppose that means an unspec volatile with the reg as input > would serve the same? No? It says what it says. That is pretty vague language, of course, not entirely by accident no doubt. > Or maybe that’s not the whole story. > > > > and from rs6000-logue.cc: > > /* This ties together stack memory (MEM with an alias set of > > frame_alias_set) > > and the change to the stack pointer. */ > > static void > > rs6000_emit_stack_tie (rtx fp, bool hard_frame_needed) > > I cannot make sense of that comment, but not sure if I really want to know … It really is the same thing: this is a bloody heavy hammer keeping the change to the stack pointer (or "hard" frame pointer) in place wrt any accesses to the stack memory. If there was a nice portable way to avoid needing this we haven't found it yet -- or a non-portable way even, and it doesn't have to be all that nice either come to think of it :-) Segher
Re: [PATCH] rs6000: replace '(const_int 0)' to 'unspec:BLK [(const_int 0)]' for stack_tie
Hi! On Wed, Jun 14, 2023 at 10:04:20AM +0100, Richard Sandiford wrote: > I'd also understood it to be either. As in, it is a may-clobber > that can be used for must-clobber. Alternatively: the value stored > is unpredictable, and can therefore be the same as the current value. Yes, it is a set with an unspecified RHS. > I think the main difference between: > > (clobber (mem:BLK …)) > > and > > (set (mem:BLK …) (unspec:BLK …)) > > is that the latter must happen for correctness (unless something > that understands the unspec proves otherwise) whereas a clobber > can validly be dropped. So for something like stack_tie, a set > seems more correct than a clobber. No, the latter can be removed as well, under exactly the same conditions: if no code after it reads what was written. This happens in branches marked dead. Segher
Re: [PATCH] rs6000: replace '(const_int 0)' to 'unspec:BLK [(const_int 0)]' for stack_tie
On Wed, Jun 14, 2023 at 09:22:09AM +, Richard Biener wrote: > How can a clobber be validly dropped? Same as any other set: if no code executed after it can read whatever is written. This typically means a stack frame goes away, or simply no more code is executed *at all* after this. > For the case of stack > memory if there's no stack use after it it could be elided > and I suppose the clobber itself can be moved. But then > the function return is a stack use as well. A function return does not access the stack at all on most architectures, including PowerPC. Some epilogue insns can do, of course, but we expand to separate insns during expand already. Segher
Re: [PATCH] rs6000: replace '(const_int 0)' to 'unspec:BLK [(const_int 0)]' for stack_tie
Hi! On Wed, Jun 14, 2023 at 09:52:37AM +, Richard Biener wrote: > I see. So > > (parallel > (unspec stack_tie) > (clobber (mem:BLK ...))) Written like this, without a "set", *every* unspec has to be an unspec_volatile, for the same reason as all inline asms without outputs always are considered volatile asm. The "unspec" arm of the parallel can be omitted, and if that is valid RTL (possibly after other changes, like omitting the parallel,replacing it by its one remaining arm), this is a prefectly valid transformation. > I suppose it needs to be an unspec_volatile? It feels like > the stack_ties are a delicate hack preventing enough but not too > much optimization ... Yes. So let's please not disturb it :-) It isn't a "delicate" hack I would say, but its effects are needed in some places, and messing it up leads to hard to debug problems. Which had happened time and time again over the years. It just is hard to deal with variable sized stack adjustments and the like. As long as we only use stack ties in such unusual cases, all is fine. There are worse things, like what we have the frame_pointer_needed_indeed thing for :-) Segher
Re: [PATCH] rs6000: replace '(const_int 0)' to 'unspec:BLK [(const_int 0)]' for stack_tie
Hi! On Wed, Jun 14, 2023 at 05:26:52PM +0800, Jiufu Guo wrote: > Richard Biener writes: > >> 3. "set (mem/c:DI (reg/f:DI 1 1) unspec:DI (const_int 0 [0]) > >> UNSPEC_TIE". > >>This avoids using BLK on unspec, but using DI. > > > > That gives the MEM a size which means we can interpret the (set ..) > > as killing a specific area of memory, enabling DSE of earlier > > stores. > > Oh, thanks! > While with 'unspec:DI', I'm wondering if it means this 'set' would > do some special things other than pure 'set' to the memory. No, that is not what unspec means. It just means "some DImode value I'm not telling you anything about". If to get that value there is some important work done (that should not be oprimised away, say) you need unspec_volatile, which means just that: there is an unspecified side effect done by that insn, so it has to be done on the real machine exactly like on the abstract C machine, so the insn has big restrictions on being moved and removed etc. We can replace the RHS of (almost) *every* set with an unspec, and the compiler would still work, just would generate pretty lousy code. But at least CSE and DSE (and everything else purely dataflow) would still work :-) Segher
Re: [PATCH] rs6000: replace '(const_int 0)' to 'unspec:BLK [(const_int 0)]' for stack_tie
Hi! On Wed, Jun 14, 2023 at 07:59:04AM +, Richard Biener wrote: > On Wed, 14 Jun 2023, Jiufu Guo wrote: > > 3. "set (mem/c:DI (reg/f:DI 1 1) unspec:DI (const_int 0 [0]) > > UNSPEC_TIE". > >This avoids using BLK on unspec, but using DI. > > That gives the MEM a size which means we can interpret the (set ..) > as killing a specific area of memory, enabling DSE of earlier > stores. Or DSE can delete this tie even, if it can see some later store to the same location without anything in between that can read what the tie stores. BLKmode avoids all of this. You can call that elegant, you can call it cheating, you can call it many things -- but it *works*. > AFAIU this special instruction is only supposed to prevent > code motion (of stack memory accesses?) across this instruction? Form rs6000.md: ; This is to explain that changes to the stack pointer should ; not be moved over loads from or stores to stack memory. (define_insn "stack_tie" and from rs6000-logue.cc: /* This ties together stack memory (MEM with an alias set of frame_alias_set) and the change to the stack pointer. */ static void rs6000_emit_stack_tie (rtx fp, bool hard_frame_needed) A big reason this is needed is because of all the hard frame pointer stuff, which the generic parts of GCC require, but there is no register for that in the Power architecture. Nothing is an issue here in most cases, but sometimes we need to do unusual things to the stack, say for alloca. > I'd say a > > (may_clobber (mem:BLK (reg:DI 1 1))) "clobber" always means "may clobber". (clobber X) means X is written with some unspecified value, which may well be whatever value it currently holds. Via some magical means or whatever, there is no mechanism specified, just the effects :-) > might be more to the point? I've used "may_clobber" which doesn't > exist since I'm not sure whether a clobber is considered a kill. > The docs say "Represents the storing or possible storing of an > unpredictable..." - what is it? Storing or possible storing? It is the same thing. "clobber" means the same thing as "set", except the value that is written is not specified. > I suppose stack_tie should be less strict than the documented > (clobber (mem:BLK (const_int 0))) (clobber all memory). "clobber" is nicer than the set to (const_int 0). Does it work though? All this code is always fragile :-/ I'm all for this change, don't get me wrong, but preferably things stay in working order. We use "stack_tie" as a last resort heavy hammer anyway, in all normal cases we explain the actual data flow explicitly and correctly, also between the various registers used in the *logues. Segher
Re: [PATCH] rs6000: replace '(const_int 0)' to 'unspec:BLK [(const_int 0)]' for stack_tie
Hi! On Wed, Jun 14, 2023 at 12:06:29PM +0800, Jiufu Guo wrote: > Segher Boessenkool writes: > I'm also thinking about other solutions: > 1. "set (mem/c:BLK (reg/f:DI 1 1) (const_int 0 [0])" > This is the existing pattern. It may be read as an action > to clean an unknown-size memory block. Including a size zero memory block, yes. BLKmode was originally to do things like bcopy (before modern names like memcpy were more usually used), and those very much need size zero as well. > 2. "set (mem/c:BLK (reg/f:DI 1 1) unspec:blk (const_int 0 [0]) > UNSPEC_TIE". > Current patch is using this one. What would be the semantics of that? Just the same as the current stuff I'd say, or less? It cannot be more! > 3. "set (mem/c:DI (reg/f:DI 1 1) unspec:DI (const_int 0 [0]) > UNSPEC_TIE". >This avoids using BLK on unspec, but using DI. And is incorrect because of that. > 4. "set (mem/c:BLK (reg/f:DI 1 1) unspec (const_int 0 [0]) > UNSPEC_TIE" >There is still a mode for the unspec. It has VOIDmode here, which is incorrect. > > On Tue, Jun 13, 2023 at 08:23:35PM +0800, Jiufu Guo wrote: > >> +&& XINT (SET_SRC (set), 1) == UNSPEC_TIE > >> +&& XVECEXP (SET_SRC (set), 0, 0) == const0_rtx); > > > > This makes it required that the operand of an UNSPEC_TIE unspec is a > > const_int 0. This should be documented somewhere. Ideally you would > > want no operand at all here, but every unspec has an operand. > > Right! Since checked UNSPEC_TIE arleady, we may not need to check > the inner operand. Like " && XINT (SET_SRC (set), 1) == UNSPEC_TIE);". Yes. But we should write down somewhere (in a comment near the unspec constant def for example) what the operand is -- so, "operand is usually (const_int 0) because we have to put *something* there" or such. The clumsiness of this is enough for me to prefer some other solution already ;-) Segher
Re: [PATCH] rs6000: replace '(const_int 0)' to 'unspec:BLK [(const_int 0)]' for stack_tie
Hi! On Wed, Jun 14, 2023 at 05:18:15PM +0800, Xi Ruoyao wrote: > The generic issue here is to fix (not "papering over") the signed > overflow, we need to perform the addition in a target machine mode. We > may always use Pmode (IIRC const_anchor was introduced for optimizing > some constant addresses), but can we do better? The main issue is that the machine description generated target code to compute some constants, but the sanitizer treats it as if it was user code that might do wrong things. > Should we try addition in both DImode and SImode for a 64-bit capable > machine? Why? At least on PowerPC there is only one insn, and it is 64 bits. The SImode version just ignores all bits other than the low 32 bits, in both inputs and output. > Or should we even try more operations than addition (for eg bit > operations like xor or shift)? Doing so will need to create a new > target hook for const anchoring, this is the "complete rework" I meant. This might make const anchor useful for way more targets maybe, including rs6000, yes :-) Segher
Re: [PATCH] rs6000: replace '(const_int 0)' to 'unspec:BLK [(const_int 0)]' for stack_tie
Hi! As I said in a reply to the original patch: not okay. Sorry. But some comments on this patch: On Tue, Jun 13, 2023 at 08:23:35PM +0800, Jiufu Guo wrote: > + && XINT (SET_SRC (set), 1) == UNSPEC_TIE > + && XVECEXP (SET_SRC (set), 0, 0) == const0_rtx); This makes it required that the operand of an UNSPEC_TIE unspec is a const_int 0. This should be documented somewhere. Ideally you would want no operand at all here, but every unspec has an operand. > + RTVEC_ELT (p, i) > + = gen_rtx_SET (mem, gen_rtx_UNSPEC (BLKmode, gen_rtvec (1, const0_rtx), > + UNSPEC_TIE)); If it is hard to indent your code, your code is trying to do to much. Just have an extra temporary? rtx un = gen_rtx_UNSPEC (BLKmode, gen_rtvec (1, const0_rtx), UNSPEC_TIE); RTVEC_ELT (p, i) = gen_rtx_SET (mem, un); That is shorter even, and certainly more readable :-) > @@ -10828,7 +10829,9 @@ (define_expand "restore_stack_block" >operands[4] = gen_frame_mem (Pmode, operands[1]); >p = rtvec_alloc (1); >RTVEC_ELT (p, 0) = gen_rtx_SET (gen_frame_mem (BLKmode, operands[0]), > - const0_rtx); > + gen_rtx_UNSPEC (BLKmode, > + gen_rtvec (1, const0_rtx), > + UNSPEC_TIE)); >operands[5] = gen_rtx_PARALLEL (VOIDmode, p); I have a hard time to see how this could ever be seen as clearer or more obvious or anything like that :-( Segher
Re: [PATCH] rs6000: replace '(const_int 0)' to 'unspec:BLK [(const_int 0)]' for stack_tie
Hi! On Tue, Jun 13, 2023 at 10:15:49AM +0800, Jiufu Guo wrote: > David Edelsohn writes: > > > > This definitely seems to be a better solution. > > > > The TARGET_CONST_ANCHOR change should not be part of this patch. Also > > there is no ChangeLog for the patch. > > Thanks a lot for your quick review!! And sorry for the sending this patch > in a hurry. I would update the patch accordingly. > > This generally looks correct and consistent with other ports. I want > > to give Segher a chance to double check it, if he wishes. The documentation is very clear that the only thing for which you can have BLKmode is "mem". Not unspec, only "mem". Let's not do this. The existing code has clear and obvious semantics, which is documented as well -- there is no reason to make it worse in every respect! Segher
Re: [PATCH v2] rs6000: fmr gets used instead of faster xxlor [PR93571]
Hi! On Sat, Feb 25, 2023 at 03:20:33PM +0530, Ajit Agarwal wrote: > Here is the patch that uses xxlor instead of fmr where possible. > Performance results shows that fmr is better in power9 and > power10 architectures whereas xxlor is better in power7 and > power 8 architectures. fmr is the only option before p7. > ;; The ISA we implement. > -(define_attr "isa" "any,p5,p6,p7,p7v,p8v,p9,p9v,p9kf,p9tf,p10" > +(define_attr "isa" "any,p5,p6,p7,p7v,p8v,p7p8v,p9,p9v,p9kf,p9tf,p10" >(const_string "any")) This isn't really about what insn we *can* use here. > + (and (eq_attr "isa" "p7p8v") > + (match_test "TARGET_VSX && !TARGET_P9_VECTOR")) > + (const_int 1) What is needed here is test the *tune* setting. For example if someone uses -mcpu=power8 -mtune=power9 (this is a setting that is really used, or was a few years ago anyway) you *do* want fmr insns generated. So don't do this via the isa attribute at all, just add some insn condition (testing the tune setting)? Segher
Re: [PATCH V2] Optimize '(X - N * M) / N' to 'X / N - M' if valid
Hi! On Wed, Jun 07, 2023 at 04:21:11PM +0800, Jiufu Guo wrote: > This patch tries to optimize "(X - N * M) / N" to "X / N - M". > For C code, "/" towards zero (trunc_div), and "X - N * M" maybe > wrap/overflow/underflow. So, it is valid that "X - N * M" does > not cross zero and does not wrap/overflow/underflow. Is it ever valid semi-generally when N does not divide X? Say X=5, N=2, M=3. Then the original expression evaluates to 0, but the new one to -1. Whenever one of the divisions rounds up and the other rounds down you have this problem. Segher
[PATCH 2/2] rs6000: genfusion: Delete dead code
2023-06-06 Segher Boessenkool * config/rs6000/genfusion.pl: Delete some dead code. --- gcc/config/rs6000/genfusion.pl | 3 --- 1 file changed, 3 deletions(-) diff --git a/gcc/config/rs6000/genfusion.pl b/gcc/config/rs6000/genfusion.pl index 2851bb7..82e8f86 100755 --- a/gcc/config/rs6000/genfusion.pl +++ b/gcc/config/rs6000/genfusion.pl @@ -347,6 +347,3 @@ EOF gen_ld_cmpi_p10(); gen_logical_addsubf(); gen_addadd; - -exit(0); - -- 1.8.3.1
[PATCH 1/2] rs6000: genfusion: Rewrite load/compare code
This makes the code more readable, more digestible, more maintainable, more extensible. That kind of thing. It does that by pulling things apart a bit, but also making what stays together more cohesive lumps. The original function was a bunch of loops and early-outs, and then quite a bit of stuff done per iteration, with the iterations essentially independent of each other. This patch moves the stuff done for one iteration to a new _one function. The second big thing is the stuff printed to the .md file is done in "here documents" now, which is a lot more readable than having to quote and escape and double-escape pieces of text. Whitespace inside the here-document is significant (will be printed as-is), which is a bit awkward sometimes, or might take some getting used to, but it is also one of the benefits of using them. Local variables are declared at first use (or close to first use). There also shouldn't be many at all, often you can write easier to read and manage code by omitting to name something that is hard to name in the first place. Finally some things are done in more typical, more modern, and tighter Perl style, for example REs in "if"s or "qw" for lists of constants. 2023-06-06 Segher Boessenkool * config/rs6000/genfusion.pl (gen_ld_cmpi_p10_one): New, rewritten and split out from... (gen_ld_cmpi_p10): ... this. --- gcc/config/rs6000/genfusion.pl | 185 +++-- 1 file changed, 103 insertions(+), 82 deletions(-) diff --git a/gcc/config/rs6000/genfusion.pl b/gcc/config/rs6000/genfusion.pl index e4db352..2851bb7 100755 --- a/gcc/config/rs6000/genfusion.pl +++ b/gcc/config/rs6000/genfusion.pl @@ -53,92 +53,113 @@ sub mode_to_ldst_char return '?'; } +sub gen_ld_cmpi_p10_one +{ + my ($lmode, $result, $ccmode) = @_; + + my $np = "NON_PREFIXED_D"; + my $mempred = "non_update_memory_operand"; + my $extend; + + if ($ccmode eq "CC") { +# ld and lwa are both DS-FORM. +($lmode =~ /^[SD]I$/) and $np = "NON_PREFIXED_DS"; +($lmode =~ /^[SD]I$/) and $mempred = "ds_form_mem_operand"; + } else { +if ($lmode eq "DI") { + # ld is DS-form, but lwz is not. + $np = "NON_PREFIXED_DS"; + $mempred = "ds_form_mem_operand"; +} + } + + my $cmpl = ($ccmode eq "CC") ? "" : "l"; + my $echr = ($ccmode eq "CC") ? "a" : "z"; + if ($lmode eq "DI") { $echr = ""; } + my $constpred = ($ccmode eq "CC") ? "const_m1_to_1_operand" + : "const_0_to_1_operand"; + + # For clobber, we need a SI/DI reg in case we + # split because we have to sign/zero extend. + my $clobbermode = ($lmode =~ /^[QH]I$/) ? "GPR" : $lmode; + if ($result =~ /^EXT/ || $result eq "GPR" || $clobbermode eq "GPR") { +# We always need extension if result > lmode. +$extend = ($ccmode eq "CC") ? "sign" : "zero"; + } else { +# Result of SI/DI does not need sign extension. +$extend = "none"; + } + + my $ldst = mode_to_ldst_char($lmode); + print < lmode. - if ( $ccmode eq 'CC' ) { - $extend = "sign"; - } else { - $extend = "zero"; - } - } else { - # Result of SI/DI does not need sign extension. - $extend = "none"; - } - print ";; load-cmpi fusion pattern generated by gen_ld_cmpi_p10\n"; - print ";; load mode is $lmode result mode is $result compare mode is $ccmode extend is $extend\n"; + foreach my $lmode (qw/DI SI HI QI/) { +foreach my $result ("clobber", $lmode, "EXT$lmode") { + # EXTDI does not exist, and we cannot directly produce HI/QI results. + next if $result =~ /^(QI|HI|EXTDI)$/; - print "(define_insn_and_split \"*l${ldst}${echr}_cmp${cmpl}di_cr0_${lmode}_${result}_${ccmode}_${extend}\"\n"; - print " [(set (match_operand:${ccmode} 2 \"cc_reg_operand\" \"=x\")\n"; - print "(compare:${ccmode} (match_operand:${lmode} 1 \"${mempred}\" \"m\")\n"; - if ($ccmode eq 'CCUNS') { print " "; } - print "(match_operand:${lmode} 3 \"${constpred}\" \"n\")))\n"; - if ($result eq 'clobber') { - print " (clobber (match_scratch:${clobbermode} 0 \"=r\"))]\n"; - } elsif ($result eq $lmode) { - print " (set (match_operand:${result} 0 \"gpc_reg_operand\" \"=r\") (match_dup 1))]\n"; - } else { - p
Re: [PATCH] rs6000: Remove duplicate expression [PR106907]
Hi! On Mon, Jun 05, 2023 at 12:11:42PM +0530, P Jeevitha wrote: > PR106907 has few warnings spotted from cppcheck. In that addressing duplicate > expression issue here. Here the same expression is used twice in logical > AND(&&) operation which result in same result so removing that. > > 2023-06-05 Jeevitha Palanisamy > > gcc/ > PR target/106907 > * config/rs6000/rs6000.cc (vec_const_128bit_to_bytes): Remove > duplicate expression. > > > diff --git a/gcc/config/rs6000/rs6000.cc b/gcc/config/rs6000/rs6000.cc > index 42f49e4a56b..d197c3f3289 100644 > --- a/gcc/config/rs6000/rs6000.cc > +++ b/gcc/config/rs6000/rs6000.cc > @@ -28784,7 +28784,6 @@ vec_const_128bit_to_bytes (rtx op, > >info->all_words_same > = (info->words[0] == info->words[1] > - && info->words[0] == info->words[1] > && info->words[0] == info->words[2] > && info->words[0] == info->words[3]); Thanks! Okay for trunk. Also okay for all backports, no need to wait if unexpected problems in trunk show up. But still, backport to 13 first, then 12, then 11, only stop when it stops applying (or there are no open release branches left) :-) Segher
Re: [PATCH V5, 2/2] PR target/105325: Fix memory constraints for power10 fusion.
On Wed, May 10, 2023 at 11:40:00AM -0400, Michael Meissner wrote: > This patch applies stricter predicates and constraints for LD and LWA > instructions with power10 fusion. These instructions are DS-form > instructions, > which means that the bottom 2 bits of the address must be 0. The low two bits of the offset, yes. > --- a/gcc/config/rs6000/genfusion.pl > +++ b/gcc/config/rs6000/genfusion.pl > @@ -129,6 +129,12 @@ sub print_ld_cmpi_p10 >print " \"\"\n"; >print " [(set_attr \"type\" \"fused_load_cmpi\")\n"; >print " (set_attr \"cost\" \"8\")\n"; > + > + if ($extend eq "sign") > +{ > + print " (set_attr \"sign_extend\" \"yes\")\n"; > +} You never ever need backslashes like this in Perl code, btw. For example: print qq{ (set_attr "sign_extend" "yes")\n}; or print qq/ (set_attr "sign_extend" "yes")\n/; or print <<"HERE" (set_attr "sign_extend" "yes") HERE or millions of other ways, all of which are much nicer than cramped code that tries to look like C (but has very different semantics in all ways that matter). (Also zillions of ways that are worse still, but that is the price of freedom maybe :-) ) > - # Memory predicate to use. > + # Memory predicate to use. For LWA, use the special LWA_OPERAND. Explain *why*? It is obvious *what*! Maybe just split the series into more patches? > @@ -0,0 +1,26 @@ > +/* { dg-do assemble } */ > +/* { dg-require-effective-target lp64 } */ > +/* { dg-require-effective-target power10_ok } */ power10_ok should no longer exist, btw. Technical debt has to be repaid :-/ This patch is readable btw. Thanks :-) Segher
Re: [PATCH V5, 1/2] PR target/105325: Rewrite genfusion.pl's gen_ld_cmpi_p10 function.
Hi Mike, On Wed, May 10, 2023 at 11:38:55AM -0400, Michael Meissner wrote: > This patch rewrites the gen_ld_cmpi_p10 function in genfusion.pl to be > clearer. That is not at all what I asked for, even if I would agree the code is nicer to read now (I don't). What I asked for, what is needed, is for your patches to be readable. This is a prerequisite for them to be reviewable, which is a prerequisite for them to be approvable. One way to do that is to split out refactorings (which I asked for) and rewrites (which you did) to earlier patches in the series. Pure refactoring are easy to review: they change exactly nothing in what code is executed. Rewrites are much harder to review. But even then we can hope you didn't slip up once in a hundred lines of code, sure. The later patches can then be much more readable because there isn't so much noise mixed in. > Assuming I can check this in, I will > also commit to the active GCC branches after a burn-in period. No, you will never do that. You always need approval for that. We have these procedures for a reason. We do not want other things than what was approved committed, doubly so if *nothing* was approved. > * config/rs6000/genfusion.pl (mode_to_ldst_char): Delete. This is a regression. > +# Print the insns for load and compare with -1/0/1. > +# Arguments: > +# lmode -- Integer mode ("DI", "SI", "HI", or "QI"). > +# result -- "clobber", "GPR", or $lmode > +# ccmode -- Sign vs. unsigned ("CC" or "CCUNS"). > +# mem_format -- Memory format ("d" or "ds"). > +# cmpl -- Suffix for compare ("l" or "") > +# const_pred -- Predicate for constant (i.e. -1/0/1 or 0/1). > +# extend -- "sign", "zero", or "none". > +# echr -- Suffix for load ("a", "z", or ""). > +# load -- Load instruction (i.e. "ld", "lwa", "lwz", etc.) > +# np -- enum non_prefixed_form for memory type > +# constraint -- constraint to use > +# mem_pred -- predicate for the memory operation If you need a huge block comment for your sub argument, that is a not-so-subtle hint that you need to refactor. Or if this was supposed to be a refactoring, that something went terribly wrong :-( Segher
Re: [PATCH] Only use NO_REGS in cost calculation when !hard_regno_mode_ok for GENERAL_REGS and mode.
On Thu, May 25, 2023 at 10:29:47AM -0400, Vladimir Makarov wrote: > > On 5/17/23 02:57, liuhongt wrote: > >r14-172-g0368d169492017 replaces GENERAL_REGS with NO_REGS in cost > >calculation when the preferred register class are not known yet. > >It regressed powerpc PR109610 and PR109858, it looks too aggressive to use > >NO_REGS when mode can be allocated with GENERAL_REGS. > >The patch takes a step back, still use GENERAL_REGS when > >hard_regno_mode_ok for mode and GENERAL_REGS, otherwise uses NO_REGS. > >Kewen confirmed the patch fixed PR109858, I vefiried it also fixed > >PR109610. > > > >Bootstrapped and regtested on x86_64-pc-linux-gnu{-m32,}. > >No big performance impact for SPEC2017 on icelake server. > >Ok for trunk? > > > >gcc/ChangeLog: > > > > * ira-costs.cc (scan_one_insn): Only use NO_REGS in cost > > calculation when !hard_regno_mode_ok for GENERAL_REGS and > > mode, otherwise still use GENERAL_REGS. > > Thank you for the patch. It looks good for me. It is ok to commit it > into the trunk. Thanks everyone involved for fixing this nasty regression! Much appreciated. Segher
Re: [PATCH] [testsuite] [powerpc] adjust -m32 counts for fold-vec-extract*
Hi Alex, On Thu, May 25, 2023 at 10:55:37AM -0300, Alexandre Oliva wrote: > On May 25, 2023, Segher Boessenkool wrote: > > Fwiw, updating the insn counts blindly like this > > ... is a claim that carries a wildly incorrect and insulting underlying > assumption: Sorry you feel that way. I'm not even assuming anything :-( > I've actually identified the corresponding change to the > lp64 tests, compared the effects of the codegen changes, and concluded > the tests needed this changing for ilp32 to keep on testing for the same > thing after code changes brought about by changes that AFAICT had been > well understood when making the lp64 adjustments. But you didn't explain any of that (saying it is so is not the same thing at all as explaining it!) > > If it is not possible to keep these tests up-to-date easily > > The counts have been stable for a couple of release cycles already. > > The change that caused the codegen differences is identified and > understood; the PR confirmed my findings, naming the root cause and the > incomplete testsuite adjustment. Oh, was this discussed in some PR? The patch submission should have carried the conclusions from the discussions there then :-) > I suspect there may also be ABI-related assumptions implied by the 'add' > counts, but I don't know enough about all the ppc variants to be sure. The compiler can and will create all kinds of code for wildly unexpected reasons. "add" is dangerous to count already, but it is not as bad as "addi" :-) > Now, if your implied claim is correct that counting 'add/addi' > instructions in these tests is fragile, dropping the checks for those > would probably be best. The same is true for almost all instructions. You can only sanely count instructions if either you count only unusual insns, or if you test only *tiny* functions (say five insns, including the blr at the end!) > But if ppc maintainers seem to have different > opinions as to how to deal with the fallout of that one-time codegen > change, it would be foolish for me to get pulled into the cross fire. There is no crossfire. I did not dis-approve the patch, just said this is a high maintenance direction to proceed in. There has been a lot of that the last few years, we should improve on that. It is not about this patch (only). > Here's the patch that corrects the long-broken counts, with the > requested adjustments, retested with ppc- and ppc64-vx7r2. Ok? > Codegen changes caused add instruction count mismatches on > ppc-*-linux-gnu and other 32-bit ppc targets. At some point the > expected counts were adjusted for lp64, but ilp32 differences > remained, and published test results confirm it. ... and this is not something that can be confimed like this. Just spend a few minutes more to put *actual numbers* here, with some indication this is good and correct codegen, so that it is bloody easy for a reviewer to review and for a maintainer to approve! > /* -m32 target has an 'add' in place of one of the 'addi'. */ > -/* { dg-final { scan-assembler-times {\maddi\M|\madd\M} 2 { target lp64 } } > } */ > -/* { dg-final { scan-assembler-times {\maddi\M|\madd\M} 3 { target ilp32 } } > } */ > +/* { dg-final { scan-assembler-times {\maddi\M|\madd\M} 2 } } */ Just {\madd} or more conservative {\maddi?\M} then? Segher
Re: [PATCH] [testsuite] [powerpc] adjust -m32 counts for fold-vec-extract*
Hi! On Thu, May 25, 2023 at 07:05:55AM -0300, Alexandre Oliva wrote: > On May 25, 2023, "Kewen.Lin" wrote: > > So both lp64 and ilp32 have the same count, could we merge it and > > remove the selectors? > > We could, but... I thought I wouldn't, since they were different > before, and they're likely to diverge again in the future. I thought > that combining them might suggest that they ought to be the same, when > we already know that this is not the case. > > I'll prepare an alternate patch that combines them. Fwiw, updating the insn counts blindly like this has very small value on the one hand, and negative value on the other. In total, negative value. If it is not possible to keep these tests up-to-date easily the test should be improved. If tests regressed otoh we should ***not*** paper over that with patches like this, but investigate what happened instead: such regressions are *real*. So which is it here? I am assuming it is a not-to-well written testcase without all the necessary noipa attrs, and/or putting more than one thing to test per function directly. Insn counts then shift easily if the compiler decides to factor (CSE etc.) your code differently, but that is a testcase artifact then, not something we want to adjust counts for all of the time. It is feasible to do these insn count things only for trivial tiny snippets. Everything bigger will regress all of the time, no one will look at it properly, and instead people will just do blind "update counts" patches like this :-/ *Good* insn count tests are quite valuable, but harder to write. But maintenance costs noticably bigger than zero for a testcase are not good, how many testcases do we run in the testsuite? So, can we fix the underlying problem here please? Thanks, Segher
Re: [PATCH v1] tree-ssa-sink: Improve code sinking pass.
Hi! On Thu, May 18, 2023 at 12:44:28PM +0530, Ajit Agarwal wrote: > This patch improves code sinking pass to sink statements before call to reduce > register pressure. An example would be useful :-) > * tree-ssa-sink.cc (statement_sink_location): Modifed to > move statements before calls. Spello ("modified"). But, you should write in the imperative mood anyway, so "modify". But, every change is a modification, so do without the fluff altogether? "Move statements before calls." > (block_call_p): New function. > (def_use_same_block): New function. > (select_best_block): Add heuristics to select the best > blocks in the immediate post dominator. Please don't break lines early it makes things harder to read. :-) > --- /dev/null > +++ b/gcc/testsuite/gcc.dg/tree-ssa/ssa-sink-20.c > @@ -0,0 +1,16 @@ > +/* { dg-do compile } */ This is the default, you can just leave it out. > +/* { dg-options "-O2 -fdump-tree-sink -fdump-tree-optimized > -fdump-tree-sink-stats" } */ You don't need -fdump-tree-sink without options since you have -fdump-tree-sink-stats as well. > --- /dev/null > +++ b/gcc/testsuite/gcc.dg/tree-ssa/ssa-sink-21.c > @@ -0,0 +1,20 @@ > +/* { dg-do compile } */ > +/* { dg-options "-O2 -fdump-tree-sink-stats -fdump-tree-sink-stats" } */ You don't need to say it twice either :-) > +/* Return TRUE if immediate uses of the defs in > + USE occur in the same block as USE, FALSE otherwise. */ > + > +bool > +def_use_same_block (gimple *stmt) > +{ There is no function parameter "use" here? STMT instead? > + use_operand_p use_p; > + def_operand_p def_p; Neither of these is a predicate. Lose the _p please? > + if (use_p > + && (gimple_bb (USE_STMT (use_p)) == gimple_bb (stmt))) Please fit this on one line. And no parens around random things please. > +/* Return TRUE if the block has only calls, FALSE otherwise. */ > + > +bool > +block_call_p (basic_block bb) > +/* We have already seen a call. */ > +if (is_call) > + return false; > + > +if (is_gimple_call (stmt)) > + is_call = true; > +else > + return false; > + if (is_call && i == 1) > +return true; > + > + return false; This doesn't do what the function comment says? It is very important that function comments say exactly what a function does. It can perhaps leave out some details, but it should be correct by and large. > + /* Update sinking point as stmt before call if the sinking block > +has only calls. Otherwise update sinking point as the use > +stmt. */ (two spaces after full stop, twice) > + if (gsi_stmt (gsi) == use > + && !is_gimple_call (last_stmt) > + && (gimple_code (last_stmt) != GIMPLE_SWITCH) > + && (gimple_code (last_stmt) != GIMPLE_COND) > + && (gimple_code (last_stmt) != GIMPLE_GOTO) > + && (!gimple_vdef (use) || !def_use_same_block (def_stmt))) Please no unnecessary parens. At first I didn't notice the last line here *does* need it! Segher
Re: [PATCH v5 1/4] rs6000: Enable REE pass by default
Hi! On Tue, May 16, 2023 at 11:45:28AM +0530, Ajit Agarwal wrote: > --- a/gcc/doc/invoke.texi > +++ b/gcc/doc/invoke.texi > @@ -12455,8 +12455,8 @@ Attempt to remove redundant extension instructions. > This is especially > helpful for the x86-64 architecture, which implicitly zero-extends in 64-bit > registers after writing to their lower 32-bit half. > > -Enabled for Alpha, AArch64 and x86 at levels @option{-O2}, > -@option{-O3}, @option{-Os}. > +Enabled for Alpha, AArch64, RS/6000, RISC-V, SPARC, h83000 and x86 at levels > +@option{-O2}, @option{-O3}, @option{-Os}. Please don't mention RS/6000, we don't support that anymore. The architecture we do support is called Power or PowerPC; the target triplets are powerpc*-*-*. rs6000-*-* might still somewhat work, but no one should use it anymore, and we probably should delete it. Please say PowerPC here. With that the patch is okay for trunk. Thank you! Segher
Re: [PATCH] [powerpc] Add a peephole2 to eliminate redundant move from VSX_REGS to GENERAL_REGS when it's from memory.
On Thu, May 04, 2023 at 01:54:46PM +0800, liuhongt wrote: > r14-172-g0368d169492017 use NO_REGS instead of GENERAL_REGS in memory cost > calculation when preferred register class is unkown. > + /* Costs for NO_REGS are used in cost calculation on the > +1st pass when the preferred register classes are not > +known yet. In this case we take the best scenario. */ > > It regressed gcc.target/powerpc/dform-3.c which has inline asm explicitly > put a vector mode into a general register, then create an extra move. > RA doesn't allocate GENERAL_REGS for it because the backend pattern > explicitly disparage the alternative (, r), (??r, Y) which moves > from GENERAL_REGS/MEM to GENERAL_REGS. And that is correct and wanted. > Normally the extra move can be eliminated by pass_reload Which is completely beside the point: reload is not in the business of making good code. Instead, it should make reasonable code when the good code we attempted to make did not work out. Think of it like a last resort register allocation fixup. > The patch adds a peephole2 to eliminate the extra move. Nope. Not okay. This is not what peepholes are for *at all*, and this path leads to insanity and millionfold maintenance cost. Peepholes are *only*, I repeat *only*, for situations where we did a *correct* cost estimation but due to some target detail we want to fine-tune the generated code a bit. If you want a peephole you most likely have a fundamental cost function problem elsewhere. Fix *that*, that is actually useful, and won't get you into hotter water. > Ok for trunk? Not okay, no. Please fix the actual bug? Revert the previous patch, for example :-( > +;; Peephole to catch memory loads to VSX_REG and then moves to GENERAL_REGS. > +(define_peephole2 > + [(set (match_operand:VSX_M 0 "vsx_register_operand") > + (match_operand:VSX_M 1 "memory_operand")) > + (set (match_operand:VSX_M 2 "int_reg_operand") > + (match_dup 0))] > + "TARGET_POWERPC64 && VECTOR_MEM_VSX_P (mode) > + && peep2_reg_dead_p (2, operands[0])" > + [(set (match_dup 2) (match_dup 1))]) The condition does not make sense, even assuming the peephole does (it does not). Why would you care if the compiler is allowed to generate 64-bit insns here? The formatting is messed up as well. Segher
Re: [PATCH, V4] PR target/105325, Make load/cmp fusion know about prefixed loads.
On Wed, Apr 26, 2023 at 12:18:36PM -0400, Michael Meissner wrote: > * gcc/config/rs6000/genfusion.pl (gen_ld_cmpi_p10): Improve generation > of the ld and lwa instructions which use the DS encoding instead of D. > Use the YZ constraint for these loads. Handle prefixed loads better. Don't use tabs in the middle of a line. "Handle prefixed loads better" is not what the patch does, and/or is so vague as to be useless. > --- a/gcc/config/rs6000/genfusion.pl > +++ b/gcc/config/rs6000/genfusion.pl > @@ -56,7 +56,7 @@ sub mode_to_ldst_char > sub gen_ld_cmpi_p10 > { > my ($lmode, $ldst, $clobbermode, $result, $cmpl, $echr, $constpred, > - $mempred, $ccmode, $np, $extend, $resultmode); > + $mempred, $ccmode, $np, $extend, $resultmode, $constraint); >LMODE: foreach $lmode ('DI','SI','HI','QI') { >$ldst = mode_to_ldst_char($lmode); >$clobbermode = $lmode; > @@ -71,21 +71,34 @@ sub gen_ld_cmpi_p10 >CCMODE: foreach $ccmode ('CC','CCUNS') { > $np = "NON_PREFIXED_D"; > $mempred = "non_update_memory_operand"; > + $constraint = "m"; > if ( $ccmode eq 'CC' ) { > next CCMODE if $lmode eq 'QI'; > - if ( $lmode eq 'DI' || $lmode eq 'SI' ) { > + if ( $lmode eq 'HI' ) { > + $np = "NON_PREFIXED_D"; > + $mempred = "non_update_memory_operand"; > + $echr = "a"; > + } elsif ( $lmode eq 'SI' ) { > + # ld and lwa are both DS-FORM. > + $np = "NON_PREFIXED_DS"; > + $mempred = "lwa_operand"; > + $echr = "a"; > + $constraint = "YZ"; > + } elsif ( $lmode eq 'DI' ) { > # ld and lwa are both DS-FORM. > $np = "NON_PREFIXED_DS"; > $mempred = "ds_form_mem_operand"; > + $echr = ""; > + $constraint = "YZ"; > } > $cmpl = ""; > - $echr = "a"; > $constpred = "const_m1_to_1_operand"; > } else { > if ( $lmode eq 'DI' ) { > # ld is DS-form, but lwz is not. > $np = "NON_PREFIXED_DS"; > $mempred = "ds_form_mem_operand"; > + $constraint = "YZ"; > } > $cmpl = "l"; > $echr = "z"; > @@ -108,7 +121,7 @@ sub gen_ld_cmpi_p10 > > print "(define_insn_and_split > \"*l${ldst}${echr}_cmp${cmpl}di_cr0_${lmode}_${result}_${ccmode}_${extend}\"\n"; > print " [(set (match_operand:${ccmode} 2 \"cc_reg_operand\" > \"=x\")\n"; > - print "(compare:${ccmode} (match_operand:${lmode} 1 > \"${mempred}\" \"m\")\n"; > + print "(compare:${ccmode} (match_operand:${lmode} 1 > \"${mempred}\" \"${constraint}\")\n"; > if ($ccmode eq 'CCUNS') { print " "; } > print "(match_operand:${lmode} 3 \"${constpred}\" > \"n\")))\n"; > if ($result eq 'clobber') { > @@ -137,6 +150,11 @@ sub gen_ld_cmpi_p10 > print " \"\"\n"; > print " [(set_attr \"type\" \"fused_load_cmpi\")\n"; > print " (set_attr \"cost\" \"8\")\n"; > + > + if ($extend eq "sign") { > + print " (set_attr \"sign_extend\" \"yes\")\n"; > + } > + > print " (set_attr \"length\" \"8\")])\n"; > print "\n"; >} This already was a 90-line function that did too many things. Now it is bigger and does more things, and the patch is unintelligible. Please first factor things. There are many more things terrible Perl code style here (like all of the quoting), but where to start :-/ I once again spent many hours trying to review this, and once again failed. Please write better code, and please make better patches. > index ec783803820..7d6c94aee5b 100644 > --- a/gcc/config/rs6000/rs6000.md > +++ b/gcc/config/rs6000/rs6000.md > @@ -302,7 +302,7 @@ (define_attr "prefixed" "no,yes" > (eq_attr "maybe_prefixed" "no")) >(const_string "no") > > - (eq_attr "type" "load,fpload,vecload") > + (eq_attr "type" "load,fpload,vecload,vecload,fused_load_cmpi") Don't duplicate vecload. > --- /dev/null > +++ b/gcc/testsuite/g++.target/powerpc/pr105325.C > @@ -0,0 +1,25 @@ > +/* { dg-do assemble } */ > +/* { dg-require-effective-target lp64 } */ > +/* { dg-require-effective-target power10_ok } */ > +/* { dg-require-effective-target powerpc_prefixed_addr } */ > +/* { dg-options "-O2 -mdejagnu-cpu=power10 -fstack-protector" } */ The power10_ok selector still is terribly broken (it allows only some variants of 64-bit Linux and nothing more, to start with). Do we still need it in any case? Same for powerpc_prefixed_addr. Is there any supported target that does not have a working assembler? What is -fstack-protector here for? That should be documented, or better, it should just be removed if possible. > -/* { dg-final { scan-assembler-times