Re: [PATCH] rs6000: Don't clobber return value when eh_return called [PR114846]

2024-05-15 Thread Andrew Pinski
On Thu, May 16, 2024, 4:09 AM Kewen.Lin wrote: > Hi, > > As the associated test case in PR114846 shows, currently > with eh_return involved some register restoring for EH > RETURN DATA in epilogue can clobber the one which holding > the return value. Referring to the existing handlings in >

RE: [PATCH v4] DSE: Fix ICE after allow vector type in get_stored_val

2024-05-15 Thread Li, Pan2
Kindly ping; it looks like there is no build error from Linaro for arm. Pan -Original Message- From: Li, Pan2 Sent: Friday, May 3, 2024 9:52 AM To: gcc-patches@gcc.gnu.org Cc: jeffreya...@gmail.com; juzhe.zh...@rivai.ai; kito.ch...@gmail.com; Liu, Hongtao ; richard.guent...@gmail.com; Li, Pan2

[PATCH v2 2/3] RISC-V: Implement vectorizable early exit with vcond_mask_len

2024-05-15 Thread pan2 . li
From: Pan Li Now that we support loop len for vectorizable early exit, we would like to implement the feature for the RISC-V target. Given the example below: unsigned vect_a[1923]; unsigned vect_b[1923]; void test (unsigned limit, int n) { for (int i = 0; i < n; i++) { vect_b[i] = limit +

[PATCH v2 3/3] RISC-V: Enable vectorizable early exit testsuite

2024-05-15 Thread pan2 . li
From: Pan Li After we supported vectorizable early exit in RISC-V, we would like to enable the gcc vect tests for vectorizable early exit. The vect-early-break_124-pr114403.c test fails to vectorize for now, because the __builtin_memcpy with 8 bytes fails to be folded into an int64 assignment

[PATCH v2 1/3] Vect: Support loop len in vectorizable early exit

2024-05-15 Thread pan2 . li
From: Pan Li This patch adds early break auto-vectorization support for targets which use length for partial vectorization. Consider the following example: unsigned vect_a[802]; unsigned vect_b[802]; void test (unsigned x, int n) { for (int i = 0; i < n; i++) { vect_b[i] = x + i;

RE: [PATCH 0/4]AArch64: support conditional early clobbers on certain operations.

2024-05-15 Thread Tamar Christina
> -Original Message- > From: Richard Sandiford > Sent: Wednesday, May 15, 2024 10:31 PM > To: Tamar Christina > Cc: Richard Biener ; gcc-patches@gcc.gnu.org; nd > ; Richard Earnshaw ; Marcus > Shawcroft ; ktkac...@gcc.gnu.org > Subject: Re: [PATCH 0/4]AArch64: support conditional early

RE: [PATCH 1/5] RISC-V: Remove float vector eqne pattern

2024-05-15 Thread Demin Han
Hi Juzhe, There are two eqne pattern removal patches, one for float, another for integer. https://patchwork.sourceware.org/project/gcc/patch/20240301062711.207137-5-demin@starfivetech.com/ https://patchwork.sourceware.org/project/gcc/patch/20240301062711.207137-2-demin@starfivetech.com/

[PATCH] rs6000: Don't clobber return value when eh_return called [PR114846]

2024-05-15 Thread Kewen.Lin
Hi, As the associated test case in PR114846 shows, currently with eh_return involved some register restoring for EH RETURN DATA in epilogue can clobber the one holding the return value. Referring to the existing handling in some other targets, this patch makes the eh_return expander call one

Re: [PATCH 1/5] RISC-V: Remove float vector eqne pattern

2024-05-15 Thread ??????
Would you mind sending this patch again? I cannot find the patch now. --Reply to Message-- On Thu, May 16, 2024 03:48 AM Robin Dapp

RE: [PATCH 1/5] RISC-V: Remove float vector eqne pattern

2024-05-15 Thread Demin Han
Hi Robin, Yes. Can the eqne pattern removal patches be committed first? Regards, Demin > -Original Message- > From: Robin Dapp > Sent: May 16, 2024 3:49 > To: Demin Han ; 钟居哲 > ; gcc-patches > Cc: rdapp@gmail.com; kito.cheng ; Li, Pan2 > ; jeffreyalaw > Subject: Re: [PATCH 1/5]

[PATCH] RISC-V: Fix "Nan-box the result of movbf on soft-bf16"

2024-05-15 Thread Xiao Zeng
1 According to unpriv-isa spec: 1.1 "FMV.H.X moves the half-precision value encoded in IEEE 754-2008 standard encoding from the lower 16 bits of integer register rs1 to

[pushed] diagnostics: use unicode art for interprocedural depth

2024-05-15 Thread David Malcolm
Successfully bootstrapped & regrtested on x86_64-pc-linux-gnu. Successful run of analyzer integration tests on x86_64-pc-linux-gnu. Pushed to trunk as r15-535-ge656656e711949. gcc/testsuite/ChangeLog: * gcc.dg/analyzer/out-of-bounds-diagram-1-emoji.c: Update expected output to use

[pushed] diagnostics: add warning emoji to events with VERB_danger

2024-05-15 Thread David Malcolm
Tweak the printing of -fdiagnostics-path-format=inline-events so that any event with diagnostic_event::VERB_danger gains a warning emoji, provided that the text art theme enables emoji support. VERB_danger is set by the analyzer on the last event in a path, and so this emoji appears at the end of

[pushed] diagnostics: simplify output of purely intraprocedural execution paths

2024-05-15 Thread David Malcolm
Diagnostic path printing was added in r10-5901-g4bc1899b2e883f. As of that commit, with -fdiagnostics-path-format=inline-events (the default), we print a vertical line to the left of the source line numbering, visualizing the stack depth and interprocedural calls and returns as indentation

[pushed] diagnostics: handle SGR codes in line_label::m_display_width

2024-05-15 Thread David Malcolm
Successfully bootstrapped & regrtested on x86_64-pc-linux-gnu. Successful run of analyzer integration tests on x86_64-pc-linux-gnu. Pushed to trunk as r15-532-ga7be993806a90a. gcc/ChangeLog: * diagnostic-show-locus.cc: Define INCLUDE_VECTOR and include "text-art/types.h".

[COMMITTED] RISC-V: Add Zvfbfwma extension to the -march= option

2024-05-15 Thread Xiao Zeng
2024-05-15 13:48  Kito Cheng wrote: > >LGTM, I agree we should only implement what Embedded Processor >implies, we have no way to know that from the arch string Thanks, Kito. 1 Passed CI testing, except for formatting issues.

[pushed] analyzer: fix ICE seen with -fsanitize=undefined [PR114899]

2024-05-15 Thread David Malcolm
Successfully bootstrapped & regrtested on x86_64-pc-linux-gnu. Pushed to trunk as r15-526-g1779e22150b917. gcc/analyzer/ChangeLog: PR analyzer/114899 * access-diagram.cc (written_svalue_spatial_item::get_label_string): Bulletproof against SSA_NAME_VAR being null.

Re: [PATCH 0/4]AArch64: support conditional early clobbers on certain operations.

2024-05-15 Thread Richard Sandiford
Tamar Christina writes: >> >> On Wed, May 15, 2024 at 12:29 PM Tamar Christina >> >> wrote: >> >> > >> >> > Hi All, >> >> > >> >> > Some Neoverse Software Optimization Guides (SWoG) have a clause that >> >> > state >> >> > that for predicated operations that also produce a predicate it is >>

Re: [PATCH v2 1/2] RISC-V: Add cmpmemsi expansion

2024-05-15 Thread Jeff Law
On 5/15/24 12:49 AM, Christoph Müllner wrote: GCC has a generic cmpmemsi expansion via the by-pieces framework, which shows some room for target-specific optimizations. E.g. for comparing two aligned memory blocks of 15 bytes we get the following sequence: my_mem_cmp_aligned_15: li

Re: [PATCH] RISC-V: propgue/epilogue expansion code minor changes [NFC]

2024-05-15 Thread Vineet Gupta
On 5/15/24 12:32, Jeff Law wrote: > > On 5/15/24 12:55 PM, Vineet Gupta wrote: >> Saw this little room for improvement in current debugging of >> prologue/epilogue expansion code. >> >> --- >> >> Use the following pattern consistently >> `RTX_FRAME_RELATED_P (gen_insn (insn)) = 1` >> >>

Re: [PATCH 1/2] RISC-V: Add tests for cpymemsi expansion

2024-05-15 Thread Patrick O'Neill
On 5/14/24 22:00, Christoph Müllner wrote: On Fri, May 10, 2024 at 6:01 AM Patrick O'Neill wrote: Hi Christoph, cpymemsi-1.c fails on a subset of newlib targets. "UNRESOLVED: gcc.target/riscv/cpymemsi-1.c -O0 compilation failed to produce executable" Full list of failing targets here

[PATCH] libstdc++: Avoid MMX return types from __builtin_shufflevector

2024-05-15 Thread Matthias Kretz
Tested on aarch64-linux-gnu, arm-linux-gnueabihf, powerpc64le-linux-gnu, x86_64-linux-gnu (-m64, -m32, -mx32), and arm-linux-gnueabi OK for trunk? And when backporting, should I squash it with the commit that introduced the regression? 8< --- This resolves

Re: [PATCH 1/5] RISC-V: Remove float vector eqne pattern

2024-05-15 Thread Robin Dapp
Hi Demin, are you still going to continue with this? Regards Robin

Re: [PATCH] RISC-V: propgue/epilogue expansion code minor changes [NFC]

2024-05-15 Thread Jeff Law
On 5/15/24 12:55 PM, Vineet Gupta wrote: Saw this little room for improvement in current debugging of prologue/epilogue expansion code. --- Use the following pattern consistently `RTX_FRAME_RELATED_P (gen_insn (insn)) = 1` vs. calling gen_insn around apriori gen_xxx_insn () calls.

Re: [PATCH] RISC-V: Do not allow v0 as dest when merging [PR115068].

2024-05-15 Thread Robin Dapp
> I saw vwadd/vwsub.wx have same issue. Could you change them and add test too ? Yes, will do. At first I didn't manage to reproduce it because we seem to be lacking a combine-opt pattern for it. I'm going to post it separately. Regards Robin

[PATCH] RISC-V: propgue/epilogue expansion code minor changes [NFC]

2024-05-15 Thread Vineet Gupta
Saw this little room for improvement in current debugging of prologue/epilogue expansion code. --- Use the following pattern consistently `RTX_FRAME_RELATED_P (gen_insn (insn)) = 1` vs. calling gen_insn around apriori gen_xxx_insn () calls. This reduces weird indentations which are

[PATCH] MIPS: Remove -m(no-)lra option

2024-05-15 Thread YunQiang Su
PR target/113955 The `-mlra` option was introduced in 2014 for MIPS, and was set to default since then. It's time for us to drop no-lra support by dropping -m(no-)lra options. gcc: * config/mips/mips.cc(mips_option_override): Drop mips_lra_flag variable; (mips_lra_p):

[PATCH] c++: represent all class non-dep assignments as CALL_EXPR

2024-05-15 Thread Patrick Palka
Bootstrapped and regtested on x86_64-pc-linux-gnu, does this look OK for trunk? -- >8 -- Non-dependent compound assignment expressions are currently represented as CALL_EXPR to the selected operator@= overload. Non-dependent simple assignments on the other hand are still represented as

[r15-512 Regression] FAIL: gfortran.dg/vect/vect-do-concurrent-1.f90 -O at line 14 (test for warnings, line ) on Linux/x86_64

2024-05-15 Thread haochen.jiang
On Linux/x86_64, 9b7cad5884f21cc5783075be0043777448db3fab is the first bad commit commit 9b7cad5884f21cc5783075be0043777448db3fab Author: Jan Hubicka Date: Wed May 15 14:14:27 2024 +0200 Avoid pointer compares on TYPE_MAIN_VARIANT in TBAA caused FAIL: gcc.dg/tree-ssa/ssa-lim-15.c

Re: Fix gnu versioned namespace mode 00/03

2024-05-15 Thread François Dumont
On 13/05/2024 10:34, Jonathan Wakely wrote: On Mon, 13 May 2024, 07:30 Iain Sandoe, wrote: > On 13 May 2024, at 06:06, François Dumont wrote: > > > On 07/05/2024 18:15, Iain Sandoe wrote: >> Hi François >> >>> On 4 May 2024, at 22:11, François Dumont

[to-be-committed][RISC-V] Improve some shift-add sequences

2024-05-15 Thread Jeff Law
So this is a minor fix/improvement for shift-add sequences. This was supposed to help xz in a minor way IIRC. Combine may present us with (x + C2') << C1 which was canonicalized from (x << C1) + C2. Depending on the precise values of C2 and C2' one form may be better than the other. We
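
The two forms mentioned in the snippet above can be sketched in plain C. This is only an illustration of the arithmetic identity, not code from the patch: the function names are made up here, and it assumes 32-bit unsigned arithmetic with C1 < 32 and C2 == C2' << C1, so that both forms compute the same value and the choice between them is purely a matter of which constant is cheaper to materialize.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical helper: the form as typically written in source,
   shift first and then add the constant C2.  */
static uint32_t shift_then_add(uint32_t x, unsigned c1, uint32_t c2)
{
    return (x << c1) + c2;
}

/* Hypothetical helper: the canonicalized form, add the adjusted
   constant C2' first and then shift.  Equal to the form above
   whenever c2 == c2_prime << c1 (modulo 2^32).  */
static uint32_t add_then_shift(uint32_t x, unsigned c1, uint32_t c2_prime)
{
    return (x + c2_prime) << c1;
}
```

For example, with C1 = 2, C2 = 8 and C2' = 2, both helpers return 20 for x = 3. The identity holds under unsigned wraparound as well, since a left shift is a multiplication by a power of two modulo 2^32.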

[PATCH v4] c++: fix constained auto deduction in templ spec scopes [PR114915]

2024-05-15 Thread Seyed Sajad Kahani
This patch resolves PR114915 by replacing the logic that fills in the missing levels in do_auto_deduction in cp/pt.cc. The new approach now trims targs if the depth of targs is deeper than desired (this will only happen in specific contexts), and still fills targs with empty layers if it has fewer

[PATCH] tree-optimization/79958 - make DSE track multiple paths

2024-05-15 Thread Richard Biener
DSE currently gives up when the path we analyze forks. This leads to multiple missed dead store elimination PRs. The following fixes this by recursing for each path and maintaining the visited bitmap to avoid visiting CFG re-merges multiple times. The overall cost is still limited by the same

[Patch, fortran] PR114874 - [14/15 Regression] ICE with select type, type is (character(*)), and substring

2024-05-15 Thread Paul Richard Thomas
Hi All, I have been around several circuits with a patch for this regression. I posted one in Bugzilla but rejected it because it was not direct enough. This one, however, is more to my liking and fixes another bug lurking in the shadows. The way in which select type has been implemented is a

[committed] openmp: Diagnose using grainsize+num_tasks clauses together [PR115103]

2024-05-15 Thread Jakub Jelinek
Hi! I've noticed that while we diagnose many other OpenMP exclusive clauses, we don't diagnose grainsize together with num_tasks on taskloop construct in all of C, C++ and Fortran (the implementation simply ignored grainsize in that case) and for Fortran also don't diagnose mixing nogroup clause

[committed] combine: Fix up simplify_compare_const [PR115092]

2024-05-15 Thread Jakub Jelinek
Hi! The following testcases are miscompiled (with tons of GIMPLE optimization disabled) because combine sees GE comparison of 1-bit sign_extract (i.e. something with [-1, 0] value range) with (const_int -1) (which is always true) and optimizes it into NE comparison of 1-bit zero_extract ([0, 1]

Re: [PATCH] middle-end/111422 - wrong stack var coalescing, handle PHIs

2024-05-15 Thread Richard Biener
On Wed, 15 May 2024, Jakub Jelinek wrote: > On Wed, May 15, 2024 at 01:41:04PM +0200, Richard Biener wrote: > > PR middle-end/111422 > > * cfgexpand.cc (add_scope_conflicts_2): Handle PHIs > > by recursing to their arguments. > > --- > > gcc/cfgexpand.cc | 21 + >

[COMMITTED] Regenerate cygming.opt.urls and mingw.opt.urls

2024-05-15 Thread Evgeny Karpov
Monday, May 13, 2024 3:49 PM wrote: David Malcolm wrote: > > > > It might be a "make" dependencies issue: > > "make regenerate-opt-urls" has dependencies on OPT_URLS_HTML_DEPS > > which is currently defined as: > > OPT_URLS_HTML_DEPS = $(build_htmldir)/gcc/Option-Index.html \ > >

RE: [PATCH 0/4]AArch64: support conditional early clobbers on certain operations.

2024-05-15 Thread Tamar Christina
> >> On Wed, May 15, 2024 at 12:29 PM Tamar Christina > >> wrote: > >> > > >> > Hi All, > >> > > >> > Some Neoverse Software Optimization Guides (SWoG) have a clause that > >> > state > >> > that for predicated operations that also produce a predicate it is > >> > preferred > >> > that the

Re: [PATCH v3] c++: Fix auto deduction for template specialization scopes [PR114915]

2024-05-15 Thread Patrick Palka
On Wed, 15 May 2024, Patrick Palka wrote: > > On Fri, 10 May 2024, Seyed Sajad Kahani wrote: > > > This patch resolves PR114915 by replacing the logic that fills in the > > missing levels in do_auto_deduction in cp/pt.cc. > > The new approach now trims targs if the depth of targs is deeper

Re: [PATCH v3] c++: Fix auto deduction for template specialization scopes [PR114915]

2024-05-15 Thread Patrick Palka
On Fri, 10 May 2024, Seyed Sajad Kahani wrote: > This patch resolves PR114915 by replacing the logic that fills in the missing > levels in do_auto_deduction in cp/pt.cc. > The new approach now trims targs if the depth of targs is deeper than desired > (this will only happen in specific

Re: [PATCH 0/4]AArch64: support conditional early clobbers on certain operations.

2024-05-15 Thread Richard Sandiford
Tamar Christina writes: >> -Original Message- >> From: Richard Biener >> Sent: Wednesday, May 15, 2024 12:20 PM >> To: Tamar Christina >> Cc: gcc-patches@gcc.gnu.org; nd ; Richard Earnshaw >> ; Marcus Shawcroft >> ; ktkac...@gcc.gnu.org; Richard Sandiford >> >> Subject: Re: [PATCH

Re: [PATCH v8] ada: fix timeval timespec on 32 bits archs with 64 bits time_t [PR114065]

2024-05-15 Thread Arnaud Charlet
Nicolas, Thank you for such a large and delicate change! This looks generally good, except for the first parts: we cannot change documented/user packages, meaning that GNAT.Calendar, System.OS_Lib (via the documented GNAT.OS_Lib) and Ada.Calendar.Conversion cannot be changed: we need to keep

Re: [RFC][PATCH] PR tree-optimization/109071 - -Warray-bounds false positive warnings due to code duplication from jump threading

2024-05-15 Thread David Malcolm
On Tue, 2024-05-14 at 15:08 +0200, Richard Biener wrote: > On Mon, 13 May 2024, Qing Zhao wrote: > > > -Warray-bounds is an important option to enable the Linux kernel to > > keep > > the array out-of-bound errors out of the source tree. > > > > However, due to the false positive warnings reported

[PATCH] tree-into-ssa: speed up sorting in prune_unused_phi_nodes [PR114480]

2024-05-15 Thread Alexander Monakov
In PR 114480 we are hitting a case where tree-into-ssa scales quadratically due to prune_unused_phi_nodes doing O(N log N) work for N basic blocks, for each variable individually. Sorting the 'defs' array is especially costly. It is possible to assist gcc_qsort by laying out dfs_out entries in

Re: [RFC][PATCH] PR tree-optimization/109071 - -Warray-bounds false positive warnings due to code duplication from jump threading

2024-05-15 Thread Qing Zhao
> On May 15, 2024, at 02:09, Richard Biener wrote: > > On Tue, 14 May 2024, Qing Zhao wrote: > >> >> >>> On May 14, 2024, at 13:14, Richard Biener wrote: >>> >>> On Tue, 14 May 2024, Qing Zhao wrote: >>> > On May 14, 2024, at 10:29, Richard Biener wrote: > >>> [...]

Re: [PATCH] RISC-V: Fix cbo.zero expansion for rv32

2024-05-15 Thread Christoph Müllner
On Wed, May 15, 2024 at 3:05 PM Jeff Law wrote: > > > > On 5/15/24 12:48 AM, Christoph Müllner wrote: > > Emitting a DI pattern won't find a match for rv32 and manifests in > > the failing test case gcc.target/riscv/cmo-zicboz-zic64-1.c. > > Let's fix this in the expansion and also address the

Re: [PATCH] RISC-V: Fix cbo.zero expansion for rv32

2024-05-15 Thread Jeff Law
On 5/15/24 12:48 AM, Christoph Müllner wrote: Emitting a DI pattern won't find a match for rv32 and manifests in the failing test case gcc.target/riscv/cmo-zicboz-zic64-1.c. Let's fix this in the expansion and also address the different code that gets generated for rv32/rv64. gcc/ChangeLog:

Re: [PATCH] RISC-V: Test cbo.zero expansion for rv32

2024-05-15 Thread Jeff Law
On 5/15/24 1:28 AM, Christoph Müllner wrote: We had an issue when expanding via cmo-zero for RV32. This was fixed upstream, but we don't have a RV32 test. Therefore, this patch introduces such a test. gcc/testsuite/ChangeLog: * gcc.target/riscv/cmo-zicboz-zic64-1.c: Fix for rv32.

Re: [PATCH] AArch64: Improve costing of ctz

2024-05-15 Thread Wilco Dijkstra
Hi Andrew, > I should note popcount has a similar issue which I hope to fix next week. > Popcount cost is used during expand so it is very useful to be slightly more > correct. It's useful to set the cost so that all of the special cases still apply - even if popcount is relatively fast, it's

Re: [PATCH] Adjust range type of calls into fold_range for IPA passes [PR114985]

2024-05-15 Thread Aldy Hernandez
Any thoughts on this? If no one objects, I'll re-enable prange tomorrow. Aldy On Sat, May 11, 2024 at 11:43 AM Aldy Hernandez wrote: > > I have pushed a few cleanups to make it easier to move forward without > disturbing passes which are affected by IPA's mixing up the range > types. As I

Re: [PATCH] middle-end/111422 - wrong stack var coalescing, handle PHIs

2024-05-15 Thread Jakub Jelinek
On Wed, May 15, 2024 at 01:41:04PM +0200, Richard Biener wrote: > PR middle-end/111422 > * cfgexpand.cc (add_scope_conflicts_2): Handle PHIs > by recursing to their arguments. > --- > gcc/cfgexpand.cc | 21 + > 1 file changed, 17 insertions(+), 4 deletions(-)

[PATCH] middle-end/111422 - wrong stack var coalescing, handle PHIs

2024-05-15 Thread Richard Biener
The gcc.c-torture/execute/pr111422.c testcase after installing the sink pass improvement reveals that we also need to handle _65 = + _58; _44 = + _43; # _59 = PHI <_65, _44> *_59 = 8; g = {v} {CLOBBER(eos)}; ... n[0] = *_59 = 8; g = {v} {CLOBBER(eos)}; where we fail to

RE: [PATCH v5 1/3] Internal-fn: Support new IFN SAT_ADD for unsigned scalar int

2024-05-15 Thread Li, Pan2
> LGTM but you'll need an OK from Richard, > Thanks for working on this! Thanks Tamar for the help and coaching; let's wait for Richard for a while! Pan -Original Message- From: Tamar Christina Sent: Wednesday, May 15, 2024 5:12 PM To: Li, Pan2 ; gcc-patches@gcc.gnu.org Cc:

Re: [PATCH] [x86] Set d.one_operand_p to true when TARGET_SSSE3 in ix86_expand_vecop_qihi_partial.

2024-05-15 Thread Uros Bizjak
On Wed, May 15, 2024 at 12:05 PM liuhongt wrote: > > pshufb is available under TARGET_SSSE3, so > ix86_expand_vec_perm_const_1 must return true when TARGET_SSSE3. > w/o TARGET_SSSE3, if we set one_operand_p to true, > ix86_expand_vec_perm_const_1 could return false. > > With the patch under

[COMMITTED] testsuite: Require lto-plugin in gcc.dg/ipa/ipa-icf-38.c [PR85656]

2024-05-15 Thread Rainer Orth
gcc.dg/ipa/ipa-icf-38.c currently FAILs on Solaris (SPARC and x86, 32 and 64-bit): FAIL: gcc.dg/ipa/ipa-icf-38.c scan-ltrans-tree-dump-not optimized "Function bar" As it turns out, this only happens when the Solaris linker is used; with GNU ld the test PASSes just fine. In fact, that happens

RE: [PATCH 0/4]AArch64: support conditional early clobbers on certain operations.

2024-05-15 Thread Tamar Christina
> -Original Message- > From: Richard Biener > Sent: Wednesday, May 15, 2024 12:20 PM > To: Tamar Christina > Cc: gcc-patches@gcc.gnu.org; nd ; Richard Earnshaw > ; Marcus Shawcroft > ; ktkac...@gcc.gnu.org; Richard Sandiford > > Subject: Re: [PATCH 0/4]AArch64: support conditional early

Re: [PATCH 1/2] libstdc++: Fix data race in std::basic_ios::fill() [PR77704]

2024-05-15 Thread Jonathan Wakely
Pushed to trunk. On Tue, 7 May 2024 at 15:04, Jonathan Wakely wrote: > > Tested x86_64-linux. This seems "obviously correct", and I'd like to > push it. The current code definitely has a data race, i.e. undefined > behaviour. > > -- >8 -- > > The lazy caching in std::basic_ios::fill() updates a

Re: [PATCH 0/4]AArch64: support conditional early clobbers on certain operations.

2024-05-15 Thread Richard Biener
On Wed, May 15, 2024 at 12:29 PM Tamar Christina wrote: > > Hi All, > > Some Neoverse Software Optimization Guides (SWoG) have a clause that state > that for predicated operations that also produce a predicate it is preferred > that the codegen should use a different register for the destination

Re: [PATCH] [PATCH] Correct DLL Installation Path for x86_64-w64-mingw32 Multilib [PR115094]

2024-05-15 Thread Richard Biener
On Wed, May 15, 2024 at 11:39 AM unlvsur unlvsur wrote: > > cqwrteur@DESKTOP-9B705LH:~/gcc$ grep -r "# DLL is installed to" . > ./zlib/configure:# DLL is installed to $(libdir)/../bin by > postinstall_cmds > ./libitm/configure:# DLL is installed to $(libdir)/../bin by > postinstall_cmds

[COMMITTED] testsuite: i386: Fix g++.target/i386/pr97054.C on Solaris

2024-05-15 Thread Rainer Orth
g++.target/i386/pr97054.C currently FAILs on 64-bit Solaris/x86: FAIL: g++.target/i386/pr97054.C -std=gnu++14 (test for excess errors) UNRESOLVED: g++.target/i386/pr97054.C -std=gnu++14 compilation failed to produce executable FAIL: g++.target/i386/pr97054.C -std=gnu++17 (test for excess

Re: [PATCH 1/4]AArch64: convert several predicate patterns to new compact syntax

2024-05-15 Thread Richard Sandiford
Thanks for doing this a pre-patch. Minor request below: Tamar Christina writes: > ;; Perform a logical operation on operands 2 and 3, using operand 1 as > @@ -6676,38 +6690,42 @@ (define_insn "@aarch64_pred__z" > (define_insn "*3_cc" >[(set (reg:CC_NZC CC_REGNUM) > (unspec:CC_NZC >

RE: [PATCH 2/4]AArch64: add new tuning param and attribute for enabling conditional early clobber

2024-05-15 Thread Tamar Christina
> -Original Message- > From: Richard Sandiford > Sent: Wednesday, May 15, 2024 11:56 AM > To: Tamar Christina > Cc: gcc-patches@gcc.gnu.org; nd ; Richard Earnshaw > ; Marcus Shawcroft > ; ktkac...@gcc.gnu.org > Subject: Re: [PATCH 2/4]AArch64: add new tuning param and attribute for >

Re: [PATCH 2/4] RISC-V: Allow unaligned accesses in cpymemsi expansion

2024-05-15 Thread Christoph Müllner
On Sat, May 11, 2024 at 12:32 AM Jeff Law wrote: > > > > On 5/7/24 11:17 PM, Christoph Müllner wrote: > > The RISC-V cpymemsi expansion is called, whenever the by-pieces > > infrastructure will not take care of the builtin expansion. > > The code emitted by the by-pieces infrastructure may emits

Re: [PATCH v2] object lifetime instrumentation for Valgrind [PR66487]

2024-05-15 Thread Alexander Monakov
Hello, I'd like to ask if anyone has any new thoughts on this patch. Let me also point out that valgrind/memcheck.h is permissively licensed (BSD-style, rest of Valgrind is GPLv2), with the intention to allow importing into projects that are interested in using client requests without

Re: [PATCH 2/4]AArch64: add new tuning param and attribute for enabling conditional early clobber

2024-05-15 Thread Richard Sandiford
Tamar Christina writes: > Hi All, > > This adds a new tuning parameter EARLY_CLOBBER_SVE_PRED_DEST for AArch64 to > allow us to conditionally enable the early clobber alternatives based on the > tuning models. > > Bootstrapped Regtested on aarch64-none-linux-gnu and no issues. > > Ok for master?

Re: [PATCH 1/4]AArch64: convert several predicate patterns to new compact syntax

2024-05-15 Thread Kyrill Tkachov
Hi Tamar, On Wed, 15 May 2024 at 11:28, Tamar Christina wrote: > Hi All, > > This converts the single alternative patterns to the new compact syntax > such > that when I add the new alternatives it's clearer what's being changed. > > Note that this will spew out a bunch of warnings from geninsn

[COMMITTED] [prange] Default pointers_handled_p() to false.

2024-05-15 Thread Aldy Hernandez
The pointers_handled_p() method is an internal range-op helper to help catch dispatch type mismatches for pointer operands. This is what caught the IPA mismatch in PR114985. This method is only a temporary measure to catch any incompatibilities in the current pointer range-op entries. This

[PATCH 3/4]AArch64: add new alternative with early clobber to patterns

2024-05-15 Thread Tamar Christina
Hi All, This patch adds new alternatives to the patterns which are affected. The new alternatives with the conditional early clobbers are added before the normal ones in order for LRA to prefer them in the event that we have enough free registers to accommodate them. In case register pressure

[PATCH 4/4]AArch64: enable new predicate tuning for Neoverse cores.

2024-05-15 Thread Tamar Christina
Hi All, This enables the new tuning flag for Neoverse V1, Neoverse V2 and Neoverse N2. It is kept off for generic codegen. Note the reason for the +sve even though they are in aarch64-sve.exp is if the testsuite is run with a forced SVE off option, e.g. -march=armv8-a+nosve then the intrinsics

[PATCH 1/4]AArch64: convert several predicate patterns to new compact syntax

2024-05-15 Thread Tamar Christina
Hi All, This converts the single alternative patterns to the new compact syntax such that when I add the new alternatives it's clearer what's being changed. Note that this will spew out a bunch of warnings from geninsn as it'll warn that @ is useless for a single alternative pattern. These are

[PATCH 2/4]AArch64: add new tuning param and attribute for enabling conditional early clobber

2024-05-15 Thread Tamar Christina
Hi All, This adds a new tuning parameter EARLY_CLOBBER_SVE_PRED_DEST for AArch64 to allow us to conditionally enable the early clobber alternatives based on the tuning models. Bootstrapped Regtested on aarch64-none-linux-gnu and no issues. Ok for master? Thanks, Tamar gcc/ChangeLog:

[PATCH 0/4]AArch64: support conditional early clobbers on certain operations.

2024-05-15 Thread Tamar Christina
Hi All, Some Neoverse Software Optimization Guides (SWoG) have a clause that states that for predicated operations that also produce a predicate it is preferred that the codegen should use a different register for the destination than that of the input predicate in order to avoid a performance

Re: [PATCH] AArch64: Improve costing of ctz

2024-05-15 Thread Andrew Pinski
On Wed, May 15, 2024, 12:17 PM Wilco Dijkstra wrote: > Improve costing of ctz - both TARGET_CSSC and vector cases were not > handled yet. > > Passes regress & bootstrap - OK for commit? > I should note popcount has a similar issue which I hope to fix next week. Popcount cost is used during

Re: [PATCH] AArch64: Use UZP1 instead of INS

2024-05-15 Thread Richard Sandiford
Wilco Dijkstra writes: > Use UZP1 instead of INS when combining low and high halves of vectors. > UZP1 has 3 operands which improves register allocation, and is faster on > some microarchitectures. > > Passes regress & bootstrap, OK for commit? OK, thanks. We can add core-specific tuning later

[PATCH] AArch64: Improve costing of ctz

2024-05-15 Thread Wilco Dijkstra
Improve costing of ctz - both TARGET_CSSC and vector cases were not handled yet. Passes regress & bootstrap - OK for commit? gcc: * config/aarch64/aarch64.cc (aarch64_rtx_costs): Improve CTZ costing. --- diff --git a/gcc/config/aarch64/aarch64.cc b/gcc/config/aarch64/aarch64.cc index

[PATCH] AArch64: Fix printing of 2-instruction alternatives

2024-05-15 Thread Wilco Dijkstra
Add missing '\' in 2-instruction movsi/di alternatives so that they are printed on separate lines. Passes bootstrap and regress, OK for commit once stage 1 reopens? gcc: * config/aarch64/aarch64.md (movsi_aarch64): Use '\;' to force newline in 2-instruction pattern.

[PATCH] AArch64: Use LDP/STP for large struct types

2024-05-15 Thread Wilco Dijkstra
Use LDP/STP for large struct types as they have useful immediate offsets and are typically faster. This removes differences between little and big endian and allows use of LDP/STP without UNSPEC. Passes regress and bootstrap, OK for commit? gcc: * config/aarch64/aarch64.cc

[PATCH] AArch64: Use LDP/STP for large struct types

2024-05-15 Thread Wilco Dijkstra
Use LDP/STP for large struct types as they have useful immediate offsets and are typically faster. This removes differences between little and big endian and allows use of LDP/STP without UNSPEC. Passes regress and bootstrap, OK for commit? gcc: * config/aarch64/aarch64.cc

[PATCH] AArch64: Use UZP1 instead of INS

2024-05-15 Thread Wilco Dijkstra
Use UZP1 instead of INS when combining low and high halves of vectors. UZP1 has 3 operands which improves register allocation, and is faster on some microarchitectures. Passes regress & bootstrap, OK for commit? gcc: * config/aarch64/aarch64-simd.md (aarch64_combine_internal):

[PATCH] [x86] Set d.one_operand_p to true when TARGET_SSSE3 in ix86_expand_vecop_qihi_partial.

2024-05-15 Thread liuhongt
pshufb is available under TARGET_SSSE3, so ix86_expand_vec_perm_const_1 must return true when TARGET_SSSE3. w/o TARGET_SSSE3, if we set one_operand_p to true, ix86_expand_vec_perm_const_1 could return false. With the patch under -march=x86-64-v2 v8qi foo (v8qi a) { return a >> 5; } <

Ping: [PATCH 0/2] Fix two test failures with --enable-default-pie [PR70150]

2024-05-15 Thread Xi Ruoyao
Ping. On Mon, 2024-05-06 at 12:45 +0800, Xi Ruoyao wrote: > In GCC 14.1-rc1, there are two new (comparing to GCC 13) failures if > the build is configured --enable-default-pie.  Let's fix them. > > Tested on x86_64-linux-gnu.  Ok for trunk and releases/gcc-14? > > Xi Ruoyao (2): >   i386:

[r15-499 Regression] FAIL: g++.target/i386/pr107563-b.C scan-assembler-times psrlw 1 on Linux/x86_64

2024-05-15 Thread haochen.jiang
On Linux/x86_64, a71f90c5a7ae2942083921033cb23dcd63e70525 is the first bad commit commit a71f90c5a7ae2942083921033cb23dcd63e70525 Author: Levy Hsu Date: Thu May 9 16:50:56 2024 +0800 x86: Add 3-instruction subroutine vector shift for V16QI in ix86_expand_vec_perm_const_1 [PR107563]

Re: [Patch, aarch64] v4: Preparatory patch to place target independent and,dependent changed code in one file

2024-05-15 Thread Ajit Agarwal
Hello Alex: On 14/05/24 11:53 pm, Alex Coplan wrote: > Hi Ajit, > > Please can you pay careful attention to the review comments? > > In particular, you have ignored my comment about changing the access of > member functions in ldp_bb_info several times now (on at least three > patch reviews). >

[PATCH] tree-optimization/114589 - remove profile based sink heuristics

2024-05-15 Thread Richard Biener
The following removes the profile based heuristic limiting sinking and instead uses post-dominators to avoid sinking to places that are executed under the same conditions as the earlier location which the profile based heuristic should have guaranteed as well. To avoid regressing this moves the

Re: [PATCH] Don't reduce estimated unrolled size for innermost loop.

2024-05-15 Thread Hongtao Liu
C -std=gnu++14 LP64 note (test for > > > > g++warnings, line 56) > > > > g++: g++.dg/warn/Warray-bounds-20.C -std=gnu++14 note (test for > > > > g++warnings, line 66) > > > > g++: g++.dg/warn/Warray-bounds-20.C -std=gnu++17 LP64 note (test for > > > > g++warnings, line 56) > > > > g++:

[Patch, aarch64] v6: Preparatory patch to place target independent and,dependent changed code in one file

2024-05-15 Thread Ajit Agarwal
Hello Alex/Richard: All review comments are addressed. Common infrastructure of load store pair fusion is divided into target independent and target dependent changed code. Target independent code is the Generic code with pure virtual function to interface between target independent and

Re: [PATCH] [PATCH] Correct DLL Installation Path for x86_64-w64-mingw32 Multilib [PR115094]

2024-05-15 Thread Richard Biener
On Wed, May 15, 2024 at 11:02 AM unlvsur unlvsur wrote: > > Hi. Richard. I checked configure.ac and it is not in configure.ac. It is in > the libtool.m4. The code was generated from libtool.m4 so it is correct. Ah, sorry - the libtool.m4 change escaped me ... It's been some time since we

Re: [PATCH] Don't reduce estimated unrolled size for innermost loop.

2024-05-15 Thread Richard Biener
On Wed, May 15, 2024 at 4:15 AM Hongtao Liu wrote: > > On Mon, May 13, 2024 at 3:40 PM Richard Biener > wrote: > > > > On Mon, May 13, 2024 at 4:29 AM liuhongt wrote: > > > > > > As testcase in the PR, O3 cunrolli may prevent vectorization for the > > > innermost loop and increase register

Re: [PATCH] libstdc++: Rewrite std::variant comparisons without macros

2024-05-15 Thread Jonathan Wakely
On Tue, 7 May 2024 at 14:51, Ville Voutilainen wrote: > > On Tue, 7 May 2024 at 16:47, Jonathan Wakely wrote: > > > > I don't think using a macro for these really saves us much, we can do > > this to avoid duplication instead. And now it's not a big, multi-line > > macro that's a pain to edit. >

[committed] libstdc++: Give std::memory_order a fixed underlying type [PR89624]

2024-05-15 Thread Jonathan Wakely
Tested x86_64-linux. Pushed to trunk. -- >8 -- Prior to C++20 this enum type doesn't have a fixed underlying type, which means it can be modified by -fshort-enums, which then means the HLE bits are outside the range of valid values for the type. As it has a fixed type of int in C++20 and later,

RE: [PATCH v5 1/3] Internal-fn: Support new IFN SAT_ADD for unsigned scalar int

2024-05-15 Thread Tamar Christina
Hi Pan, Thanks! > -Original Message- > From: pan2...@intel.com > Sent: Wednesday, May 15, 2024 3:14 AM > To: gcc-patches@gcc.gnu.org > Cc: juzhe.zh...@rivai.ai; kito.ch...@gmail.com; Tamar Christina > ; richard.guent...@gmail.com; > hongtao@intel.com; Pan Li > Subject: [PATCH v5

Re: [PATCH 1/8] [APX NF]: Support APX NF add

2024-05-15 Thread Uros Bizjak
On Wed, May 15, 2024 at 9:43 AM Kong, Lingling wrote: > > From: Hongyu Wang > > The APX NF (no flags) feature suppresses the update of status flags for > arithmetic operations. > > For NF add, it is not clear whether NF add can be faster than lea. If so, the > pattern needs to be

RE: [PATCH 1/8] [APX NF]: Support APX NF add

2024-05-15 Thread Kong, Lingling
> -Original Message- > From: Uros Bizjak > Sent: Wednesday, May 15, 2024 4:15 PM > To: Kong, Lingling > Cc: gcc-patches@gcc.gnu.org; Liu, Hongtao ; Wang, > Hongyu > Subject: Re: [PATCH 1/8] [APX NF]: Support APX NF add > > On Wed, May 15, 2024 at 9:43 AM Kong, Lingling > wrote: > > >

Re: [PATCH 2/3] [APX CCMP] Adjust startegy for selecting ccmp candidates

2024-05-15 Thread Hongyu Wang
CC'd Richard for the ccmp part, as previously it was added only for aarch64. The original logic will not be interrupted, since if aarch64_gen_ccmp_first succeeds, aarch64_gen_ccmp_next will also succeed; the cmp/fcmp and ccmp/fccmp support all GPI/GPF, and the prepare_operand will fix up the input that cmp

[PATCH 1/3] [APX CCMP] Support APX CCMP

2024-05-15 Thread Hongyu Wang
The APX CCMP feature implements a conditional compare, which executes the compare when EFLAGS matches a certain condition. CCMP introduces a default flags value (dfv): when the conditional compare does not execute, it directly sets the flags according to dfv. The instruction looks like ccmpeq {dfv=sf,of,cf,zf}

[PATCH 2/3] [APX CCMP] Adjust startegy for selecting ccmp candidates

2024-05-15 Thread Hongyu Wang
For the general ccmp scenario, the tree sequence looks like: _1 = (a < b); _2 = (c < d); _3 = _1 & _2. The current ccmp expansion tries swapping the compare order of _1 and _2, compares the costs (cost/cost2) of doing _1 first versus _2 first, then returns the sequence with the lower cost. For x86 ccmp, we don't support FP
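As a rough illustration of the source pattern behind that tree sequence, the C sketch below combines two integer compares with a bitwise AND. The function names both_less and both_less_self_check are hypothetical, and whether a given compiler actually emits a conditional compare for this depends on the target and options; this only shows the shape of the input.

```c
#include <assert.h>

/* Two compares feeding a bitwise AND, mirroring
   _1 = (a < b); _2 = (c < d); _3 = _1 & _2.  */
int
both_less (int a, int b, int c, int d)
{
  int t1 = a < b;   /* first compare: candidate for a plain cmp */
  int t2 = c < d;   /* second compare: candidate for a conditional compare */
  return t1 & t2;   /* 1 only when both compares hold */
}

/* Hypothetical helper: exercises all four truth combinations.  */
void
both_less_self_check (void)
{
  assert (both_less (1, 2, 3, 4) == 1);
  assert (both_less (2, 1, 3, 4) == 0);
  assert (both_less (1, 2, 4, 3) == 0);
  assert (both_less (2, 1, 4, 3) == 0);
}
```

Because the two compares are independent, either one may be evaluated first, which is why the expander can weigh both orders by cost.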

[PATCH 3/3] [APX CCMP] Support ccmp for float compare

2024-05-15 Thread Hongyu Wang
The ccmp insn itself doesn't support fp compare, but x86 has the fp comi insn, which changes EFLAGS and can be the scc input to ccmp. Allow scalar fp compare in ix86_gen_ccmp_first except ORDERED/UNORDERED compares, which cannot be identified in ccmp. gcc/ChangeLog: * config/i386/i386-expand.cc

[PATCH 0/3] Support Intel APX CCMP

2024-05-15 Thread Hongyu Wang
The APX CCMP feature[1] implements a conditional compare, which executes the compare when EFLAGS matches a certain condition. CCMP introduces a default flags value (dfv): when the conditional compare does not execute, it directly sets the flags according to dfv. From APX assembler

Re: [PATCH 1/8] [APX NF]: Support APX NF add

2024-05-15 Thread Uros Bizjak
On Wed, May 15, 2024 at 9:43 AM Kong, Lingling wrote: > > From: Hongyu Wang > > The APX NF (no flags) feature suppresses the update of status flags for > arithmetic operations. > > For NF add, it is not clear whether NF add can be faster than lea. If so, the > pattern needs to be
