Re: [PATCH] [APX CCMP] Use ctestcc when comparing to const 0

2024-06-12 Thread Uros Bizjak
On Wed, Jun 12, 2024 at 12:00 PM Uros Bizjak  wrote:
>
> On Wed, Jun 12, 2024 at 5:12 AM Hongyu Wang  wrote:
> >
> > Hi,
> >
> > For CTEST, we don't have a conditional AND, so there's no optimization
> > opportunity in writing a new ctest pattern. Instead, emit ctest when
> > ccmp compares against const 0, to save bytes.
> >
> > Bootstrapped & regtested on x86_64-pc-linux-gnu.
> >
> > Ok for trunk?
> >
> > gcc/ChangeLog:
> >
> > * config/i386/i386.md (@ccmp): Use ctestcc when
> > operands[3] is const0_rtx.
> >
> > gcc/testsuite/ChangeLog:
> >
> > * gcc.target/i386/apx-ccmp-1.c: Adjust output to scan ctest.
> > * gcc.target/i386/apx-ccmp-2.c: Adjust some conditions to
> > compare with 0.
> > ---
> >  gcc/config/i386/i386.md|  6 +-
> >  gcc/testsuite/gcc.target/i386/apx-ccmp-1.c | 10 ++
> >  gcc/testsuite/gcc.target/i386/apx-ccmp-2.c |  4 ++--
> >  3 files changed, 13 insertions(+), 7 deletions(-)
> >
> > diff --git a/gcc/config/i386/i386.md b/gcc/config/i386/i386.md
> > index a64f2ad4f5f..014d48cddd6 100644
> > --- a/gcc/config/i386/i386.md
> > +++ b/gcc/config/i386/i386.md
> > @@ -1522,7 +1522,11 @@ (define_insn "@ccmp"
> >   [(match_operand:SI 4 "const_0_to_15_operand")]
> >   UNSPEC_APX_DFV)))]
> >   "TARGET_APX_CCMP"
> > - "ccmp%C1{}\t%G4 {%3, %2|%2, %3}"
> > + {
> > +   if (operands[3] == const0_rtx && !MEM_P (operands[2]))
> > + return "ctest%C1{}\t%G4 %2, %2";
> > +   return "ccmp%C1{}\t%G4 {%3, %2|%2, %3}";
> > + }
>
> This could be implemented as an additional alternative, with "r" and "C"
> as the first constraints for operands[2] and operands[3]. Then the
> register allocator will match the constraints for you.

Something like the attached (lightly tested) patch.

Uros.
diff --git a/gcc/config/i386/i386.md b/gcc/config/i386/i386.md
index a64f2ad4f5f..14d4d8cddca 100644
--- a/gcc/config/i386/i386.md
+++ b/gcc/config/i386/i386.md
@@ -1515,14 +1515,17 @@ (define_insn "@ccmp"
 (match_operator 1 "comparison_operator"
  [(reg:CC FLAGS_REG) (const_int 0)])
(compare:CC
- (minus:SWI (match_operand:SWI 2 "nonimmediate_operand" "m,")
-(match_operand:SWI 3 "" ","))
+ (minus:SWI (match_operand:SWI 2 "nonimmediate_operand" ",m,")
+(match_operand:SWI 3 "" "C,,"))
  (const_int 0))
(unspec:SI
  [(match_operand:SI 4 "const_0_to_15_operand")]
  UNSPEC_APX_DFV)))]
  "TARGET_APX_CCMP"
- "ccmp%C1{}\t%G4 {%3, %2|%2, %3}"
+ "@
+  ctest%C1{}\t%G4 %2, %2
+  ccmp%C1{}\t%G4 {%3, %2|%2, %3}
+  ccmp%C1{}\t%G4 {%3, %2|%2, %3}"
  [(set_attr "type" "icmp")
   (set_attr "mode" "")
   (set_attr "length_immediate" "1")
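
Why the ctest substitution is safe can be checked with a small C model. It is only a sketch: `struct flags`, `cmp_flags`, `test_flags`, `same_flags` and `cmp0_matches_test` are hypothetical helpers, not GCC or APX code, and only ZF, SF, CF and OF are modeled. For any register value, `cmp $0, r` and `test r, r` leave those four flags identical, so a ctest of the register with itself can stand in for a ccmp against zero, presumably saving the immediate byte. The memory case stays on ccmp because TEST has no memory-to-memory form.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical model of the four arithmetic flags the conditions read.  */
struct flags { bool zf, sf, cf, of; };

/* Flags after "cmp src, dst", i.e. from computing dst - src.  */
static struct flags cmp_flags (uint32_t dst, uint32_t src)
{
  uint32_t r = dst - src;
  struct flags f;
  f.zf = (r == 0);
  f.sf = (r >> 31) & 1;
  f.cf = dst < src;                              /* unsigned borrow */
  f.of = (((dst ^ src) & (dst ^ r)) >> 31) & 1;  /* signed overflow */
  return f;
}

/* Flags after "test src, dst": ZF/SF from dst & src, CF and OF cleared.  */
static struct flags test_flags (uint32_t dst, uint32_t src)
{
  uint32_t r = dst & src;
  struct flags f = { r == 0, (r >> 31) & 1, false, false };
  return f;
}

static bool same_flags (struct flags a, struct flags b)
{
  return a.zf == b.zf && a.sf == b.sf && a.cf == b.cf && a.of == b.of;
}

/* Sampled sweep (65537 * 65535 == UINT32_MAX, so the last value is hit)
   comparing "cmp $0, r" with "test r, r".  */
static bool cmp0_matches_test (void)
{
  for (uint64_t v = 0; v <= UINT32_MAX; v += 65537)
    if (!same_flags (cmp_flags ((uint32_t) v, 0),
                     test_flags ((uint32_t) v, (uint32_t) v)))
      return false;
  return true;
}
```

Comparing against a nonzero value does not have this property: for example `cmp $7, 5` sets CF, while `test 5, 5` never does.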


Re: [PATCH] [APX CCMP] Use ctestcc when comparing to const 0

2024-06-12 Thread Uros Bizjak
On Wed, Jun 12, 2024 at 5:12 AM Hongyu Wang  wrote:
>
> Hi,
>
> For CTEST, we don't have a conditional AND, so there's no optimization
> opportunity in writing a new ctest pattern. Instead, emit ctest when
> ccmp compares against const 0, to save bytes.
>
> Bootstrapped & regtested on x86_64-pc-linux-gnu.
>
> Ok for trunk?
>
> gcc/ChangeLog:
>
> * config/i386/i386.md (@ccmp): Use ctestcc when
> operands[3] is const0_rtx.
>
> gcc/testsuite/ChangeLog:
>
> * gcc.target/i386/apx-ccmp-1.c: Adjust output to scan ctest.
> * gcc.target/i386/apx-ccmp-2.c: Adjust some conditions to
> compare with 0.
> ---
>  gcc/config/i386/i386.md|  6 +-
>  gcc/testsuite/gcc.target/i386/apx-ccmp-1.c | 10 ++
>  gcc/testsuite/gcc.target/i386/apx-ccmp-2.c |  4 ++--
>  3 files changed, 13 insertions(+), 7 deletions(-)
>
> diff --git a/gcc/config/i386/i386.md b/gcc/config/i386/i386.md
> index a64f2ad4f5f..014d48cddd6 100644
> --- a/gcc/config/i386/i386.md
> +++ b/gcc/config/i386/i386.md
> @@ -1522,7 +1522,11 @@ (define_insn "@ccmp"
>   [(match_operand:SI 4 "const_0_to_15_operand")]
>   UNSPEC_APX_DFV)))]
>   "TARGET_APX_CCMP"
> - "ccmp%C1{}\t%G4 {%3, %2|%2, %3}"
> + {
> +   if (operands[3] == const0_rtx && !MEM_P (operands[2]))
> + return "ctest%C1{}\t%G4 %2, %2";
> +   return "ccmp%C1{}\t%G4 {%3, %2|%2, %3}";
> + }

This could be implemented as an additional alternative, with "r" and "C"
as the first constraints for operands[2] and operands[3]. Then the
register allocator will match the constraints for you.

Uros.

>   [(set_attr "type" "icmp")
>(set_attr "mode" "")
>(set_attr "length_immediate" "1")
> diff --git a/gcc/testsuite/gcc.target/i386/apx-ccmp-1.c b/gcc/testsuite/gcc.target/i386/apx-ccmp-1.c
> index e4e112f07e0..a8b70576760 100644
> --- a/gcc/testsuite/gcc.target/i386/apx-ccmp-1.c
> +++ b/gcc/testsuite/gcc.target/i386/apx-ccmp-1.c
> @@ -96,9 +96,11 @@ f15 (double a, double b, int c, int d)
>
>  /* { dg-final { scan-assembler-times "ccmpg" 2 } } */
>  /* { dg-final { scan-assembler-times "ccmple" 2 } } */
> -/* { dg-final { scan-assembler-times "ccmpne" 4 } } */
> -/* { dg-final { scan-assembler-times "ccmpe" 3 } } */
> +/* { dg-final { scan-assembler-times "ccmpne" 2 } } */
> +/* { dg-final { scan-assembler-times "ccmpe" 1 } } */
>  /* { dg-final { scan-assembler-times "ccmpbe" 1 } } */
> +/* { dg-final { scan-assembler-times "ctestne" 2 } } */
> +/* { dg-final { scan-assembler-times "cteste" 2 } } */
>  /* { dg-final { scan-assembler-times "ccmpa" 1 } } */
> -/* { dg-final { scan-assembler-times "ccmpbl" 2 } } */
> -
> +/* { dg-final { scan-assembler-times "ccmpbl" 1 } } */
> +/* { dg-final { scan-assembler-times "ctestbl" 1 } } */
> diff --git a/gcc/testsuite/gcc.target/i386/apx-ccmp-2.c b/gcc/testsuite/gcc.target/i386/apx-ccmp-2.c
> index 0123a686d2c..4a0784394c3 100644
> --- a/gcc/testsuite/gcc.target/i386/apx-ccmp-2.c
> +++ b/gcc/testsuite/gcc.target/i386/apx-ccmp-2.c
> @@ -12,7 +12,7 @@ int foo_apx(int a, int b, int c, int d)
>c += d;
>a += b;
>sum += a + c;
> -  if (b != d && sum < c || sum > d)
> +  if (b > d && sum != 0 || sum > d)
> {
>   b += d;
>   sum += b;
> @@ -32,7 +32,7 @@ int foo_noapx(int a, int b, int c, int d)
>c += d;
>a += b;
>sum += a + c;
> -  if (b != d && sum < c || sum > d)
> +  if (b > d && sum != 0 || sum > d)
> {
>   b += d;
>   sum += b;
> --
> 2.31.1
>


Re: [PATCH] rust: Do not link with libdl and libpthread unconditionally

2024-06-12 Thread Uros Bizjak
On Tue, Jun 11, 2024 at 11:21 AM Arthur Cohen  wrote:
>
> Thanks Richi!
>
> Tested again and pushed on trunk.


This patch introduced a couple of errors during ./configure:

checking for library containing dlopen... none required
checking for library containing pthread_create... none required
/git/gcc/configure: line 8997: test: too many arguments
/git/gcc/configure: line 8999: test: too many arguments
/git/gcc/configure: line 9003: test: too many arguments
/git/gcc/configure: line 9005: test: =: unary operator expected

You have to wrap the arguments of test in double quotes.

Uros.

> Best,
>
> Arthur
>
> On 5/31/24 15:02, Richard Biener wrote:
> > On Fri, May 31, 2024 at 12:24 PM Arthur Cohen wrote:
> >>
> >> Hi Richard,
> >>
> >> On 4/30/24 09:55, Richard Biener wrote:
> >>> On Fri, Apr 19, 2024 at 11:49 AM Arthur Cohen wrote:
> 
>  Hi everyone,
> 
>  This patch checks for the presence of dlopen and pthread_create in libc.
>  If that is not the case, we check for the existence of -ldl and -lpthread,
>  as these libraries are required to link the Rust runtime to our Rust
>  frontend.
> 
>  If these libs are not present on the system, then we disable the Rust
>  frontend.
> 
>  This was tested on x86_64, in an environment with a recent GLIBC and in
>  a container with GLIBC 2.27.
> 
>  Apologies for sending it in so late.
> >>>
> >>> For example GCC_ENABLE_PLUGINS simply does
> >>>
> >>># Check -ldl
> >>>saved_LIBS="$LIBS"
> >>>AC_SEARCH_LIBS([dlopen], [dl])
> >>>if test x"$ac_cv_search_dlopen" = x"-ldl"; then
> >>>  pluginlibs="$pluginlibs -ldl"
> >>>fi
> >>>LIBS="$saved_LIBS"
> >>>
> >>> which I guess would also work for pthread_create?  This would simplify
> >>> the code a bit.
> >>
> >> Thanks a lot for the review. I've updated the patch's content in
> >> configure.ac per your suggestion. Tested similarly on x86_64 and in a
> >> container with libc 2.27
> >
> > LGTM.
> >
> > Thanks,
> > Richard.
> >
> >>   From 00669b600a75743523c358ee41ab999b6e9fa0f6 Mon Sep 17 00:00:00 2001
> >> From: Arthur Cohen 
> >> Date: Fri, 12 Apr 2024 13:52:18 +0200
> >> Subject: [PATCH] rust: Do not link with libdl and libpthread unconditionally
> >>
> >> ChangeLog:
> >>
> >>  * Makefile.tpl: Add CRAB1_LIBS variable.
> >>  * Makefile.in: Regenerate.
> >>  * configure: Regenerate.
> >>  * configure.ac: Check if -ldl and -lpthread are needed, and if so, add
> >>  them to CRAB1_LIBS.
> >>
> >> gcc/rust/ChangeLog:
> >>
> >>  * Make-lang.in: Remove overzealous LIBS = -ldl -lpthread line, link
> >>  crab1 against CRAB1_LIBS.
> >> ---
> >>Makefile.in   |   3 +
> >>Makefile.tpl  |   3 +
> >>configure | 154 ++
> >>configure.ac  |  41 +++
> >>gcc/rust/Make-lang.in |   6 +-
> >>5 files changed, 203 insertions(+), 4 deletions(-)
> >>
> >> diff --git a/Makefile.in b/Makefile.in
> >> index edb0c8a9a42..1753fb6b862 100644
> >> --- a/Makefile.in
> >> +++ b/Makefile.in
> >> @@ -197,6 +197,7 @@ HOST_EXPORTS = \
> >>  $(BASE_EXPORTS) \
> >>  CC="$(CC)"; export CC; \
> >>  ADA_CFLAGS="$(ADA_CFLAGS)"; export ADA_CFLAGS; \
> >> +   CRAB1_LIBS="$(CRAB1_LIBS)"; export CRAB1_LIBS; \
> >>  CFLAGS="$(CFLAGS)"; export CFLAGS; \
> >>  CONFIG_SHELL="$(SHELL)"; export CONFIG_SHELL; \
> >>  CXX="$(CXX)"; export CXX; \
> >> @@ -450,6 +451,8 @@ GOCFLAGS = $(CFLAGS)
> >>GDCFLAGS = @GDCFLAGS@
> >>GM2FLAGS = $(CFLAGS)
> >>
> >> +CRAB1_LIBS = @CRAB1_LIBS@
> >> +
> >>PKG_CONFIG_PATH = @PKG_CONFIG_PATH@
> >>
> >>GUILE = guile
> >> diff --git a/Makefile.tpl b/Makefile.tpl
> >> index adbcbdd1d57..4aeaad3c1a5 100644
> >> --- a/Makefile.tpl
> >> +++ b/Makefile.tpl
> >> @@ -200,6 +200,7 @@ HOST_EXPORTS = \
> >>  $(BASE_EXPORTS) \
> >>  CC="$(CC)"; export CC; \
> >>  ADA_CFLAGS="$(ADA_CFLAGS)"; export ADA_CFLAGS; \
> >> +   CRAB1_LIBS="$(CRAB1_LIBS)"; export CRAB1_LIBS; \
> >>  CFLAGS="$(CFLAGS)"; export CFLAGS; \
> >>  CONFIG_SHELL="$(SHELL)"; export CONFIG_SHELL; \
> >>  CXX="$(CXX)"; export CXX; \
> >> @@ -453,6 +454,8 @@ GOCFLAGS = $(CFLAGS)
> >>GDCFLAGS = @GDCFLAGS@
> >>GM2FLAGS = $(CFLAGS)
> >>
> >> +CRAB1_LIBS = @CRAB1_LIBS@
> >> +
> >>PKG_CONFIG_PATH = @PKG_CONFIG_PATH@
> >>
> >>GUILE = guile
> >> diff --git a/configure b/configure
> >> index 02b435c1163..a9ea5258f0f 100755
> >> --- a/configure
> >> +++ b/configure
> >> @@ -690,6 +690,7 @@ extra_host_zlib_configure_flags
> >>extra_host_libiberty_configure_flags
> >>stage1_languages
> >>host_libs_picflag
> >> +CRAB1_LIBS
> >>PICFLAG
> >>host_shared
> >>gcc_host_pie
> >> @@ -8826,6 +8827,139 @@ fi
> >>
> >>
> >>
> >> +# Rust 

[committed] i386: Use CMOV in .SAT_{ADD|SUB} expansion for TARGET_CMOV [PR112600]

2024-06-11 Thread Uros Bizjak
For TARGET_CMOVE targets, emit an insn sequence involving a conditional move.

.SAT_ADD:

addl    %esi, %edi
movl    $-1, %eax
cmovnc  %edi, %eax
ret

.SAT_SUB:

subl    %esi, %edi
movl    $0, %eax
cmovnc  %edi, %eax
ret
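
In C terms, both sequences select between the wrapped result and a constant based on the carry (or borrow) out of the ADD/SUB, which is what CMOVNC does. The sketch below uses hypothetical function names. The 8-bit variant loosely mirrors the expander's choice to do the select in SImode for modes narrower than SImode (x86 has no 8-bit CMOV) and then take the low part.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* add; mov $-1; cmovnc: keep the sum unless the add carried out.  */
static uint32_t sat_add_u32 (uint32_t x, uint32_t y)
{
  uint32_t sum = x + y;              /* wraps modulo 2^32 */
  bool carry = sum < x;              /* carry out of the addition */
  return carry ? UINT32_MAX : sum;
}

/* sub; mov $0; cmovnc: keep the difference unless the sub borrowed.  */
static uint32_t sat_sub_u32 (uint32_t x, uint32_t y)
{
  uint32_t diff = x - y;             /* wraps modulo 2^32 */
  bool borrow = x < y;               /* borrow out of the subtraction */
  return borrow ? 0 : diff;
}

/* 8-bit case: select on 32-bit values, then keep only the low byte.  */
static uint8_t sat_add_u8 (uint8_t x, uint8_t y)
{
  uint8_t sum = (uint8_t) (x + y);
  uint32_t wide = (sum < x) ? UINT32_MAX : (uint32_t) sum;
  return (uint8_t) wide;
}
```

Selecting all-ones in 32 bits and truncating still gives 0xff, so the wider select is harmless for the narrow result.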

PR target/112600

gcc/ChangeLog:

* config/i386/i386.md (usadd3): Emit insn sequence
involving conditional move for TARGET_CMOVE targets.
(ussub3): Ditto.

gcc/testsuite/ChangeLog:

* gcc.target/i386/pr112600-a.c: Also scan for cmov.
* gcc.target/i386/pr112600-b.c: Ditto.

Bootstrapped and regression tested on x86_64-linux-gnu {,-m32}.

Uros.
diff --git a/gcc/config/i386/i386.md b/gcc/config/i386/i386.md
index d69bc8d6e48..a64f2ad4f5f 100644
--- a/gcc/config/i386/i386.md
+++ b/gcc/config/i386/i386.md
@@ -9885,13 +9885,35 @@ (define_expand "usadd3"
   ""
 {
   rtx res = gen_reg_rtx (mode);
-  rtx msk = gen_reg_rtx (mode);
   rtx dst;
 
   emit_insn (gen_add3_cc_overflow_1 (res, operands[1], operands[2]));
-  emit_insn (gen_x86_movcc_0_m1_neg (msk));
-  dst = expand_simple_binop (mode, IOR, res, msk,
-operands[0], 1, OPTAB_WIDEN);
+
+  if (TARGET_CMOVE)
+{
+  rtx cmp = gen_rtx_GEU (VOIDmode, gen_rtx_REG (CCCmode, FLAGS_REG),
+const0_rtx);
+
+  if ( < GET_MODE_SIZE (SImode))
+   {
+ dst = force_reg (mode, operands[0]);
+ emit_insn (gen_movsicc (gen_lowpart (SImode, dst), cmp,
+ gen_lowpart (SImode, res), constm1_rtx));
+   }
+   else
+   {
+ dst = operands[0];
+ emit_insn (gen_movcc (dst, cmp, res, constm1_rtx));
+   }
+}
+  else
+{
+  rtx msk = gen_reg_rtx (mode);
+
+  emit_insn (gen_x86_movcc_0_m1_neg (msk));
+  dst = expand_simple_binop (mode, IOR, res, msk,
+operands[0], 1, OPTAB_WIDEN);
+}
 
   if (!rtx_equal_p (dst, operands[0]))
 emit_move_insn (operands[0], dst);
@@ -9905,14 +9927,36 @@ (define_expand "ussub3"
   ""
 {
   rtx res = gen_reg_rtx (mode);
-  rtx msk = gen_reg_rtx (mode);
   rtx dst;
 
   emit_insn (gen_sub_3 (res, operands[1], operands[2]));
-  emit_insn (gen_x86_movcc_0_m1_neg (msk));
-  msk = expand_simple_unop (mode, NOT, msk, NULL, 1);
-  dst = expand_simple_binop (mode, AND, res, msk,
-operands[0], 1, OPTAB_WIDEN);
+
+  if (TARGET_CMOVE)
+{
+  rtx cmp = gen_rtx_GEU (VOIDmode, gen_rtx_REG (CCCmode, FLAGS_REG),
+const0_rtx);
+
+  if ( < GET_MODE_SIZE (SImode))
+   {
+ dst = force_reg (mode, operands[0]);
+ emit_insn (gen_movsicc (gen_lowpart (SImode, dst), cmp,
+ gen_lowpart (SImode, res), const0_rtx));
+   }
+   else
+   {
+ dst = operands[0];
+ emit_insn (gen_movcc (dst, cmp, res, const0_rtx));
+   }
+}
+  else
+{
+  rtx msk = gen_reg_rtx (mode);
+
+  emit_insn (gen_x86_movcc_0_m1_neg (msk));
+  msk = expand_simple_unop (mode, NOT, msk, NULL, 1);
+  dst = expand_simple_binop (mode, AND, res, msk,
+operands[0], 1, OPTAB_WIDEN);
+}
 
   if (!rtx_equal_p (dst, operands[0]))
 emit_move_insn (operands[0], dst);
diff --git a/gcc/testsuite/gcc.target/i386/pr112600-a.c b/gcc/testsuite/gcc.target/i386/pr112600-a.c
index fa122bc7a3f..2b084860451 100644
--- a/gcc/testsuite/gcc.target/i386/pr112600-a.c
+++ b/gcc/testsuite/gcc.target/i386/pr112600-a.c
@@ -1,7 +1,7 @@
 /* PR target/112600 */
 /* { dg-do compile } */
 /* { dg-options "-O2" } */
-/* { dg-final { scan-assembler-times "sbb" 4 } } */
+/* { dg-final { scan-assembler-times "sbb|cmov" 4 } } */
 
 unsigned char
 add_sat_char (unsigned char x, unsigned char y)
diff --git a/gcc/testsuite/gcc.target/i386/pr112600-b.c b/gcc/testsuite/gcc.target/i386/pr112600-b.c
index ea14bb9738b..ac4e26423b6 100644
--- a/gcc/testsuite/gcc.target/i386/pr112600-b.c
+++ b/gcc/testsuite/gcc.target/i386/pr112600-b.c
@@ -1,7 +1,7 @@
 /* PR target/112600 */
 /* { dg-do compile } */
 /* { dg-options "-O2" } */
-/* { dg-final { scan-assembler-times "sbb" 4 } } */
+/* { dg-final { scan-assembler-times "sbb|cmov" 4 } } */
 
 unsigned char
 sub_sat_char (unsigned char x, unsigned char y)


[committed] i386: Implement .SAT_SUB for unsigned scalar integers [PR112600]

2024-06-09 Thread Uros Bizjak
The following testcase:

unsigned
sub_sat (unsigned x, unsigned y)
{
  unsigned res;
  res = x - y;
  res &= -(x >= y);
  return res;
}

currently compiles (-O2) to:

sub_sat:
movl    %edi, %edx
xorl    %eax, %eax
subl    %esi, %edx
cmpl    %esi, %edi
setnb   %al
negl    %eax
andl    %edx, %eax
ret

We can expand through the ussub{m}3 optab to use the carry flag from the
subtraction and generate code using the SBB instruction, implementing:

unsigned res = x - y;
res &= ~(-(x < y));

sub_sat:
subl    %esi, %edi
sbbl    %eax, %eax
notl    %eax
andl    %edi, %eax
ret
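
The trick is that `sbb %eax, %eax` computes 0 - CF, i.e. an all-ones mask after a borrowing subtraction and zero otherwise. A minimal C model of the four-instruction sequence (hypothetical function names), checked against a plain clamp-at-zero reference:

```c
#include <assert.h>
#include <stdint.h>

/* sub %esi,%edi; sbb %eax,%eax; not %eax; and %edi,%eax  */
static uint32_t sat_sub_sbb (uint32_t x, uint32_t y)
{
  uint32_t res = x - y;                 /* sub: CF set on borrow */
  uint32_t msk = -(uint32_t) (x < y);   /* sbb: 0 - CF, all-ones on borrow */
  return res & ~msk;                    /* not + and: zero on borrow */
}

/* Reference: unsigned subtraction clamped at zero.  */
static uint32_t sat_sub_ref (uint32_t x, uint32_t y)
{
  return x >= y ? x - y : 0;
}

/* Exhaustive over small operands plus two 32-bit boundary pairs.  */
static int sat_sub_matches_ref (void)
{
  for (uint32_t x = 0; x < 256; x++)
    for (uint32_t y = 0; y < 256; y++)
      if (sat_sub_sbb (x, y) != sat_sub_ref (x, y))
        return 0;
  return sat_sub_sbb (0u, UINT32_MAX) == 0u
         && sat_sub_sbb (UINT32_MAX, 1u) == UINT32_MAX - 1u;
}
```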

PR target/112600

gcc/ChangeLog:

* config/i386/i386.md (ussub3): New expander.
(sub_3): Ditto.

gcc/testsuite/ChangeLog:

* gcc.target/i386/pr112600-b.c: New test.

Bootstrapped and regression tested on x86_64-linux-gnu {,-m32}.

Uros.
diff --git a/gcc/config/i386/i386.md b/gcc/config/i386/i386.md
index bc2ef819df6..d69bc8d6e48 100644
--- a/gcc/config/i386/i386.md
+++ b/gcc/config/i386/i386.md
@@ -8436,6 +8436,14 @@ (define_expand "usubv4"
   "ix86_fixup_binary_operands_no_copy (MINUS, mode, operands,
   TARGET_APX_NDD);")
 
+(define_expand "sub_3"
+  [(parallel [(set (reg:CC FLAGS_REG)
+  (compare:CC
+(match_operand:SWI 1 "nonimmediate_operand")
+(match_operand:SWI 2 "")))
+ (set (match_operand:SWI 0 "register_operand")
+  (minus:SWI (match_dup 1) (match_dup 2)))])])
+
 (define_insn "*sub_3"
   [(set (reg FLAGS_REG)
(compare (match_operand:SWI 1 "nonimmediate_operand" "0,0,rm,r")
@@ -9883,7 +9891,28 @@ (define_expand "usadd3"
   emit_insn (gen_add3_cc_overflow_1 (res, operands[1], operands[2]));
   emit_insn (gen_x86_movcc_0_m1_neg (msk));
   dst = expand_simple_binop (mode, IOR, res, msk,
-operands[0], 1, OPTAB_DIRECT);
+operands[0], 1, OPTAB_WIDEN);
+
+  if (!rtx_equal_p (dst, operands[0]))
+emit_move_insn (operands[0], dst);
+  DONE;
+})
+
+(define_expand "ussub3"
+  [(set (match_operand:SWI 0 "register_operand")
+   (us_minus:SWI (match_operand:SWI 1 "register_operand")
+ (match_operand:SWI 2 "")))]
+  ""
+{
+  rtx res = gen_reg_rtx (mode);
+  rtx msk = gen_reg_rtx (mode);
+  rtx dst;
+
+  emit_insn (gen_sub_3 (res, operands[1], operands[2]));
+  emit_insn (gen_x86_movcc_0_m1_neg (msk));
+  msk = expand_simple_unop (mode, NOT, msk, NULL, 1);
+  dst = expand_simple_binop (mode, AND, res, msk,
+operands[0], 1, OPTAB_WIDEN);
 
   if (!rtx_equal_p (dst, operands[0]))
 emit_move_insn (operands[0], dst);


Re: [committed] i386: Implement .SAT_ADD for unsigned scalar integers [PR112600]

2024-06-08 Thread Uros Bizjak
On Sat, Jun 8, 2024 at 2:09 PM Gerald Pfeifer  wrote:
>
> On Sat, 8 Jun 2024, Uros Bizjak wrote:
> > gcc/ChangeLog:
> >
> > * config/i386/i386.md (usadd3): New expander.
> > (x86_movcc_0_m1_neg): Use SWI mode iterator.
>
> When you write "committed", did you actually push?

Yes, IIRC the request was to mark a pushed change with the word "committed".

> If so, with us being on Git now, it might be good to adjust the terminology.

No problem, I can say "pushed" if that is more descriptive.

Thanks,
Uros.


[committed] i386: Implement .SAT_ADD for unsigned scalar integers [PR112600]

2024-06-08 Thread Uros Bizjak
The following testcase:

unsigned
add_sat(unsigned x, unsigned y)
{
  unsigned z;
  return __builtin_add_overflow(x, y, &z) ? -1u : z;
}

currently compiles (-O2) to:

add_sat:
addl    %esi, %edi
jc      .L3
movl    %edi, %eax
ret
.L3:
orl     $-1, %eax
ret

We can expand through the usadd{m}3 optab to use the carry flag from the
addition and generate branchless code using the SBB instruction, implementing:

unsigned res = x + y;
res |= -(res < x);

add_sat:
addl    %esi, %edi
sbbl    %eax, %eax
orl     %edi, %eax
ret
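
Here the mask from `sbb %eax, %eax` is all-ones exactly when the addition carried out, so ORing it into the sum saturates to the maximum value. The sketch below (hypothetical names; `__builtin_add_overflow` is the GCC/Clang builtin used in the testcase) compares the SBB formulation with the original testcase on boundary values:

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* add %esi,%edi; sbb %eax,%eax; or %edi,%eax  */
static uint32_t sat_add_sbb (uint32_t x, uint32_t y)
{
  uint32_t res = x + y;                  /* add: CF set on carry out */
  uint32_t msk = -(uint32_t) (res < x);  /* sbb: all-ones iff CF */
  return res | msk;                      /* or: saturate on carry */
}

/* The testcase from the mail, with uint32_t in place of unsigned.  */
static uint32_t sat_add_builtin (uint32_t x, uint32_t y)
{
  uint32_t z;
  return __builtin_add_overflow (x, y, &z) ? UINT32_MAX : z;
}

/* Compare both forms on every pair of boundary values.  */
static int sat_add_matches_builtin (void)
{
  static const uint32_t v[] = { 0u, 1u, 2u, 0x7fffffffu, 0x80000000u,
                                0xfffffffeu, 0xffffffffu };
  size_t n = sizeof v / sizeof v[0];
  for (size_t i = 0; i < n; i++)
    for (size_t j = 0; j < n; j++)
      if (sat_add_sbb (v[i], v[j]) != sat_add_builtin (v[i], v[j]))
        return 0;
  return 1;
}
```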

PR target/112600

gcc/ChangeLog:

* config/i386/i386.md (usadd3): New expander.
(x86_movcc_0_m1_neg): Use SWI mode iterator.

gcc/testsuite/ChangeLog:

* gcc.target/i386/pr112600-a.c: New test.

Bootstrapped and regression tested on x86_64-linux-gnu {,-m32}.

Uros.
diff --git a/gcc/config/i386/i386.md b/gcc/config/i386/i386.md
index ffcf63e1cba..bc2ef819df6 100644
--- a/gcc/config/i386/i386.md
+++ b/gcc/config/i386/i386.md
@@ -9870,6 +9870,26 @@ (define_insn_and_split "*sub3_ne_0"
 operands[1] = force_reg (mode, operands[1]);
 })
 
+(define_expand "usadd3"
+  [(set (match_operand:SWI 0 "register_operand")
+   (us_plus:SWI (match_operand:SWI 1 "register_operand")
+(match_operand:SWI 2 "")))]
+  ""
+{
+  rtx res = gen_reg_rtx (mode);
+  rtx msk = gen_reg_rtx (mode);
+  rtx dst;
+
+  emit_insn (gen_add3_cc_overflow_1 (res, operands[1], operands[2]));
+  emit_insn (gen_x86_movcc_0_m1_neg (msk));
+  dst = expand_simple_binop (mode, IOR, res, msk,
+operands[0], 1, OPTAB_DIRECT);
+
+  if (!rtx_equal_p (dst, operands[0]))
+emit_move_insn (operands[0], dst);
+  DONE;
+})
+
 ;; The patterns that match these are at the end of this file.
 
 (define_expand "xf3"
@@ -24945,8 +24965,8 @@ (define_insn "*x86_movcc_0_m1_neg"
 
 (define_expand "x86_movcc_0_m1_neg"
   [(parallel
-[(set (match_operand:SWI48 0 "register_operand")
- (neg:SWI48 (ltu:SWI48 (reg:CCC FLAGS_REG) (const_int 0
+[(set (match_operand:SWI 0 "register_operand")
+ (neg:SWI (ltu:SWI (reg:CCC FLAGS_REG) (const_int 0
  (clobber (reg:CC FLAGS_REG))])])
 
 (define_split
diff --git a/gcc/testsuite/gcc.target/i386/pr112600-a.c b/gcc/testsuite/gcc.target/i386/pr112600-a.c
new file mode 100644
index 000..fa122bc7a3f
--- /dev/null
+++ b/gcc/testsuite/gcc.target/i386/pr112600-a.c
@@ -0,0 +1,32 @@
+/* PR target/112600 */
+/* { dg-do compile } */
+/* { dg-options "-O2" } */
+/* { dg-final { scan-assembler-times "sbb" 4 } } */
+
+unsigned char
+add_sat_char (unsigned char x, unsigned char y)
+{
+  unsigned char z;
+  return __builtin_add_overflow(x, y, &z) ? -1u : z;
+}
+
+unsigned short
+add_sat_short (unsigned short x, unsigned short y)
+{
+  unsigned short z;
+  return __builtin_add_overflow(x, y, &z) ? -1u : z;
+}
+
+unsigned int
+add_sat_int (unsigned int x, unsigned int y)
+{
+  unsigned int z;
+  return __builtin_add_overflow(x, y, &z) ? -1u : z;
+}
+
+unsigned long
+add_sat_long (unsigned long x, unsigned long y)
+{
+  unsigned long z;
+  return __builtin_add_overflow(x, y, &z) ? -1ul : z;
+}


Re: [PATCH v2 2/6] Extract ix86 dllimport implementation to mingw

2024-06-07 Thread Uros Bizjak
On Fri, Jun 7, 2024 at 11:48 AM Evgeny Karpov
 wrote:
>
> This patch extracts the ix86 implementation for expanding a SYMBOL
> into its corresponding dllimport, far-address, or refptr symbol.
> It will be reused in the aarch64-w64-mingw32 target.
> The implementation is copied as is from i386/i386.cc with
> minor changes to follow the code style.
>
> Also this patch replaces the original DLL import/export
> implementation in ix86 with mingw.
>
> gcc/ChangeLog:
>
> * config.gcc: Add winnt-dll.o, which contains the DLL
> import/export implementation.
> * config/i386/cygming.h (SUB_TARGET_RECORD_STUB): Remove the
> old implementation. Rename the required function to MinGW.
> Use MinGW implementation for COFF and nothing otherwise.
> (GOT_ALIAS_SET): Likewise.
> * config/i386/i386-expand.cc (ix86_expand_move): Likewise.
> * config/i386/i386-expand.h (ix86_GOT_alias_set): Likewise.
> (legitimize_pe_coff_symbol): Likewise.
> * config/i386/i386-protos.h (i386_pe_record_stub): Likewise.
> * config/i386/i386.cc (is_imported_p): Likewise.
> (legitimate_pic_address_disp_p): Likewise.
> (ix86_GOT_alias_set): Likewise.
> (legitimize_pic_address): Likewise.
> (legitimize_tls_address): Likewise.
> (struct dllimport_hasher): Likewise.
> (GTY): Likewise.
> (get_dllimport_decl): Likewise.
> (legitimize_pe_coff_extern_decl): Likewise.
> (legitimize_dllimport_symbol): Likewise.
> (legitimize_pe_coff_symbol): Likewise.
> (ix86_legitimize_address): Likewise.
> * config/i386/i386.h (GOT_ALIAS_SET): Likewise.
> * config/mingw/winnt.cc (i386_pe_record_stub): Likewise.
> (mingw_pe_record_stub): Likewise.
> * config/mingw/winnt.h (mingw_pe_record_stub): Likewise.
> * config/mingw/t-cygming: Add the winnt-dll.o compilation.
> * config/mingw/winnt-dll.cc: New file.
> * config/mingw/winnt-dll.h: New file.

LGTM for generic x86 changes.

Thanks,
Uros.

> ---
>  gcc/config.gcc |  12 +-
>  gcc/config/i386/cygming.h  |   5 +-
>  gcc/config/i386/i386-expand.cc |   4 +-
>  gcc/config/i386/i386-expand.h  |   2 -
>  gcc/config/i386/i386-protos.h  |   1 -
>  gcc/config/i386/i386.cc| 205 ++---
>  gcc/config/i386/i386.h |   2 +
>  gcc/config/mingw/t-cygming |   6 +
>  gcc/config/mingw/winnt-dll.cc  | 231 +
>  gcc/config/mingw/winnt-dll.h   |  30 +
>  gcc/config/mingw/winnt.cc  |   2 +-
>  gcc/config/mingw/winnt.h   |   1 +
>  12 files changed, 298 insertions(+), 203 deletions(-)
>  create mode 100644 gcc/config/mingw/winnt-dll.cc
>  create mode 100644 gcc/config/mingw/winnt-dll.h
>
> diff --git a/gcc/config.gcc b/gcc/config.gcc
> index 553a310f4bd..d053b98efa8 100644
> --- a/gcc/config.gcc
> +++ b/gcc/config.gcc
> @@ -2177,11 +2177,13 @@ i[4567]86-wrs-vxworks*|x86_64-wrs-vxworks7*)
>  i[34567]86-*-cygwin*)
> tm_file="${tm_file} i386/unix.h i386/bsd.h i386/gas.h i386/cygming.h i386/cygwin.h i386/cygwin-stdint.h"
> tm_file="${tm_file} mingw/winnt.h"
> +   tm_file="${tm_file} mingw/winnt-dll.h"
> xm_file=i386/xm-cygwin.h
> tmake_file="${tmake_file} mingw/t-cygming t-slibgcc"
> target_gtfiles="$target_gtfiles \$(srcdir)/config/mingw/winnt.cc"
> +   target_gtfiles="$target_gtfiles \$(srcdir)/config/mingw/winnt-dll.cc"
> extra_options="${extra_options} mingw/cygming.opt i386/cygwin.opt"
> -   extra_objs="${extra_objs} winnt.o winnt-stubs.o"
> +   extra_objs="${extra_objs} winnt.o winnt-stubs.o winnt-dll.o"
> c_target_objs="${c_target_objs} msformat-c.o"
> cxx_target_objs="${cxx_target_objs} winnt-cxx.o msformat-c.o"
> d_target_objs="${d_target_objs} cygwin-d.o"
> @@ -2196,11 +2198,13 @@ x86_64-*-cygwin*)
> need_64bit_isa=yes
> tm_file="${tm_file} i386/unix.h i386/bsd.h i386/gas.h i386/cygming.h i386/cygwin.h i386/cygwin-w64.h i386/cygwin-stdint.h"
> tm_file="${tm_file} mingw/winnt.h"
> +   tm_file="${tm_file} mingw/winnt-dll.h"
> xm_file=i386/xm-cygwin.h
> tmake_file="${tmake_file} mingw/t-cygming t-slibgcc"
> target_gtfiles="$target_gtfiles \$(srcdir)/config/mingw/winnt.cc"
> +   target_gtfiles="$target_gtfiles \$(srcdir)/config/mingw/winnt-dll.cc"
> extra_options="${extra_options} mingw/cygming.opt i386/cygwin.opt"
> -   extra_objs="${extra_objs} winnt.o winnt-stubs.o"
> +   extra_objs="${extra_objs} winnt.o winnt-stubs.o winnt-dll.o"
> c_target_objs="${c_target_objs} msformat-c.o"
> cxx_target_objs="${cxx_target_objs} winnt-cxx.o msformat-c.o"
> d_target_objs="${d_target_objs} cygwin-d.o"
> @@ -2266,6 +2270,7 @@ i[34567]86-*-mingw* | x86_64-*-mingw*)
> esac
> tm_file="${tm_file} mingw/mingw-stdint.h"
> 

Re: [x86 PATCH] PR target/115351: RTX costs for *concatditi3 and *insvti_highpart.

2024-06-07 Thread Uros Bizjak
On Fri, Jun 7, 2024 at 11:21 AM Roger Sayle  wrote:
>
>
> This patch addresses PR target/115351, which is a code quality regression
> on x86 when passing floating point complex numbers.  The ABI considers
> these arguments to have TImode, requiring interunit moves to place the
> FP values (which are actually passed in SSE registers) into the upper
> and lower parts of a TImode pseudo, and then similar moves back again
> before they can be used.
>
> The cause of the regression is that changes in how TImode initialization
> is represented in RTL now prevents the RTL optimizers from eliminating
> these redundant moves.  The specific cause is that the *concatditi3
> pattern, (zext(hi)<<64)|zext(lo), has an inappropriately high (default)
> rtx_cost, preventing fwprop1 from propagating it.  This pattern just
> sets the hipart and lopart of a double-word register, typically two
> instructions (less if reload can allocate things appropriately) but
> the current ix86_rtx_costs actually returns COSTS_N_INSNS(13), i.e. 52.
>
> propagating insn 5 into insn 6, replacing:
> (set (reg:TI 110)
> (ior:TI (and:TI (reg:TI 110)
> (const_wide_int 0x0))
> (ashift:TI (zero_extend:TI (subreg:DI (reg:DF 112 [ zD.2796+8 ]) 0))
> (const_int 64 [0x40]
> successfully matched this instruction to *concatditi3_3:
> (set (reg:TI 110)
> (ior:TI (ashift:TI (zero_extend:TI (subreg:DI (reg:DF 112 [ zD.2796+8 ])
> 0))
> (const_int 64 [0x40]))
> (zero_extend:TI (subreg:DI (reg:DF 111 [ zD.2796 ]) 0
> change not profitable (cost 50 -> cost 52)
>
> This issue is resolved by having ix86_rtx_costs return more reasonable
> values for these (place-holder) patterns.
>
> This patch has been tested on x86_64-pc-linux-gnu with make bootstrap
> and make -k check, both with and without --target_board=unix{-m32}
> with no new failures.  Ok for mainline?
>
>
> 2024-06-07  Roger Sayle  
>
> gcc/ChangeLog
> PR target/115351
> * config/i386/i386.cc (ix86_rtx_costs): Provide estimates for the
> *concatditi3 and *insvti_highpart patterns, about two insns.
>
> gcc/testsuite/ChangeLog
> PR target/115351
> * g++.target/i386/pr115351.C: New test case.

LGTM.

Thanks,
Uros.

>
>
> Thanks in advance (and sorry for any inconvenience),
> Roger
> --
>
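
The cost numbers in the fwprop dump follow from GCC's cost scale, where `COSTS_N_INSNS (N)` is defined in rtl.h as `(N) * 4`. A small arithmetic sketch, assuming a simplified acceptance rule (reject when the new cost exceeds the old one; the real fwprop logic may differ in details such as ties) and assuming the patched estimate is about `COSTS_N_INSNS (2)`, as "about two insns" in the ChangeLog suggests. `change_profitable` is a hypothetical helper:

```c
#include <assert.h>
#include <stdbool.h>

/* Same scale as GCC's rtl.h: one typical insn costs 4 units.  */
#define COSTS_N_INSNS(N) ((N) * 4)

/* Hypothetical, simplified fwprop rule: reject if the cost goes up.  */
static bool change_profitable (int old_cost, int new_cost)
{
  return new_cost <= old_cost;
}
```

With the default estimate the propagated insn costs 52 against 50, so the change is rejected; at roughly two insns (8) it would be accepted.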


[committed] testsuite/i386: Add vector sat_sub testcases [PR112600]

2024-06-06 Thread Uros Bizjak
PR middle-end/112600

gcc/testsuite/ChangeLog:

* gcc.target/i386/pr112600-2a.c: New test.
* gcc.target/i386/pr112600-2b.c: New test.

Tested on x86_64-linux-gnu {,-m32}.

Uros.
diff --git a/gcc/testsuite/gcc.target/i386/pr112600-2a.c b/gcc/testsuite/gcc.target/i386/pr112600-2a.c
new file mode 100644
index 000..4df38e5a720
--- /dev/null
+++ b/gcc/testsuite/gcc.target/i386/pr112600-2a.c
@@ -0,0 +1,15 @@
+/* PR middle-end/112600 */
+/* { dg-do compile } */
+/* { dg-options "-O2 -ftree-vectorize -msse2" } */
+
+typedef unsigned char T;
+
+void foo (T *out, T *x, T *y, int n)
+{
+  int i;
+
+  for (i = 0; i < n; i++)
+out[i] = (x[i] - y[i]) & (-(T)(x[i] >= y[i]));
+}
+
+/* { dg-final { scan-assembler "psubusb" } } */
diff --git a/gcc/testsuite/gcc.target/i386/pr112600-2b.c b/gcc/testsuite/gcc.target/i386/pr112600-2b.c
new file mode 100644
index 000..0f6345de704
--- /dev/null
+++ b/gcc/testsuite/gcc.target/i386/pr112600-2b.c
@@ -0,0 +1,15 @@
+/* PR middle-end/112600 */
+/* { dg-do compile } */
+/* { dg-options "-O2 -ftree-vectorize -msse2" } */
+
+typedef unsigned short T;
+
+void foo (T *out, T *x, T *y, int n)
+{
+  int i;
+
+  for (i = 0; i < n; i++)
+out[i] = (x[i] - y[i]) & (-(T)(x[i] >= y[i]));
+}
+
+/* { dg-final { scan-assembler "psubusw" } } */
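
The loop body is the scalar idiom that PSUBUSB/PSUBUSW implement per lane: unsigned subtraction clamped at zero. Because of integer promotion, `x[i] - y[i]` and `-(T)(...)` are computed in `int`, and the store back to `T` truncates, so the idiom still yields the saturated value. The exhaustive (8-bit) and sampled (16-bit) checks below are only a sketch, with hypothetical helper names:

```c
#include <assert.h>
#include <limits.h>

/* Loop body of pr112600-2a.c with T = unsigned char.  The arithmetic is
   done in int after promotion; the return converts back to T.  */
static unsigned char sat_sub_idiom_u8 (unsigned char x, unsigned char y)
{
  return (x - y) & (-(unsigned char) (x >= y));
}

/* Same idiom with T = unsigned short (pr112600-2b.c).  */
static unsigned short sat_sub_idiom_u16 (unsigned short x, unsigned short y)
{
  return (x - y) & (-(unsigned short) (x >= y));
}

/* Per-lane meaning of PSUBUSB/PSUBUSW: subtract, clamp at zero.  */
static unsigned clamp_sub (unsigned x, unsigned y)
{
  return x > y ? x - y : 0;
}

static int u8_idiom_matches (void)
{
  for (unsigned x = 0; x <= UCHAR_MAX; x++)
    for (unsigned y = 0; y <= UCHAR_MAX; y++)
      if (sat_sub_idiom_u8 (x, y) != clamp_sub (x, y))
        return 0;
  return 1;
}

/* 257 divides 65535, so this sampled sweep also hits USHRT_MAX.  */
static int u16_idiom_matches (void)
{
  for (unsigned x = 0; x <= USHRT_MAX; x += 257)
    for (unsigned y = 0; y <= USHRT_MAX; y += 257)
      if (sat_sub_idiom_u16 (x, y) != clamp_sub (x, y))
        return 0;
  return 1;
}
```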


Re: [PATCH v1] Internal-fn: Support new IFN SAT_SUB for unsigned scalar int

2024-06-05 Thread Uros Bizjak
On Wed, Jun 5, 2024 at 10:52 AM Li, Pan2  wrote:
>
> Thanks for explaining. I see, cmove is well designed for such cases.

If the question is whether it is worth converting via
__builtin_sub_overflow when the target doesn't provide a scalar
saturating optab, I think the answer is yes. For x86, the compare will
be eliminated.

Please consider this testcase:

--cut here--
unsigned int
__attribute__((noinline))
foo (unsigned int x, unsigned int y)
{
  return x > y ? x - y : 0;
}

unsigned int
__attribute__((noinline))
bar (unsigned int x, unsigned int y)
{
  unsigned int z;

  return __builtin_sub_overflow (x, y, &z) ? 0 : z;
}
--cut here--

This will compile to:

 :
  0:   89 f8   mov%edi,%eax
  2:   31 d2   xor%edx,%edx
  4:   29 f0   sub%esi,%eax
  6:   39 fe   cmp%edi,%esi
  8:   0f 43 c2cmovae %edx,%eax
  b:   c3  ret
  c:   0f 1f 40 00 nopl   0x0(%rax)

0010 :
 10:   29 f7   sub%esi,%edi
 12:   72 03   jb 17 
 14:   89 f8   mov%edi,%eax
 16:   c3  ret
 17:   31 c0   xor%eax,%eax
 19:   c3  ret

Please note that the compare was eliminated in the latter function. So,
if the target does not provide a saturating optab but provides
__builtin_sub_overflow, I think it is worth emitting .SAT_SUB via
__builtin_sub_overflow (and similarly for saturating add).

Uros.
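
The two functions compute the same value for every input: foo tests `x > y`, while the overflow form effectively tests `x >= y`, and the two only disagree at x == y, where x - y is 0 anyway. A sampled check of that equivalence (hypothetical names; `__builtin_sub_overflow` is the GCC/Clang builtin from the testcase):

```c
#include <assert.h>
#include <limits.h>

/* foo from the testcase: explicit comparison.  */
static unsigned int sat_sub_cmp (unsigned int x, unsigned int y)
{
  return x > y ? x - y : 0;
}

/* bar from the testcase: the borrow of x - y is the condition.  */
static unsigned int sat_sub_ovf (unsigned int x, unsigned int y)
{
  unsigned int z;
  return __builtin_sub_overflow (x, y, &z) ? 0 : z;
}

/* Sampled check over all pairs of boundary values, including x == y.  */
static int sub_forms_agree (void)
{
  static const unsigned int v[] = { 0u, 1u, 2u, 1000u, UINT_MAX / 2,
                                    UINT_MAX - 1, UINT_MAX };
  for (unsigned i = 0; i < sizeof v / sizeof v[0]; i++)
    for (unsigned j = 0; j < sizeof v / sizeof v[0]; j++)
      if (sat_sub_cmp (v[i], v[j]) != sat_sub_ovf (v[i], v[j]))
        return 0;
  return 1;
}
```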


>
> Pan
>
> -Original Message-
> From: Uros Bizjak 
> Sent: Wednesday, June 5, 2024 4:46 PM
> To: Li, Pan2 
> Cc: Richard Biener ; gcc-patches@gcc.gnu.org; juzhe.zh...@rivai.ai; kito.ch...@gmail.com; tamar.christ...@arm.com
> Subject: Re: [PATCH v1] Internal-fn: Support new IFN SAT_SUB for unsigned scalar int
>
> On Wed, Jun 5, 2024 at 10:38 AM Li, Pan2  wrote:
> >
> > > I see. x86 doesn't have scalar saturating instructions, so the scalar
> > > version indeed can't be converted.
> >
> > > I will amend x86 testcases after the vector part of your patch is 
> > > committed.
> >
> > Thanks for the confirmation. Just curious: the scalar .SAT_SUB has
> > several forms, like the branch version below.
> >
> > .SAT_SUB (x, y) = x > y ? x - y : 0. // or leverage __builtin_sub_overflow here
> >
> > Is it reasonable to implement the scalar .SAT_SUB for x86, given that we
> > can somehow eliminate the branch here?
>
> x86 will emit cmove in the above case:
>
>     movl    %edi, %eax
>     xorl    %edx, %edx
>     subl    %esi, %eax
>     cmpl    %edi, %esi
>     cmovnb  %edx, %eax
>
> Maybe we can reuse flags from the subtraction here to avoid the compare.
>
> Uros.


Re: [PATCH v1] Internal-fn: Support new IFN SAT_SUB for unsigned scalar int

2024-06-05 Thread Uros Bizjak
On Wed, Jun 5, 2024 at 10:38 AM Li, Pan2  wrote:
>
> > I see. x86 doesn't have scalar saturating instructions, so the scalar
> > version indeed can't be converted.
>
> > I will amend x86 testcases after the vector part of your patch is committed.
>
> Thanks for the confirmation. Just curious: the scalar .SAT_SUB has
> several forms, like the branch version below.
>
> .SAT_SUB (x, y) = x > y ? x - y : 0. // or leverage __builtin_sub_overflow here
>
> Is it reasonable to implement the scalar .SAT_SUB for x86, given that we
> can somehow eliminate the branch here?

x86 will emit a cmov in the above case:

        movl    %edi, %eax
        xorl    %edx, %edx
        subl    %esi, %eax
        cmpl    %edi, %esi
        cmovnb  %edx, %eax

Maybe we can reuse flags from the subtraction here to avoid the compare.

Uros.


Re: [PATCH v1] Internal-fn: Support new IFN SAT_SUB for unsigned scalar int

2024-06-05 Thread Uros Bizjak
On Wed, Jun 5, 2024 at 10:22 AM Li, Pan2  wrote:
>
> > Is the above testcase correct? You need "(x - y)" as the first term.
>
> Thanks for the comments; that is a copy-paste error. Please refer to
> SAT_SUB (x, y) => (x - y) & (-(TYPE)(x >= y)) or the template below.
>
> +#define DEF_SAT_U_SUB_FMT_1(T) \
> +T __attribute__((noinline))\
> +sat_u_sub_##T##_fmt_1 (T x, T y)   \
> +{  \
> +  return (x - y) & (-(T)(x >= y)); \
> +}
> +
> +#define DEF_SAT_U_SUB_FMT_2(T)\
> +T __attribute__((noinline))   \
> +sat_u_sub_##T##_fmt_2 (T x, T y)  \
> +{ \
> +  return (x - y) & (-(T)(x > y)); \
> +}
>
> > BTW: After applying your patch, I'm not able to produce .SAT_SUB with
> > x86_64 and the following testcase:
>
> Do you mean the vector part? This patch only covers unsigned scalar int
> (see the title); the vector part is below.
> Could you please double-check whether .SAT_SUB is missing after the
> widen_mul pass on x86 for unsigned scalar int?
> I will also give it a try later, as I am in the middle of something.
>
> https://gcc.gnu.org/pipermail/gcc-patches/2024-May/653024.html

I see. x86 doesn't have scalar saturating instructions, so the scalar
version indeed can't be converted.

I will amend x86 testcases after the vector part of your patch is committed.
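The two quoted templates can be checked exhaustively for uint8_t against the branch form x > y ? x - y : 0. The check also covers the sample values from the patch description (SAT_SUB (255, 0) => 255, SAT_SUB (1, 2) => 0, and so on). This is a minimal sketch. sat_u_sub_branch and sat_sub_forms_agree are hypothetical helpers and are not part of the patch.

```c
#include <stdint.h>

/* The two templates from the thread.  For uint8_t both operands are
   promoted to int, so -(T)(cond) is the int value -1 or 0, and the
   return converts the masked difference back to T.  */
#define DEF_SAT_U_SUB_FMT_1(T)                  \
T __attribute__((noinline))                     \
sat_u_sub_##T##_fmt_1 (T x, T y)                \
{                                               \
  return (x - y) & (-(T)(x >= y));              \
}

#define DEF_SAT_U_SUB_FMT_2(T)                  \
T __attribute__((noinline))                     \
sat_u_sub_##T##_fmt_2 (T x, T y)                \
{                                               \
  return (x - y) & (-(T)(x > y));               \
}

DEF_SAT_U_SUB_FMT_1 (uint8_t)
DEF_SAT_U_SUB_FMT_2 (uint8_t)

/* Hypothetical reference: the branch form.  */
static uint8_t
sat_u_sub_branch (uint8_t x, uint8_t y)
{
  return x > y ? x - y : 0;
}

/* Returns 1 if all three forms agree for every pair of uint8_t inputs.  */
static int
sat_sub_forms_agree (void)
{
  for (unsigned x = 0; x <= UINT8_MAX; x++)
    for (unsigned y = 0; y <= UINT8_MAX; y++)
      {
        uint8_t r = sat_u_sub_branch ((uint8_t) x, (uint8_t) y);

        if (sat_u_sub_uint8_t_fmt_1 ((uint8_t) x, (uint8_t) y) != r
            || sat_u_sub_uint8_t_fmt_2 ((uint8_t) x, (uint8_t) y) != r)
          return 0;
      }
  return 1;
}
```

The forms differ only when x == y, and in that case every form yields 0. This is why ">=" and ">" are interchangeable here.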

Thanks,
Uros.


Re: [PATCH v1] Internal-fn: Support new IFN SAT_SUB for unsigned scalar int

2024-06-05 Thread Uros Bizjak
On Wed, Jun 5, 2024 at 9:38 AM Li, Pan2  wrote:
>
> Thanks Richard, I will commit after the rebased patch passes the regression tests.
>
> Pan
>
> -----Original Message-----
> From: Richard Biener 
> Sent: Wednesday, June 5, 2024 3:19 PM
> To: Li, Pan2 
> Cc: gcc-patches@gcc.gnu.org; juzhe.zh...@rivai.ai; kito.ch...@gmail.com; 
> tamar.christ...@arm.com
> Subject: Re: [PATCH v1] Internal-fn: Support new IFN SAT_SUB for unsigned scalar int
>
> On Tue, May 28, 2024 at 10:29 AM  wrote:
> >
> > From: Pan Li 
> >
> > This patch adds the middle-end representation for saturating
> > subtraction, i.e. the result of the subtraction is set to the minimum
> > value on underflow. It matches a pattern similar to the one below.
> >
> > SAT_SUB (x, y) => (x - y) & (-(TYPE)(x >= y));
> >
> > For example for uint8_t, we have
> >
> > * SAT_SUB (255, 0)   => 255
> > * SAT_SUB (1, 2) => 0
> > * SAT_SUB (254, 255) => 0
> > * SAT_SUB (0, 255)   => 0
> >
> > Given the SAT_SUB below for uint64
> >
> > uint64_t sat_sub_u64 (uint64_t x, uint64_t y)
> > {
> >   return (x + y) & (- (uint64_t)((x >= y)));
> > }

Is the above testcase correct? You need "(x - y)" as the first term.

BTW: After applying your patch, I'm not able to produce .SAT_SUB with
x86_64 and the following testcase:

--cut here--
typedef unsigned short T;

void foo (T *out, T *x, T *y, int n)
{
  int i;

  for (i = 0; i < n; i++)
    out[i] = (x[i] - y[i]) & (-(T)(x[i] >= y[i]));
}
--cut here--

with gcc -O2 -ftree-vectorize -msse2

I think that all relevant optabs were added for x86 in

https://gcc.gnu.org/git/gitweb.cgi?p=gcc.git;h=b59de4113262f2bee14147eb17eb3592f03d9556

as part of the commit for PR112600, comment 8.

Uros.

> >
> > Before this patch:
> > uint64_t sat_sub_u_0_uint64_t (uint64_t x, uint64_t y)
> > {
> >   _Bool _1;
> >   long unsigned int _3;
> >   uint64_t _6;
> >
> > ;;   basic block 2, loop depth 0
> > ;;pred:   ENTRY
> >   _1 = x_4(D) >= y_5(D);
> >   _3 = x_4(D) - y_5(D);
> >   _6 = _1 ? _3 : 0;
> >   return _6;
> > ;;succ:   EXIT
> > }
> >
> > After this patch:
> > uint64_t sat_sub_u_0_uint64_t (uint64_t x, uint64_t y)
> > {
> >   uint64_t _6;
> >
> > ;;   basic block 2, loop depth 0
> > ;;pred:   ENTRY
> >   _6 = .SAT_SUB (x_4(D), y_5(D)); [tail call]
> >   return _6;
> > ;;succ:   EXIT
> > }
> >
> > The following tests were run for this patch:
> > *. The riscv fully regression tests.
> > *. The x86 bootstrap tests.
> > *. The x86 fully regression tests.
>
> OK.
>
> Thanks,
> Richard.
>
> > PR target/51492
> > PR target/112600
> >
> > gcc/ChangeLog:
> >
> > * internal-fn.def (SAT_SUB): Add new IFN define for SAT_SUB.
> > * match.pd: Add new match for SAT_SUB.
> > * optabs.def (OPTAB_NL): Remove fixed-point for ussub/ssub.
> > * tree-ssa-math-opts.cc (gimple_unsigned_integer_sat_sub): Add
> > new decl for generated in match.pd.
> > (build_saturation_binary_arith_call): Add new helper function
> > to build the gimple call to binary SAT alu.
> > (match_saturation_arith): Rename from.
> > (match_unsigned_saturation_add): Rename to.
> > (match_unsigned_saturation_sub): Add new func to match the
> > unsigned sat sub.
> > (math_opts_dom_walker::after_dom_children): Add SAT_SUB matching
> > try when COND_EXPR.
> >
> > Signed-off-by: Pan Li 
> > ---
> >  gcc/internal-fn.def   |  1 +
> >  gcc/match.pd  | 14 
> >  gcc/optabs.def|  4 +--
> >  gcc/tree-ssa-math-opts.cc | 67 +++
> >  4 files changed, 64 insertions(+), 22 deletions(-)
> >
> > diff --git a/gcc/internal-fn.def b/gcc/internal-fn.def
> > index 25badbb86e5..24539716e5b 100644
> > --- a/gcc/internal-fn.def
> > +++ b/gcc/internal-fn.def
> > @@ -276,6 +276,7 @@ DEF_INTERNAL_SIGNED_OPTAB_FN (MULHRS, ECF_CONST | ECF_NOTHROW, first,
> >   smulhrs, umulhrs, binary)
> >
> >  DEF_INTERNAL_SIGNED_OPTAB_FN (SAT_ADD, ECF_CONST, first, ssadd, usadd, binary)
> > +DEF_INTERNAL_SIGNED_OPTAB_FN (SAT_SUB, ECF_CONST, first, sssub, ussub, binary)
> >
> >  DEF_INTERNAL_COND_FN (ADD, ECF_CONST, add, binary)
> >  DEF_INTERNAL_COND_FN (SUB, ECF_CONST, sub, binary)
> > diff --git a/gcc/match.pd b/gcc/match.pd
> > index 024e3350465..3e334533ff8 100644
> > --- a/gcc/match.pd
> > +++ b/gcc/match.pd
> > @@ -3086,6 +3086,20 @@ DEFINE_INT_AND_FLOAT_ROUND_FN (RINT)
> >  (match (unsigned_integer_sat_add @0 @1)
> >   (bit_ior:c (usadd_left_part_2 @0 @1) (usadd_right_part_2 @0 @1)))
> >
> > +/* Unsigned saturation sub, case 1 (branch with gt):
> > +   SAT_U_SUB = X > Y ? X - Y : 0  */
> > +(match (unsigned_integer_sat_sub @0 @1)
> > + (cond (gt @0 @1) (minus @0 @1) integer_zerop)
> > + (if (INTEGRAL_TYPE_P (type) && TYPE_UNSIGNED (type)
> > +  && types_match (type, @0, @1
> > +
> > +/* Unsigned saturation sub, case 2 (branch with ge):
> > +   SAT_U_SUB = X >= Y ? X - Y : 0.  */
> > 

Re: [PATCH v1 0/6] Add DLL import/export implementation to AArch64

2024-06-05 Thread Uros Bizjak
On Tue, Jun 4, 2024 at 10:10 PM Evgeny Karpov
 wrote:
>
> Richard and Uros, could you please review the changes for v2?

LGTM for the generic x86 part; the OS-specific part (cygming) should also
be reviewed by the OS port maintainer (CC'd).

Thanks,
Uros.

> Additionally, we have detected an issue with GCC GC in winnt-dll.cc. The fix 
> will be included in v2.
>
> >> -ix86_handle_selectany_attribute (tree *node, tree name, tree, int,
> >> +mingw_handle_selectany_attribute (tree *node, tree name, tree, int,
> >>   bool *no_add_attrs)
>
> > please reindent the parameters for the new name length.
>
> Richard, could you please clarify how it should be done?
> Thanks!
>
> Regards,
> Evgeny
>
>
> ---
>  gcc/config/aarch64/cygming.h   |  6 +
>  gcc/config/i386/cygming.h  |  6 +
>  gcc/config/i386/i386-expand.cc |  6 +++--
>  gcc/config/i386/i386-expand.h  |  2 --
>  gcc/config/i386/i386.cc| 42 ++
>  gcc/config/i386/i386.h |  2 ++
>  gcc/config/mingw/winnt-dll.cc  |  8 ++-
>  gcc/config/mingw/winnt-dll.h   |  2 +-
>  8 files changed, 33 insertions(+), 41 deletions(-)
>
> diff --git a/gcc/config/aarch64/cygming.h b/gcc/config/aarch64/cygming.h
> index 4beebf9e093..0ff475754e0 100644
> --- a/gcc/config/aarch64/cygming.h
> +++ b/gcc/config/aarch64/cygming.h
> @@ -183,4 +183,10 @@ still needed for compilation.  */
>  #undef MAX_OFILE_ALIGNMENT
>  #define MAX_OFILE_ALIGNMENT (8192 * 8)
>
> +#define CMODEL_IS_NOT_LARGE_OR_MEDIUM_PIC 0
> +
> +#define HAVE_64BIT_POINTERS 1
> +
> +#define GOT_ALIAS_SET mingw_GOT_alias_set ()
> +
>  #endif
> diff --git a/gcc/config/i386/cygming.h b/gcc/config/i386/cygming.h
> index ee01e6bb6ce..cd240533dbc 100644
> --- a/gcc/config/i386/cygming.h
> +++ b/gcc/config/i386/cygming.h
> @@ -469,3 +469,9 @@ do {\
>  #ifndef HAVE_GAS_ALIGNED_COMM
>  # define HAVE_GAS_ALIGNED_COMM 0
>  #endif
> +
> +#define CMODEL_IS_NOT_LARGE_OR_MEDIUM_PIC ix86_cmodel != CM_LARGE_PIC && ix86_cmodel != CM_MEDIUM_PIC
> +
> +#define HAVE_64BIT_POINTERS TARGET_64BIT_DEFAULT
> +
> +#define GOT_ALIAS_SET mingw_GOT_alias_set ()
> diff --git a/gcc/config/i386/i386-expand.cc b/gcc/config/i386/i386-expand.cc
> index fb460e30d0a..267d0ba257b 100644
> --- a/gcc/config/i386/i386-expand.cc
> +++ b/gcc/config/i386/i386-expand.cc
> @@ -408,11 +408,12 @@ ix86_expand_move (machine_mode mode, rtx operands[])
>  : UNSPEC_GOT));
>   op1 = gen_rtx_CONST (Pmode, op1);
>   op1 = gen_const_mem (Pmode, op1);
> - set_mem_alias_set (op1, ix86_GOT_alias_set ());
> + set_mem_alias_set (op1, GOT_ALIAS_SET);
> }
>else
> {
> - tmp = ix86_legitimize_pe_coff_symbol (op1, addend != NULL_RTX);
> +#if TARGET_PECOFF
> + tmp = legitimize_pe_coff_symbol (op1, addend != NULL_RTX);
>   if (tmp)
> {
>   op1 = tmp;
> @@ -424,6 +425,7 @@ ix86_expand_move (machine_mode mode, rtx operands[])
>   op1 = operands[1];
>   break;
> }
> +#endif
> }
>
>if (addend)
> diff --git a/gcc/config/i386/i386-expand.h b/gcc/config/i386/i386-expand.h
> index a8c20993954..5e02df1706d 100644
> --- a/gcc/config/i386/i386-expand.h
> +++ b/gcc/config/i386/i386-expand.h
> @@ -34,9 +34,7 @@ struct expand_vec_perm_d
>  };
>
>  rtx legitimize_tls_address (rtx x, enum tls_model model, bool for_mov);
> -alias_set_type ix86_GOT_alias_set (void);
>  rtx legitimize_pic_address (rtx orig, rtx reg);
> -rtx ix86_legitimize_pe_coff_symbol (rtx addr, bool inreg);
>
>  bool insn_defines_reg (unsigned int regno1, unsigned int regno2,
>rtx_insn *insn);
> diff --git a/gcc/config/i386/i386.cc b/gcc/config/i386/i386.cc
> index 66845b30446..ee3a59ed498 100644
> --- a/gcc/config/i386/i386.cc
> +++ b/gcc/config/i386/i386.cc
> @@ -11807,30 +11807,6 @@ constant_address_p (rtx x)
>  }
>
>
>
> -#if TARGET_PECOFF
> -rtx ix86_legitimize_pe_coff_symbol (rtx addr, bool inreg)
> -{
> -  return legitimize_pe_coff_symbol (addr, inreg);
> -}
> -
> -alias_set_type
> -ix86_GOT_alias_set (void)
> -{
> -  return mingw_GOT_alias_set ();
> -}
> -#else
> -rtx ix86_legitimize_pe_coff_symbol (rtx addr, bool inreg)
> -{
> -  return NULL_RTX;
> -}
> -
> -alias_set_type
> -ix86_GOT_alias_set (void)
> -{
> -  return -1;
> -}
> -#endif
> -
>  /* Return a legitimate reference for ORIG (an address) using the
> register REG.  If REG is 0, a new pseudo is generated.
>
> @@ -11867,9 +11843,11 @@ legitimize_pic_address (rtx orig, rtx reg)
>
>if (TARGET_64BIT && TARGET_DLLIMPORT_DECL_ATTRIBUTES)
>  {
> -  rtx tmp = ix86_legitimize_pe_coff_symbol (addr, true);
> +#if TARGET_PECOFF
> +  rtx tmp = legitimize_pe_coff_symbol (addr, true);
>if (tmp)
>  return tmp;
> +#endif
>  }
>
>if (TARGET_64BIT && legitimate_pic_address_disp_p (addr))
> @@ -11912,9 

[committed] i386: Force operand 1 of bswapsi2 to a register for !TARGET_BSWAP [PR115321]

2024-06-03 Thread Uros Bizjak
PR target/115321

gcc/ChangeLog:

* config/i386/i386.md (bswapsi2): Force operand 1
to a register also for !TARGET_BSWAP.

gcc/testsuite/ChangeLog:

* gcc.target/i386/pr115321.c: New test.

Bootstrapped and regression tested on x86_64-linux-gnu {,m32}.

Uros.
diff --git a/gcc/config/i386/i386.md b/gcc/config/i386/i386.md
index 2c95395b7be..ef83984d00e 100644
--- a/gcc/config/i386/i386.md
+++ b/gcc/config/i386/i386.md
@@ -21193,18 +21193,19 @@ (define_expand "bswapsi2"
(bswap:SI (match_operand:SI 1 "nonimmediate_operand")))]
   ""
 {
-  if (TARGET_MOVBE)
-;
-  else if (TARGET_BSWAP)
-operands[1] = force_reg (SImode, operands[1]);
-  else
+  if (!TARGET_MOVBE)
 {
-  rtx x = gen_reg_rtx (SImode);
+  operands[1] = force_reg (SImode, operands[1]);
 
-  emit_insn (gen_bswaphisi2_lowpart (x, operands[1]));
-  emit_insn (gen_rotlsi3 (x, x, GEN_INT (16)));
-  emit_insn (gen_bswaphisi2_lowpart (operands[0], x));
-  DONE;
+  if (!TARGET_BSWAP)
+   {
+ rtx x = gen_reg_rtx (SImode);
+
+ emit_insn (gen_bswaphisi2_lowpart (x, operands[1]));
+ emit_insn (gen_rotlsi3 (x, x, GEN_INT (16)));
+ emit_insn (gen_bswaphisi2_lowpart (operands[0], x));
+ DONE;
+   }
 }
 })
 
diff --git a/gcc/testsuite/gcc.target/i386/pr115321.c b/gcc/testsuite/gcc.target/i386/pr115321.c
new file mode 100644
index 000..0ddab9bd7a5
--- /dev/null
+++ b/gcc/testsuite/gcc.target/i386/pr115321.c
@@ -0,0 +1,4 @@
+/* { dg-do compile { target ia32 } } */
+/* { dg-options "-march=i386" } */
+
+unsigned foo (unsigned x) { return __builtin_bswap32 (x); }
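The !TARGET_BSWAP path of the expander has three steps: bswaphisi2_lowpart, rotlsi3 by 16, then bswaphisi2_lowpart again. It can be modelled in C to show why these steps yield a full 32-bit byte swap. This is a sketch. The helper names are hypothetical, and swap_low_half_bytes assumes the low-part swap leaves the upper 16 bits untouched, as an xchg of %ah and %al does.

```c
#include <stdint.h>

/* Swap the two bytes of the low 16-bit half; keep the upper half.  */
static uint32_t
swap_low_half_bytes (uint32_t v)
{
  return (v & 0xffff0000u) | ((v & 0xffu) << 8) | ((v >> 8) & 0xffu);
}

/* Rotate left; valid for 0 < n < 32.  */
static uint32_t
rotl32 (uint32_t v, unsigned n)
{
  return (v << n) | (v >> (32 - n));
}

/* bswap32 without a BSWAP instruction: swap the low bytes, rotate the
   halves, then swap the (new) low bytes.  */
static uint32_t
bswap32_no_bswap (uint32_t v)
{
  uint32_t x = swap_low_half_bytes (v);   /* AABBCCDD -> AABBDDCC */

  x = rotl32 (x, 16);                     /* -> DDCCAABB */
  return swap_low_half_bytes (x);         /* -> DDCCBBAA */
}
```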


Re: [PATCH] [x86] Add some preference for floating point rtl ifcvt when sse4.1 is not available

2024-06-03 Thread Uros Bizjak
On Mon, Jun 3, 2024 at 5:11 AM liuhongt  wrote:
>
> Without TARGET_SSE4_1, movdfcc/movsfcc takes 3 instructions (pand, pandn
> and por) and could fail the cost comparison. Increasing the branch cost
> could hurt performance for other modes, so add a preference specifically
> for floating-point ifcvt.
>
> Bootstrapped and regtested on x86_64-pc-linux-gnu{-m32,}.
> Ok for trunk?
>
> gcc/ChangeLog:
>
> * config/i386/i386.cc (ix86_noce_conversion_profitable_p): Add
> some preference for floating point ifcvt when SSE4.1 is not
> available.
>
> gcc/testsuite/ChangeLog:
>
> * gcc.target/i386/pr115299.c: New test.
> * gcc.target/i386/pr86722.c: Adjust testcase.

LGTM.

Thanks,
Uros.

> ---
>  gcc/config/i386/i386.cc  | 17 +
>  gcc/testsuite/gcc.target/i386/pr115299.c | 10 ++
>  gcc/testsuite/gcc.target/i386/pr86722.c  |  2 +-
>  3 files changed, 28 insertions(+), 1 deletion(-)
>  create mode 100644 gcc/testsuite/gcc.target/i386/pr115299.c
>
> diff --git a/gcc/config/i386/i386.cc b/gcc/config/i386/i386.cc
> index 1a0206ab573..271da127a89 100644
> --- a/gcc/config/i386/i386.cc
> +++ b/gcc/config/i386/i386.cc
> @@ -24879,6 +24879,23 @@ ix86_noce_conversion_profitable_p (rtx_insn *seq, struct noce_if_info *if_info)
> return false;
> }
>  }
> +
> +  /* W/o TARGET_SSE4_1, it takes 3 instructions (pand, pandn and por)
> + for movdfcc/movsfcc, and could possibly fail cost comparison.
> + Increase branch cost will hurt performance for other modes, so
> + specially add some preference for floating point ifcvt.  */
> +  if (!TARGET_SSE4_1 && if_info->x
> +  && GET_MODE_CLASS (GET_MODE (if_info->x)) == MODE_FLOAT
> +  && if_info->speed_p)
> +{
> +  unsigned cost = seq_cost (seq, true);
> +
> +  if (cost <= if_info->original_cost)
> +   return true;
> +
> +  return cost <= (if_info->max_seq_cost + COSTS_N_INSNS (2));
> +}
> +
>return default_noce_conversion_profitable_p (seq, if_info);
>  }
>
> diff --git a/gcc/testsuite/gcc.target/i386/pr115299.c b/gcc/testsuite/gcc.target/i386/pr115299.c
> new file mode 100644
> index 000..53c5899136a
> --- /dev/null
> +++ b/gcc/testsuite/gcc.target/i386/pr115299.c
> @@ -0,0 +1,10 @@
> +/* { dg-do compile { target { ! ia32 } } } */
> +/* { dg-options "-O2 -mno-sse4.1 -msse2" } */
> +
> +void f(double*d,double*e){
> +  for(;d<e;++d)
> +    *d=(*d<.5)?.7:0;
> +}
> +
> +/* { dg-final { scan-assembler {(?n)(?:cmpnltsd|cmpltsd)} } } */
> +/* { dg-final { scan-assembler {(?n)(?:andnpd|andpd)} } } */
> diff --git a/gcc/testsuite/gcc.target/i386/pr86722.c b/gcc/testsuite/gcc.target/i386/pr86722.c
> index 4de2ca1a6c0..e266a1e56c2 100644
> --- a/gcc/testsuite/gcc.target/i386/pr86722.c
> +++ b/gcc/testsuite/gcc.target/i386/pr86722.c
> @@ -6,5 +6,5 @@ void f(double*d,double*e){
>  *d=(*d<.5)?.7:0;
>  }
>
> -/* { dg-final { scan-assembler-not "andnpd" } } */
> +/* { dg-final { scan-assembler-times {(?n)(?:andnpd|andpd)} 1 } } */
>  /* { dg-final { scan-assembler-not "orpd" } } */
> --
> 2.31.1
>
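Here is a scalar C model of the SSE2-only select that makes the "3 instructions" point concrete. It is a sketch: the function names are hypothetical, and integer bit operations on the double bit patterns stand in for pand/pandn/por (or andpd/andnpd/orpd).

```c
#include <stdint.h>
#include <string.h>

/* Branchless select using only AND, AND-NOT and OR on the bit patterns.
   mask must be all-ones or all-zeros, like the result of cmpltsd.  */
static double
select_by_mask (uint64_t mask, double a, double b)
{
  uint64_t ua, ub, ur;
  double r;

  memcpy (&ua, &a, sizeof ua);
  memcpy (&ub, &b, sizeof ub);
  ur = (mask & ua) | (~mask & ub);      /* pand, pandn, por */
  memcpy (&r, &ur, sizeof r);
  return r;
}

/* The testcase body *d = (*d < .5) ? .7 : 0, written with the mask.  */
static double
ifcvt_body (double d)
{
  uint64_t mask = (d < .5) ? UINT64_MAX : 0;   /* cmpltsd */

  return select_by_mask (mask, .7, 0.0);
}
```

When the second arm is zero, as it is here, the AND-NOT/OR half contributes nothing. That is presumably why the adjusted pr86722.c expects exactly one andpd/andnpd and no orpd.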


Re: [PATCH 39/52] i386: New hook implementation ix86_c_mode_for_floating_type

2024-06-03 Thread Uros Bizjak
On Mon, Jun 3, 2024 at 5:02 AM Kewen Lin  wrote:
>
> This removes the {FLOAT,{,LONG_}DOUBLE}_TYPE_SIZE macro definitions
> from the i386 port and adds a new port-specific hook implementation,
> ix86_c_mode_for_floating_type.
>
> gcc/ChangeLog:
>
> * config/i386/i386.cc (ix86_c_mode_for_floating_type): New
> function.
> (TARGET_C_MODE_FOR_FLOATING_TYPE): New macro.
> * config/i386/i386.h (FLOAT_TYPE_SIZE): Remove.
> (DOUBLE_TYPE_SIZE): Likewise.
> (LONG_DOUBLE_TYPE_SIZE): Likewise.

OK.

Thanks,
Uros.

> ---
>  gcc/config/i386/i386.cc | 15 +++
>  gcc/config/i386/i386.h  |  4 
>  2 files changed, 15 insertions(+), 4 deletions(-)
>
> diff --git a/gcc/config/i386/i386.cc b/gcc/config/i386/i386.cc
> index 3e2a3a194f1..6abb6d7a1ca 100644
> --- a/gcc/config/i386/i386.cc
> +++ b/gcc/config/i386/i386.cc
> @@ -25794,6 +25794,19 @@ ix86_bitint_type_info (int n, struct bitint_info *info)
>return true;
>  }
>
> +/* Implement TARGET_C_MODE_FOR_FLOATING_TYPE.  Return DFmode, TFmode
> +   or XFmode for TI_LONG_DOUBLE_TYPE which is for long double type,
> +   based on long double bits, go with the default one for the others.  */
> +
> +static machine_mode
> +ix86_c_mode_for_floating_type (enum tree_index ti)
> +{
> +  if (ti == TI_LONG_DOUBLE_TYPE)
> +return (TARGET_LONG_DOUBLE_64 ? DFmode
> + : (TARGET_LONG_DOUBLE_128 ? TFmode : XFmode));
> +  return default_mode_for_floating_type (ti);
> +}
> +
>  /* Returns modified FUNCTION_TYPE for cdtor callabi.  */
>  tree
>  ix86_cxx_adjust_cdtor_callabi_fntype (tree fntype)
> @@ -26419,6 +26432,8 @@ static const scoped_attribute_specs *const ix86_attribute_table[] =
>  #define TARGET_C_EXCESS_PRECISION ix86_get_excess_precision
>  #undef TARGET_C_BITINT_TYPE_INFO
>  #define TARGET_C_BITINT_TYPE_INFO ix86_bitint_type_info
> +#undef TARGET_C_MODE_FOR_FLOATING_TYPE
> +#define TARGET_C_MODE_FOR_FLOATING_TYPE ix86_c_mode_for_floating_type
>  #undef TARGET_CXX_ADJUST_CDTOR_CALLABI_FNTYPE
>  #define TARGET_CXX_ADJUST_CDTOR_CALLABI_FNTYPE ix86_cxx_adjust_cdtor_callabi_fntype
>  #undef TARGET_PROMOTE_PROTOTYPES
> diff --git a/gcc/config/i386/i386.h b/gcc/config/i386/i386.h
> index 359a8408263..fad434c10d6 100644
> --- a/gcc/config/i386/i386.h
> +++ b/gcc/config/i386/i386.h
> @@ -675,10 +675,6 @@ extern const char *host_detect_local_cpu (int argc, const char **argv);
>  #define LONG_TYPE_SIZE (TARGET_X32 ? 32 : BITS_PER_WORD)
>  #define POINTER_SIZE (TARGET_X32 ? 32 : BITS_PER_WORD)
>  #define LONG_LONG_TYPE_SIZE 64
> -#define FLOAT_TYPE_SIZE 32
> -#define DOUBLE_TYPE_SIZE 64
> -#define LONG_DOUBLE_TYPE_SIZE \
> -  (TARGET_LONG_DOUBLE_64 ? 64 : (TARGET_LONG_DOUBLE_128 ? 128 : 80))
>
>  #define WIDEST_HARDWARE_FP_SIZE 80
>
> --
> 2.43.0
>
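The new hook preserves the mapping encoded by the removed LONG_DOUBLE_TYPE_SIZE macro, but now expresses it as a machine mode. Below is a toy C model of that correspondence. The enum and both function names are hypothetical stand-ins, not GCC internals, and the two int parameters play the role of TARGET_LONG_DOUBLE_64 and TARGET_LONG_DOUBLE_128.

```c
/* Hypothetical stand-ins for DFmode, XFmode and TFmode.  */
enum fp_mode { MODE_DF, MODE_XF, MODE_TF };

/* The removed macro:
   (TARGET_LONG_DOUBLE_64 ? 64 : (TARGET_LONG_DOUBLE_128 ? 128 : 80))  */
static int
long_double_type_size (int long_double_64, int long_double_128)
{
  return long_double_64 ? 64 : (long_double_128 ? 128 : 80);
}

/* The hook's choice for TI_LONG_DOUBLE_TYPE.  */
static enum fp_mode
long_double_mode (int long_double_64, int long_double_128)
{
  return long_double_64 ? MODE_DF : (long_double_128 ? MODE_TF : MODE_XF);
}
```

In both functions TARGET_LONG_DOUBLE_64 is tested first. The default (neither flag set) selects the 80-bit XFmode.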


[committed] alpha: Fix invalid RTX in divmodsi insn patterns [PR115297]

2024-05-31 Thread Uros Bizjak
any_divmod instructions are modelled with invalid RTX:

  [(set (match_operand:DI 0 "register_operand" "=c")
(sign_extend:DI (match_operator:SI 3 "divmod_operator"
[(match_operand:DI 1 "register_operand" "a")
 (match_operand:DI 2 "register_operand" "b")])))
   (clobber (reg:DI 23))
   (clobber (reg:DI 28))]

where SImode divmod_operator (div,mod,udiv,umod) has DImode operands.

Wrap the input operands with truncate:SI to make the machine modes consistent.
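In C terms, the corrected pattern describes roughly the following: only the low 32 bits of each DImode input take part, and the SImode result is sign-extended back to 64 bits. This sketch models the RTX semantics only, not the Alpha division millicode routines, and the function names are hypothetical.

```c
#include <stdint.h>

/* (sign_extend:DI (div:SI (truncate:SI a) (truncate:SI b)))
   GCC converts out-of-range values to int32_t modulo 2^32.  */
static int64_t
divsi_model (int64_t a, int64_t b)
{
  return (int64_t) ((int32_t) a / (int32_t) b);
}

/* (sign_extend:DI (udiv:SI (truncate:SI a) (truncate:SI b)))
   Unsigned 32-bit division, then the SImode result is sign-extended.  */
static int64_t
udivsi_model (int64_t a, int64_t b)
{
  return (int64_t) (int32_t) ((uint32_t) a / (uint32_t) b);
}
```

The udiv case shows why the outer sign_extend matters: a 32-bit unsigned quotient with the top bit set becomes a negative DImode value.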

PR target/115297

gcc/ChangeLog:

* config/alpha/alpha.md (si3): Wrap DImode
operands 3 and 4 with truncate:SI RTX.
(*divmodsi_internal_er): Ditto for operands 1 and 2.
(*divmodsi_internal_er_1): Ditto.
(*divmodsi_internal): Ditto.
* config/alpha/constraints.md ("b"): Correct register
number in the description.

gcc/testsuite/ChangeLog:

* gcc.target/alpha/pr115297.c: New test.

Tested by building an alpha-linux-gnu crosscompiler.

Uros.
diff --git a/gcc/config/alpha/alpha.md b/gcc/config/alpha/alpha.md
index 79f12c53c16..1e2de5a4d15 100644
--- a/gcc/config/alpha/alpha.md
+++ b/gcc/config/alpha/alpha.md
@@ -725,7 +725,8 @@ (define_expand "si3"
(sign_extend:DI (match_operand:SI 2 "nonimmediate_operand")))
(parallel [(set (match_dup 5)
   (sign_extend:DI
-   (any_divmod:SI (match_dup 3) (match_dup 4
+   (any_divmod:SI (truncate:SI (match_dup 3))
+  (truncate:SI (match_dup 4)
  (clobber (reg:DI 23))
  (clobber (reg:DI 28))])
(set (match_operand:SI 0 "nonimmediate_operand")
@@ -751,9 +752,10 @@ (define_expand "di3"
 
 (define_insn_and_split "*divmodsi_internal_er"
   [(set (match_operand:DI 0 "register_operand" "=c")
-   (sign_extend:DI (match_operator:SI 3 "divmod_operator"
-   [(match_operand:DI 1 "register_operand" "a")
-(match_operand:DI 2 "register_operand" "b")])))
+   (sign_extend:DI
+(match_operator:SI 3 "divmod_operator"
+ [(truncate:SI (match_operand:DI 1 "register_operand" "a"))
+  (truncate:SI (match_operand:DI 2 "register_operand" "b"))])))
(clobber (reg:DI 23))
(clobber (reg:DI 28))]
   "TARGET_EXPLICIT_RELOCS && TARGET_ABI_OSF"
@@ -795,8 +797,8 @@ (define_insn_and_split "*divmodsi_internal_er"
 (define_insn "*divmodsi_internal_er_1"
   [(set (match_operand:DI 0 "register_operand" "=c")
(sign_extend:DI (match_operator:SI 3 "divmod_operator"
-[(match_operand:DI 1 "register_operand" "a")
- (match_operand:DI 2 "register_operand" "b")])))
+[(truncate:SI (match_operand:DI 1 "register_operand" "a"))
+ (truncate:SI (match_operand:DI 2 "register_operand" "b"))])))
(use (match_operand:DI 4 "register_operand" "c"))
(use (match_operand 5 "const_int_operand"))
(clobber (reg:DI 23))
@@ -808,9 +810,10 @@ (define_insn "*divmodsi_internal_er_1"
 
 (define_insn "*divmodsi_internal"
   [(set (match_operand:DI 0 "register_operand" "=c")
-   (sign_extend:DI (match_operator:SI 3 "divmod_operator"
-   [(match_operand:DI 1 "register_operand" "a")
-(match_operand:DI 2 "register_operand" "b")])))
+   (sign_extend:DI
+(match_operator:SI 3 "divmod_operator"
+ [(truncate:SI (match_operand:DI 1 "register_operand" "a"))
+  (truncate:SI (match_operand:DI 2 "register_operand" "b"))])))
(clobber (reg:DI 23))
(clobber (reg:DI 28))]
   "TARGET_ABI_OSF"
diff --git a/gcc/config/alpha/constraints.md b/gcc/config/alpha/constraints.md
index 0d001ba26f1..4383f1fa895 100644
--- a/gcc/config/alpha/constraints.md
+++ b/gcc/config/alpha/constraints.md
@@ -27,7 +27,7 @@ (define_register_constraint "a" "R24_REG"
  "General register 24, input to division routine")
 
 (define_register_constraint "b" "R25_REG"
- "General register 24, input to division routine")
+ "General register 25, input to division routine")
 
 (define_register_constraint "c" "R27_REG"
  "General register 27, function call address")
diff --git a/gcc/testsuite/gcc.target/alpha/pr115297.c b/gcc/testsuite/gcc.target/alpha/pr115297.c
new file mode 100644
index 000..4d5890ec8d9
--- /dev/null
+++ b/gcc/testsuite/gcc.target/alpha/pr115297.c
@@ -0,0 +1,13 @@
+/* PR target/115297 */
+/* { dg-do compile } */
+/* { dg-options "-O1" } */
+
+enum { BPF_F_USER_BUILD_ID } __bpf_get_stack_size;
+long __bpf_get_stack_flags, bpf_get_stack___trans_tmp_2;
+
+void bpf_get_stack() {
+  unsigned elem_size;
+  int err = elem_size = __bpf_get_stack_flags ?: sizeof(long);
+  if (__builtin_expect(__bpf_get_stack_size % elem_size, 0))
+bpf_get_stack___trans_tmp_2 = err;
+}


[committed] i386: Rewrite bswaphi2 handling [PR115102]

2024-05-30 Thread Uros Bizjak
Introduce the *bswaphi2 instruction pattern and enable the bswaphi2
expander for non-movbe targets as well.  The testcase:

unsigned short bswap8 (unsigned short val)
{
  return ((val & 0xff00) >> 8) | ((val & 0xff) << 8);
}

now expands through the bswaphi2 named expander.

Rewrite the bswaphi_lowpart insn pattern as bswaphisi2_lowpart, in the RTX
form that the combine pass can use to simplify:

Trying 6, 9, 8 -> 10:
6: r99:SI=bswap(r103:SI)
9: {r107:SI=r103:SI&0x;clobber flags:CC;}
  REG_DEAD r103:SI
  REG_UNUSED flags:CC
8: {r106:SI=r99:SI 0>>0x10;clobber flags:CC;}
  REG_DEAD r99:SI
  REG_UNUSED flags:CC
   10: {r104:SI=r106:SI|r107:SI;clobber flags:CC;}
  REG_DEAD r107:SI
  REG_DEAD r106:SI
  REG_UNUSED flags:CC

Successfully matched this instruction:
(set (reg:SI 104 [ _8 ])
(ior:SI (and:SI (reg/v:SI 103 [ val ])
(const_int -65536 [0x]))
(lshiftrt:SI (bswap:SI (reg/v:SI 103 [ val ]))
(const_int 16 [0x10]
allowing combination of insns 6, 8, 9 and 10

when compiling the following testcase:

unsigned int bswap8 (unsigned int val)
{
  return (val & 0x) | ((val & 0xff00) >> 8) | ((val & 0xff) << 8);
}

to produce:

movl%edi, %eax
xchgb   %ah, %al
ret

The expansion now always goes through a clobberless form of the bswaphi
instruction.  The instruction is conditionally converted to a rotate in
the peephole2 pass.  This significantly simplifies the bswaphisi2_lowpart
insn pattern attributes.
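Both testcases can be cross-checked in C against the byte-swap builtins. In the matched RTL the mask const_int -65536 is 0xffff0000 when viewed as a 32-bit unsigned value. This is a sketch, and the function names are hypothetical.

```c
#include <stdint.h>

/* The HImode testcase: swap the two bytes of a 16-bit value.  */
static unsigned short
bswap8_hi (unsigned short val)
{
  return ((val & 0xff00) >> 8) | ((val & 0xff) << 8);
}

/* The SImode form matched by combine:
     (ior (and val -65536) (lshiftrt (bswap val) 16))
   i.e. keep the upper half and byte-swap the lower half.  */
static uint32_t
bswap_lowpart_si (uint32_t val)
{
  return (val & 0xffff0000u) | (__builtin_bswap32 (val) >> 16);
}

/* The same operation with explicit byte masks.  */
static uint32_t
bswap_lowpart_si_masks (uint32_t val)
{
  return (val & 0xffff0000u) | ((val & 0xff00) >> 8) | ((val & 0xff) << 8);
}
```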

PR target/115102

gcc/ChangeLog:

* config/i386/i386.md (bswaphi2): Also enable for !TARGET_MOVBE.
(*bswaphi2): New insn pattern.
(bswaphisi2_lowpart): Rename from bswaphi_lowpart.  Rewrite
insn RTX to match the expected form of the combine pass.
Remove rol{w} alternative and corresponding attributes.
(bswaphisi2_lowpart peephole2): New peephole2 pattern to
conditionally convert bswaphisi2_lowpart to rotlhi3_1_slp.
(bswapsi2): Update expander for rename.
(rotlhi3_1_slp splitter): Conditionally split to bswaphi2.

gcc/testsuite/ChangeLog:

* gcc.target/i386/pr115102.c: New test.

Bootstrapped and regression tested on x86_64-pc-linux-gnu {,-m32}.

Uros.
diff --git a/gcc/config/i386/i386.md b/gcc/config/i386/i386.md
index c162cd42386..375654cf74e 100644
--- a/gcc/config/i386/i386.md
+++ b/gcc/config/i386/i386.md
@@ -17210,9 +17210,7 @@ (define_split
   (clobber (reg:CC FLAGS_REG))]
  "reload_completed
   && (TARGET_USE_XCHGB || optimize_function_for_size_p (cfun))"
- [(parallel [(set (strict_low_part (match_dup 0))
- (bswap:HI (match_dup 0)))
-(clobber (reg:CC FLAGS_REG))])])
+ [(set (match_dup 0) (bswap:HI (match_dup 0)))])
 
 ;; Rotations through carry flag
 (define_insn "rcrsi2"
@@ -20730,12 +20728,11 @@ (define_expand "bswapsi2"
 operands[1] = force_reg (SImode, operands[1]);
   else
 {
-  rtx x = operands[0];
+  rtx x = gen_reg_rtx (SImode);
 
-  emit_move_insn (x, operands[1]);
-  emit_insn (gen_bswaphi_lowpart (gen_lowpart (HImode, x)));
+  emit_insn (gen_bswaphisi2_lowpart (x, operands[1]));
   emit_insn (gen_rotlsi3 (x, x, GEN_INT (16)));
-  emit_insn (gen_bswaphi_lowpart (gen_lowpart (HImode, x)));
+  emit_insn (gen_bswaphisi2_lowpart (operands[0], x));
   DONE;
 }
 })
@@ -20767,7 +20764,11 @@ (define_insn "*bswap2"
 (define_expand "bswaphi2"
   [(set (match_operand:HI 0 "register_operand")
(bswap:HI (match_operand:HI 1 "nonimmediate_operand")))]
-  "TARGET_MOVBE")
+  ""
+{
+  if (!TARGET_MOVBE)
+operands[1] = force_reg (HImode, operands[1]);
+})
 
 (define_insn "*bswaphi2_movbe"
   [(set (match_operand:HI 0 "nonimmediate_operand" "=Q,r,m")
@@ -20788,33 +20789,55 @@ (define_insn "*bswaphi2_movbe"
(set_attr "bdver1_decode" "double,*,*")
(set_attr "mode" "QI,HI,HI")])
 
+(define_insn "*bswaphi2"
+  [(set (match_operand:HI 0 "register_operand" "=Q")
+   (bswap:HI (match_operand:HI 1 "register_operand" "0")))]
+  "!TARGET_MOVBE"
+  "xchg{b}\t{%h0, %b0|%b0, %h0}"
+  [(set_attr "type" "imov")
+   (set_attr "pent_pair" "np")
+   (set_attr "athlon_decode" "vector")
+   (set_attr "amdfam10_decode" "double")
+   (set_attr "bdver1_decode" "double")
+   (set_attr "mode" "QI")])
+
 (define_peephole2
   [(set (match_operand:HI 0 "general_reg_operand")
(bswap:HI (match_dup 0)))]
-  "TARGET_MOVBE
-   && !(TARGET_USE_XCHGB || optimize_function_for_size_p (cfun))
+  "!(TARGET_USE_XCHGB ||
+ TARGET_PARTIAL_REG_STALL || optimize_function_for_size_p (cfun))
&& peep2_regno_dead_p (0, FLAGS_REG)"
   [(parallel [(set (match_dup 0) (rotate:HI (match_dup 0) (const_int 8)))
  (clobber (reg:CC FLAGS_REG))])])
 
-(define_insn "bswaphi_lowpart"
-  [(set (strict_low_part (match_operand:HI 0 "register_operand" "+Q,r"))
-   (bswap:HI (match_dup 0)))
-   (clobber (reg:CC FLAGS_REG))]
+(define_insn "bswaphisi2_lowpart"
+  [(set (match_operand:SI 0 "register_operand" 

[committed] i386: Improve access to _Atomic DImode location via XMM regs for SSE4.1 x86_32 targets

2024-05-28 Thread Uros Bizjak
Use MOVD/PEXTRD and MOVD/PINSRD insn sequences to move DImode value
between XMM and GPR register sets for SSE4.1 x86_32 targets in order
to avoid spilling the value to stack.

The load from _Atomic location a improves from:

        movq    a, %xmm0
        movq    %xmm0, (%esp)
        movl    (%esp), %eax
        movl    4(%esp), %edx

to:
        movq    a, %xmm0
        movd    %xmm0, %eax
        pextrd  $1, %xmm0, %edx

The store to _Atomic location b improves from:

        movl    %eax, (%esp)
        movl    %edx, 4(%esp)
        movq    (%esp), %xmm0
        movq    %xmm0, b

to:
        movd    %eax, %xmm0
        pinsrd  $1, %edx, %xmm0
        movq    %xmm0, b
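The commit message does not include the test source. A testcase of roughly the following shape (assumed, with hypothetical function names) performs one atomic DImode load from a and one atomic store to b. Compiled with something like -m32 -msse4.1 -O2, it should exercise the atomic_loaddi_fpu and atomic_storedi_fpu paths. The sequence actually chosen also depends on the TARGET_INTER_UNIT_MOVES_FROM_VEC/TO_VEC tunings checked in the diff.

```c
/* Two 64-bit _Atomic objects; on x86_32 their accesses go through
   an XMM (or x87) register to stay single-copy atomic.  */
_Atomic long long a, b;

long long
load_a (void)
{
  return a;             /* atomic DImode load from a */
}

void
store_b (long long x)
{
  b = x;                /* atomic DImode store to b */
}
```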

gcc/ChangeLog:

* config/i386/sync.md (atomic_loaddi_fpu): Use movd/pextrd
to move DImode value from XMM to GPR for TARGET_SSE4_1.
(atomic_storedi_fpu): Use movd/pinsrd to move DImode value
from GPR to XMM for TARGET_SSE4_1.

Bootstrapped and regression tested on x86_64-pc-linux-gnu {,-m32}.

Uros.
diff --git a/gcc/config/i386/sync.md b/gcc/config/i386/sync.md
index 8317581ebe2..f2b3ba0aa7a 100644
--- a/gcc/config/i386/sync.md
+++ b/gcc/config/i386/sync.md
@@ -215,8 +215,18 @@ (define_insn_and_split "atomic_loaddi_fpu"
}
   else
{
+ rtx tmpdi = gen_lowpart (DImode, tmp);
+
  emit_insn (gen_loaddi_via_sse (tmp, src));
- emit_insn (gen_storedi_via_sse (mem, tmp));
+
+ if (GENERAL_REG_P (dst)
+ && TARGET_SSE4_1 && TARGET_INTER_UNIT_MOVES_FROM_VEC)
+   {
+ emit_move_insn (dst, tmpdi);
+ DONE;
+   }
+ else
+   emit_move_insn (mem, tmpdi);
}
 
   if (mem != dst)
@@ -294,20 +304,30 @@ (define_insn_and_split "atomic_storedi_fpu"
 emit_move_insn (dst, src);
   else
 {
-  if (REG_P (src))
-   {
- emit_move_insn (mem, src);
- src = mem;
-   }
-
   if (STACK_REG_P (tmp))
{
+ if (GENERAL_REG_P (src))
+   {
+ emit_move_insn (mem, src);
+ src = mem;
+   }
+
  emit_insn (gen_loaddi_via_fpu (tmp, src));
  emit_insn (gen_storedi_via_fpu (dst, tmp));
}
   else
{
- emit_insn (gen_loaddi_via_sse (tmp, src));
+ rtx tmpdi = gen_lowpart (DImode, tmp);
+
+ if (GENERAL_REG_P (src)
+ && !(TARGET_SSE4_1 && TARGET_INTER_UNIT_MOVES_TO_VEC))
+   {
+ emit_move_insn (mem, src);
+ src = mem;
+   }
+
+ emit_move_insn (tmpdi, src);
+
  emit_insn (gen_storedi_via_sse (dst, tmp));
}
 }


Re: [PATCH V2] Reduce cost of MEM (A + imm).

2024-05-28 Thread Uros Bizjak
On Tue, May 28, 2024 at 12:48 PM liuhongt  wrote:
>
> > IMO, there is no need for the CONST_INT_P condition; we should also allow
> > symbol_ref, label_ref and const (all allowed by the
> > x86_64_immediate_operand predicate), as these all decay to an immediate
> > value.
>
> Changed.
>
> Bootstrapped and regtested on x86_64-pc-linux-gnu{-m32,}.
> Ok for trunk.
>
> For MEM, rtx_cost iterates over each subrtx and adds up the costs,
> so MEM (reg) costs 5 while MEM (reg + 4) costs 9, which is not accurate
> for x86. Ideally address_cost should be used, but it reduces the cost
> too much. So the current solution is to make a constant displacement
> as cheap as possible.
>
> gcc/ChangeLog:
>
> PR target/67325
> * config/i386/i386.cc (ix86_rtx_costs): Reduce cost of MEM (A
> + imm) to "cost of MEM (A)" + 1.
>
> gcc/testsuite/ChangeLog:
>
> * gcc.target/i386/pr67325.c: New test.

OK.

Thanks,
Uros.

> ---
>  gcc/config/i386/i386.cc | 18 +-
>  gcc/testsuite/gcc.target/i386/pr67325.c |  7 +++
>  2 files changed, 24 insertions(+), 1 deletion(-)
>  create mode 100644 gcc/testsuite/gcc.target/i386/pr67325.c
>
> diff --git a/gcc/config/i386/i386.cc b/gcc/config/i386/i386.cc
> index 3e2a3a194f1..85d87b9f778 100644
> --- a/gcc/config/i386/i386.cc
> +++ b/gcc/config/i386/i386.cc
> @@ -22194,7 +22194,23 @@ ix86_rtx_costs (rtx x, machine_mode mode, int outer_code_i, int opno,
>/* An insn that accesses memory is slightly more expensive
>   than one that does not.  */
>if (speed)
> -*total += 1;
> +   {
> + *total += 1;
> + rtx addr = XEXP (x, 0);
> + /* For MEM, rtx_cost iterates each subrtx, and adds up the costs,
> +so for MEM (reg) and MEM (reg + 4), the former costs 5,
> +the latter costs 9, it is not accurate for x86. Ideally
> +address_cost should be used, but it reduce cost too much.
> +So current solution is make constant disp as cheap as possible.  */
> + if (GET_CODE (addr) == PLUS
> + && x86_64_immediate_operand (XEXP (addr, 1), Pmode))
> +   {
> + *total += 1;
> + *total += rtx_cost (XEXP (addr, 0), Pmode, PLUS, 0, speed);
> + return true;
> +   }
> +   }
> +
>return false;
>
>  case ZERO_EXTRACT:
> diff --git a/gcc/testsuite/gcc.target/i386/pr67325.c b/gcc/testsuite/gcc.target/i386/pr67325.c
> new file mode 100644
> index 000..c3c1e4c5b4d
> --- /dev/null
> +++ b/gcc/testsuite/gcc.target/i386/pr67325.c
> @@ -0,0 +1,7 @@
> +/* { dg-do compile { target { ! ia32 } } } */
> +/* { dg-options "-O2" } */
> +/* { dg-final { scan-assembler-not "(?:sar|shr)" } } */
> +
> +int f(long*l){
> +  return *l>>32;
> +}
> --
> 2.31.1
>


Re: [PATCH] [x86_64]: Zhaoxin shijidadao enablement

2024-05-28 Thread Uros Bizjak
On Mon, May 27, 2024 at 10:33 AM MayShao  wrote:
>
> From: mayshao 
>
> Hi all:
> This patch enables -march/-mtune=shijidadao; costs and tunings are set
> according to the characteristics of the processor.
>
> Bootstrapped/regtested on x86_64.
>
> Ok for trunk?

OK.

Thanks,
Uros.

> BR
> Mayshao
> gcc/ChangeLog:
>
> * common/config/i386/cpuinfo.h (get_zhaoxin_cpu): Recognize 
> shijidadao.
> * common/config/i386/i386-common.cc: Add shijidadao.
> * common/config/i386/i386-cpuinfo.h (enum processor_subtypes):
> Add ZHAOXIN_FAM7H_SHIJIDADAO.
> * config.gcc: Add shijidadao.
> * config/i386/driver-i386.cc (host_detect_local_cpu):
> Let -march=native recognize shijidadao processors.
> * config/i386/i386-c.cc (ix86_target_macros_internal): Add shijidadao.
> * config/i386/i386-options.cc (m_ZHAOXIN): Add m_SHIJIDADAO.
> (m_SHIJIDADAO): New definition.
> * config/i386/i386.h (enum processor_type): Add PROCESSOR_SHIJIDADAO.
> * config/i386/x86-tune-costs.h (struct processor_costs):
> Add shijidadao_cost.
> * config/i386/x86-tune-sched.cc (ix86_issue_rate): Add shijidadao.
> (ix86_adjust_cost): Ditto.
> * config/i386/x86-tune.def (X86_TUNE_USE_GATHER_2PARTS): Add 
> m_SHIJIDADAO.
> (X86_TUNE_USE_GATHER_4PARTS): Ditto.
> (X86_TUNE_USE_GATHER_8PARTS): Ditto.
> (X86_TUNE_AVOID_128FMA_CHAINS): Ditto.
> * doc/extend.texi: Add details about shijidadao.
> * doc/invoke.texi: Ditto.
>
> gcc/testsuite/ChangeLog:
>
> * g++.target/i386/mv32.C: Handle new -march.
> * gcc.target/i386/funcspec-56.inc: Ditto.
> ---
>  gcc/common/config/i386/cpuinfo.h  |   8 +-
>  gcc/common/config/i386/i386-common.cc |   8 +-
>  gcc/common/config/i386/i386-cpuinfo.h |   1 +
>  gcc/config.gcc|  14 ++-
>  gcc/config/i386/driver-i386.cc|  11 +-
>  gcc/config/i386/i386-c.cc |   7 ++
>  gcc/config/i386/i386-options.cc   |   4 +-
>  gcc/config/i386/i386.h|   1 +
>  gcc/config/i386/x86-tune-costs.h  | 116 ++
>  gcc/config/i386/x86-tune-sched.cc |   2 +
>  gcc/config/i386/x86-tune.def  |   8 +-
>  gcc/doc/extend.texi   |   3 +
>  gcc/doc/invoke.texi   |   6 +
>  gcc/testsuite/g++.target/i386/mv32.C  |   6 +
>  gcc/testsuite/gcc.target/i386/funcspec-56.inc |   2 +
>  15 files changed, 183 insertions(+), 14 deletions(-)
>
> diff --git a/gcc/common/config/i386/cpuinfo.h b/gcc/common/config/i386/cpuinfo.h
> index 4610bf6d6a4..936039725ab 100644
> --- a/gcc/common/config/i386/cpuinfo.h
> +++ b/gcc/common/config/i386/cpuinfo.h
> @@ -667,12 +667,18 @@ get_zhaoxin_cpu (struct __processor_model *cpu_model,
>   reset_cpu_feature (cpu_model, cpu_features2, FEATURE_F16C);
>   cpu_model->__cpu_subtype = ZHAOXIN_FAM7H_LUJIAZUI;
> }
> - else if (model >= 0x5b)
> + else if (model == 0x5b)
> {
>   cpu = "yongfeng";
>   CHECK___builtin_cpu_is ("yongfeng");
>   cpu_model->__cpu_subtype = ZHAOXIN_FAM7H_YONGFENG;
> }
> + else if (model >= 0x6b)
> +   {
> + cpu = "shijidadao";
> + CHECK___builtin_cpu_is ("shijidadao");
> + cpu_model->__cpu_subtype = ZHAOXIN_FAM7H_SHIJIDADAO;
> +   }
>break;
>  default:
>break;
> diff --git a/gcc/common/config/i386/i386-common.cc b/gcc/common/config/i386/i386-common.cc
> index 895e5fa662d..eb3f94c529c 100644
> --- a/gcc/common/config/i386/i386-common.cc
> +++ b/gcc/common/config/i386/i386-common.cc
> @@ -2066,6 +2066,7 @@ const char *const processor_names[] =
>"intel",
>"lujiazui",
>"yongfeng",
> +  "shijidadao",
>"geode",
>"k6",
>"athlon",
> @@ -2271,10 +2272,13 @@ const pta processor_alias_table[] =
>| PTA_SSSE3 | PTA_SSE4_1 | PTA_FXSR, 0, P_NONE},
>{"lujiazui", PROCESSOR_LUJIAZUI, CPU_LUJIAZUI,
> PTA_LUJIAZUI,
> -   M_CPU_SUBTYPE (ZHAOXIN_FAM7H_LUJIAZUI), P_NONE},
> +   M_CPU_SUBTYPE (ZHAOXIN_FAM7H_LUJIAZUI), P_PROC_BMI},
>{"yongfeng", PROCESSOR_YONGFENG, CPU_YONGFENG,
> PTA_YONGFENG,
> -   M_CPU_SUBTYPE (ZHAOXIN_FAM7H_YONGFENG), P_NONE},
> +   M_CPU_SUBTYPE (ZHAOXIN_FAM7H_YONGFENG), P_PROC_AVX2},
> +  {"shijidadao", PROCESSOR_SHIJIDADAO, CPU_YONGFENG,
> +   PTA_YONGFENG,
> +   M_CPU_SUBTYPE (ZHAOXIN_FAM7H_SHIJIDADAO), P_PROC_AVX2},
>{"k8", PROCESSOR_K8, CPU_K8,
>  PTA_64BIT | PTA_MMX | PTA_3DNOW | PTA_3DNOW_A | PTA_SSE
>| PTA_SSE2 | PTA_NO_SAHF | PTA_FXSR, 0, P_NONE},
> diff --git a/gcc/common/config/i386/i386-cpuinfo.h b/gcc/common/config/i386/i386-cpuinfo.h
> index 9edad96d4fd..fa3b76f4931 100644
> --- a/gcc/common/config/i386/i386-cpuinfo.h
> +++ 

Re: [PATCH] Reduce cost of MEM (A + imm).

2024-05-28 Thread Uros Bizjak
On Tue, May 28, 2024 at 4:48 AM liuhongt  wrote:
>
> For MEM, rtx_cost iterates over each subrtx and adds up the costs,
> so MEM (reg) costs 5 while MEM (reg + 4) costs 9, which is not
> accurate for x86.  Ideally address_cost should be used, but it
> reduces the cost too much.  So the current solution is to make a
> constant displacement as cheap as possible.
>
> Bootstrapped and regtested on x86_64-pc-linux-gnu{-m32,}.
> Ok for trunk?
>
> gcc/ChangeLog:
>
> PR target/67325
> * config/i386/i386.cc (ix86_rtx_costs): Reduce cost of MEM (A
> + imm) to "cost of MEM (A)" + 1.
>
> gcc/testsuite/ChangeLog:
>
> * gcc.target/i386/pr67325.c: New test.
> ---
>  gcc/config/i386/i386.cc | 19 ++-
>  gcc/testsuite/gcc.target/i386/pr67325.c |  7 +++
>  2 files changed, 25 insertions(+), 1 deletion(-)
>  create mode 100644 gcc/testsuite/gcc.target/i386/pr67325.c
>
> diff --git a/gcc/config/i386/i386.cc b/gcc/config/i386/i386.cc
> index 3e2a3a194f1..3936223bd20 100644
> --- a/gcc/config/i386/i386.cc
> +++ b/gcc/config/i386/i386.cc
> @@ -22194,7 +22194,24 @@ ix86_rtx_costs (rtx x, machine_mode mode, int outer_code_i, int opno,
>/* An insn that accesses memory is slightly more expensive
>   than one that does not.  */
>if (speed)
> -*total += 1;
> +   {
> + *total += 1;
> + rtx addr = XEXP (x, 0);
> + /* For MEM, rtx_cost iterates each subrtx, and adds up the costs,
> +so for MEM (reg) and MEM (reg + 4), the former costs 5,
> +the latter costs 9, it is not accurate for x86. Ideally
> +address_cost should be used, but it reduce cost too much.
> +So current solution is make constant disp as cheap as possible.  */
> + if (GET_CODE (addr) == PLUS
> + && CONST_INT_P (XEXP (addr, 1))

IMO, there is no need for the CONST_INT_P condition; we should also allow
symbol_ref, label_ref and const (all allowed by the
x86_64_immediate_operand predicate), since these all decay to an
immediate value.

Uros.

> + && x86_64_immediate_operand (XEXP (addr, 1), Pmode))
> +   {
> + *total += 1;
> + *total += rtx_cost (XEXP (addr, 0), Pmode, PLUS, 0, speed);
> + return true;
> +   }
> +   }
> +
>return false;
>
>  case ZERO_EXTRACT:
> diff --git a/gcc/testsuite/gcc.target/i386/pr67325.c b/gcc/testsuite/gcc.target/i386/pr67325.c
> new file mode 100644
> index 000..c3c1e4c5b4d
> --- /dev/null
> +++ b/gcc/testsuite/gcc.target/i386/pr67325.c
> @@ -0,0 +1,7 @@
> +/* { dg-do compile { target { ! ia32 } } } */
> +/* { dg-options "-O2" } */
> +/* { dg-final { scan-assembler-not "(?:sar|shr)" } } */
> +
> +int f(long*l){
> +  return *l>>32;
> +}
> --
> 2.31.1
>


Re: [PATCH v1 2/6] Extract ix86 dllimport implementation to mingw

2024-05-23 Thread Uros Bizjak
On Thu, May 23, 2024 at 7:53 PM Evgeny Karpov
 wrote:
>
>
> Thursday, May 23, 2024 10:35 AM
> Uros Bizjak  wrote:
>
> > Richard Sandiford  wrote:
> > >
> > > > This looks good to me apart from a couple of very minor comments
> > > > below, but please get approval from the x86 maintainers as well.  In
> > > > particular, they might prefer to handle ix86_legitimize_pe_coff_symbol
> > > > in some other way.
> > >
> > > Jan and Uros, could you please review x86 refactoring for mingw part?
> >
> > Yes, perhaps legitimize_pe_coff_symbol should be handled similarly to how
> > machopic_legitimize_pic_address is handled, and just use "#if TARGET_PECOFF"
> > at call sites when calling functions from the new winnt-dll.h. This would
> > also allow us to remove the early check for !TARGET_PECOFF in
> > legitimize_pe_coff_symbol.
> >
> > Uros.
>
>
> The function legitimize_pe_coff_symbol is now part of mingw and will not be 
> used for linux targets.
> This is why ix86_legitimize_pe_coff_symbol has been introduced, to be 
> available for all platforms.

There is no need for an ix86_legitimize_pe_coff_symbol. This function
is now defined in a header that is not included by default, so the
call sites should use #if TARGET_PECOFF to isolate its use. Please see
how "#if TARGET_MACHO" is used in config/i386/* files for a similar
issue. I think that TARGET_PECOFF should follow this example.

Uros.


Re: [PATCH v1 2/6] Extract ix86 dllimport implementation to mingw

2024-05-23 Thread Uros Bizjak
On Thu, May 23, 2024 at 10:35 AM Uros Bizjak  wrote:
>
> On Wed, May 22, 2024 at 4:32 PM Evgeny Karpov
>  wrote:
> >
> > Wednesday, May 22, 2024 1:06 PM
> > Richard Sandiford  wrote:
> >
> > > This looks good to me apart from a couple of very minor comments below,
> > > but please get approval from the x86 maintainers as well.  In
> > > particular, they might prefer to handle ix86_legitimize_pe_coff_symbol
> > > in some other way.
> >
> > Thanks, Richard, for the review!
> > The suggestions will be addressed in the next version.
> >
> > Jan and Uros, could you please review x86 refactoring for mingw part?
> > Thanks.
>
> Yes, perhaps legitimize_pe_coff_symbol should be handled similarly to
> how machopic_legitimize_pic_address is handled, and just use "#if
> TARGET_PECOFF" at call sites when calling functions from the new
> winnt-dll.h. This would also allow us to remove the early check for
> !TARGET_PECOFF in legitimize_pe_coff_symbol.

Maybe you should look at how TARGET_MACHO is handled in config/i386/* files.

Uros.


Re: [PATCH v1 2/6] Extract ix86 dllimport implementation to mingw

2024-05-23 Thread Uros Bizjak
On Wed, May 22, 2024 at 4:32 PM Evgeny Karpov
 wrote:
>
> Wednesday, May 22, 2024 1:06 PM
> Richard Sandiford  wrote:
>
> > This looks good to me apart from a couple of very minor comments below, but
> > please get approval from the x86 maintainers as well.  In particular, they
> > might prefer to handle ix86_legitimize_pe_coff_symbol in some other way.
>
> Thanks, Richard, for the review!
> The suggestions will be addressed in the next version.
>
> Jan and Uros, could you please review x86 refactoring for mingw part? Thanks.

Yes, perhaps legitimize_pe_coff_symbol should be handled similarly to
how machopic_legitimize_pic_address is handled, and just use "#if
TARGET_PECOFF" at call sites when calling functions from the new
winnt-dll.h. This would also allow us to remove the early check for
!TARGET_PECOFF in legitimize_pe_coff_symbol.

Uros.


Re: [x86_64 PATCH] Correct insn_cost of movabsq.

2024-05-22 Thread Uros Bizjak
On Wed, May 22, 2024 at 5:15 PM Roger Sayle  wrote:
>
> This single line patch fixes a strange quirk/glitch in i386's rtx_costs,
> which considers an instruction loading a 64-bit constant to be significantly
> cheaper than loading a 32-bit (or smaller) constant.
>
> Consider the two functions:
> unsigned long long foo() { return 0x0123456789abcdefULL; }
> unsigned int bar() { return 10; }
>
> and the corresponding lines from combine's dump file:
>   insn_cost 1 for #: r98:DI=0x123456789abcdef
>   insn_cost 4 for #: ax:SI=0xa
>
> The same issue can be seen in -dP assembler output.
>   movabsq $81985529216486895, %rax# 5  [c=1 l=10]  *movdi_internal/4
>
> The problem is that pattern_cost's interpretation of rtx_costs contains
> "return cost > 0 ? cost : COSTS_N_INSNS (1)", where a zero value (for
> example a register or small immediate constant) is considered special
> and equivalent to a single instruction, but all other values are taken
> verbatim.  Hence to make x86_64's 10-byte long movabsq instruction
> slightly more expensive than a simple constant, rtx_costs needs to
> return COSTS_N_INSNS(1)+1 and not 1.  With this change, the insn_cost
> of movabsq is the intended value 5:
>   insn_cost 5 for #: r98:DI=0x123456789abcdef
> and
>   movabsq $81985529216486895, %rax# 5  [c=5 l=10]  *movdi_internal/4
>
>
> [I'd originally tried fixing this by adding an ix86_insn_cost target
> hook, but the testsuite is very sensitive to the costing of insns].
>
>
> This patch has been tested on x86_64-pc-linux-gnu with make bootstrap
> and make -k check, both with and without --target_board=unix{-m32}
> with no new failures.  Ok for mainline?
>
>
> 2024-05-22  Roger Sayle  
>
> gcc/ChangeLog
> * config/i386/i386.cc (ix86_rtx_costs) :
> A CONST_INT that isn't x86_64_immediate_operand requires an extra
> (expensive) movabsq insn to load, so return COSTS_N_INSNS (1) + 1.


Re: [PATCH v2 1/8] [APX NF]: Support APX NF add

2024-05-22 Thread Uros Bizjak
On Wed, May 22, 2024 at 10:29 AM Kong, Lingling  wrote:
>
> > I wonder if we can use "define_subst" to conditionally add flags clobber
> > for !TARGET_APX_NF targets. Even the example for "Define Subst" uses the
> > insn w/ and w/o the clobber, so I think it is worth considering this approach.
> >
> > Uros.
>
> Good suggestion. I defined a new subst for no flags; bootstrapped and
> regtested on x86_64-linux-gnu. SPEC 2017 also runs normally on the Intel
> Software Development Emulator.
> Ok for trunk?
>
> Thanks,
> Lingling
>
> Subject: [PATCH v2 1/8] [APX NF]: Support APX NF add
> The APX NF (no flags) feature suppresses the update of status flags
> for arithmetic operations.
>
> For NF add, it is not clear whether NF add can be faster than lea. If so,
> the pattern needs to be adjusted to prefer lea generation.
>
> gcc/ChangeLog:
>
> * config/i386/i386-opts.h (enum apx_features): Add nf
> enumeration.
> * config/i386/i386.h (TARGET_APX_NF): New.
> * config/i386/i386.md (nf_subst): New define_subst.
> (nf_name): New subst_attr.
> (nf_prefix): Ditto.
> (nf_condition): Ditto.
> (nf_mem_constraint): Ditto.
> (nf_applied): Ditto.
> (*add_1_nf): New define_insn.
> (addhi_1_nf): Ditto.
> (addqi_1_nf): Ditto.
> * config/i386/i386.opt: Add apx_nf enumeration.
>
> gcc/testsuite/ChangeLog:
>
> * gcc.target/i386/apx-ndd.c: Fixed test.
> * gcc.target/i386/apx-nf.c: New test.

LGTM, but I'll leave the final approval to Hongtao.

Thanks,
Uros.

>
> Co-authored-by: Lingling Kong 
> ---
>  gcc/config/i386/i386-opts.h |   3 +-
>  gcc/config/i386/i386.h  |   1 +
>  gcc/config/i386/i386.md | 179 +++-
>  gcc/config/i386/i386.opt|   3 +
>  gcc/testsuite/gcc.target/i386/apx-ndd.c |   2 +-
>  gcc/testsuite/gcc.target/i386/apx-nf.c  |   6 +
>  6 files changed, 126 insertions(+), 68 deletions(-)
>  create mode 100644 gcc/testsuite/gcc.target/i386/apx-nf.c
>
> diff --git a/gcc/config/i386/i386-opts.h b/gcc/config/i386/i386-opts.h
> index ef2825803b3..60176ce609f 100644
> --- a/gcc/config/i386/i386-opts.h
> +++ b/gcc/config/i386/i386-opts.h
> @@ -140,7 +140,8 @@ enum apx_features {
>apx_push2pop2 = 1 << 1,
>apx_ndd = 1 << 2,
>apx_ppx = 1 << 3,
> -  apx_all = apx_egpr | apx_push2pop2 | apx_ndd | apx_ppx,
> +  apx_nf = 1<< 4,
> +  apx_all = apx_egpr | apx_push2pop2 | apx_ndd | apx_ppx | apx_nf,
>  };
>
>  #endif
> diff --git a/gcc/config/i386/i386.h b/gcc/config/i386/i386.h
> index 529edff93a4..f20ae4726da 100644
> --- a/gcc/config/i386/i386.h
> +++ b/gcc/config/i386/i386.h
> @@ -55,6 +55,7 @@ see the files COPYING3 and COPYING.RUNTIME respectively.  If not, see
>  #define TARGET_APX_PUSH2POP2 (ix86_apx_features & apx_push2pop2)
>  #define TARGET_APX_NDD (ix86_apx_features & apx_ndd)
>  #define TARGET_APX_PPX (ix86_apx_features & apx_ppx)
> +#define TARGET_APX_NF (ix86_apx_features & apx_nf)
>
>  #include "config/vxworks-dummy.h"
>
> diff --git a/gcc/config/i386/i386.md b/gcc/config/i386/i386.md
> index 764bfe20ff2..bae344518bd 100644
> --- a/gcc/config/i386/i386.md
> +++ b/gcc/config/i386/i386.md
> @@ -6233,28 +6233,6 @@
>  }
>  })
>
>
> -;; Load effective address instructions
> -
> -(define_insn "*lea"
> -  [(set (match_operand:SWI48 0 "register_operand" "=r")
> -   (match_operand:SWI48 1 "address_no_seg_operand" "Ts"))]
> -  "ix86_hardreg_mov_ok (operands[0], operands[1])"
> -{
> -  if (SImode_address_operand (operands[1], VOIDmode))
> -{
> -  gcc_assert (TARGET_64BIT);
> -  return "lea{l}\t{%E1, %k0|%k0, %E1}";
> -}
> -  else
> -return "lea{}\t{%E1, %0|%0, %E1}";
> -}
> -  [(set_attr "type" "lea")
> -   (set (attr "mode")
> - (if_then_else
> -   (match_operand 1 "SImode_address_operand")
> -   (const_string "SI")
> -   (const_string "")))])
> -
>  (define_peephole2
>[(set (match_operand:SWI48 0 "register_operand")
> (match_operand:SWI48 1 "address_no_seg_operand"))]
> @@ -6290,6 +6268,13 @@
>[(parallel [(set (match_dup 0) (ashift:SWI48 (match_dup 0) (match_dup 1)))
>(clobber (reg:CC FLAGS_REG))])]
>"operands[1] = GEN_INT (exact_log2 (INTVAL (operands[1])));")
> +
> +(define_split
> +  [(set (match_operand:SWI48 0 "general_reg_operand")
> +   (mult:SWI48 (match_dup 0) (match_operand:SWI48 1 "const1248_operand")))]
> +  "TARGET_APX_NF && reload_completed"
> +  [(set (match_dup 0) (ashift:SWI48 (match_dup 0) (match_dup 1)))]
> +  "operands[1] = GEN_INT (exact_log2 (INTVAL (operands[1])));")
>
>
>  ;; Add instructions
>
> @@ -6437,48 +6422,65 @@
>   (clobber (reg:CC FLAGS_REG))])]
>   "split_double_mode (mode, [0], 1, [0], [5]);")
>
> -(define_insn "*add_1"
> -  [(set (match_operand:SWI48 0 "nonimmediate_operand" "=rm,r,r,r,r,r,r")
> +(define_subst_attr "nf_name" "nf_subst" "_nf" "")
> 

Re: [PATCH v3] i386: Disable ix86_expand_vecop_qihi2 when !TARGET_AVX512BW

2024-05-21 Thread Uros Bizjak
On Tue, May 21, 2024 at 11:01 AM Haochen Jiang  wrote:
>
> Hi all,
>
> This is the v3 patch to fix PR115069. The new testcase has passed.
>
> Changes in v3:
>   - Simplify the testcase.
>
> Changes in v2:
>   - Add a testcase.
>   - Change the comment for the early exit.
>
> Thx,
> Haochen
>
> Since vpermq is really slow, we should avoid using it for permutation
> when vpmovwb is not available (needs AVX512BW) for ix86_expand_vecop_qihi2
> and fall back to ix86_expand_vecop_qihi.
>
> gcc/ChangeLog:
>
> PR target/115069
> * config/i386/i386-expand.cc (ix86_expand_vecop_qihi2):
> Do not enable the optimization when AVX512BW is not enabled.
>
> gcc/testsuite/ChangeLog:
>
> PR target/115069
> * gcc.target/i386/pr115069.c: New.

LGTM, with a nit below.

Thanks,
Uros.

> ---
>  gcc/config/i386/i386-expand.cc   |  7 +++
>  gcc/testsuite/gcc.target/i386/pr115069.c | 10 ++
>  2 files changed, 17 insertions(+)
>  create mode 100644 gcc/testsuite/gcc.target/i386/pr115069.c
>
> diff --git a/gcc/config/i386/i386-expand.cc b/gcc/config/i386/i386-expand.cc
> index a6132911e6a..f7939761879 100644
> --- a/gcc/config/i386/i386-expand.cc
> +++ b/gcc/config/i386/i386-expand.cc
> @@ -24323,6 +24323,13 @@ ix86_expand_vecop_qihi2 (enum rtx_code code, rtx dest, rtx op1, rtx op2)
>bool op2vec = GET_MODE_CLASS (GET_MODE (op2)) == MODE_VECTOR_INT;
>bool uns_p = code != ASHIFTRT;
>
> +  /* Without VPMOVWB (provided by AVX512BW ISA), the expansion uses the
> + generic permutation to merge the data back into the right place.  This
> + permutation results in VPERMQ, which is slow, so better fall back to
> + ix86_expand_vecop_qihi.  */
> +  if (!TARGET_AVX512BW)
> +return false;
> +
>if ((qimode == V16QImode && !TARGET_AVX2)
>|| (qimode == V32QImode && (!TARGET_AVX512BW || !TARGET_EVEX512))
>/* There are no V64HImode instructions.  */
> diff --git a/gcc/testsuite/gcc.target/i386/pr115069.c b/gcc/testsuite/gcc.target/i386/pr115069.c
> new file mode 100644
> index 000..7f1ff209f26
> --- /dev/null
> +++ b/gcc/testsuite/gcc.target/i386/pr115069.c
> @@ -0,0 +1,10 @@
> +/* { dg-do compile } */
> +/* { dg-options "-O2 -mavx2" } */
> +/* { dg-final { scan-assembler-not "vpermq" } } */
> +
> +typedef char v16qi __attribute__((vector_size(16)));
> +
> +v16qi foo (v16qi a, v16qi b) {
> +return a * b;
> +}
> +

Please remove the trailing empty line.

> --
> 2.31.1
>


Re: [PATCH 2/2] [x86] Adjust rtx_cost for MEM to enable more simplication

2024-05-21 Thread Uros Bizjak
On Tue, May 21, 2024 at 7:13 AM liuhongt  wrote:
>
> A CONST_VECTOR_DUPLICATE_P in the constant pool is just a broadcast, or
> one of the variants in ix86_vector_duplicate_simode_const.
> Adjust its cost to COSTS_N_INSNS (2) + speed, which should be slightly
> larger than a broadcast.
>
> Bootstrapped and regtested on x86_64-pc-linux-gnu{-m32,}.
> Ok for trunk?
>
> gcc/ChangeLog:
> PR target/114428
> * config/i386/i386.cc (ix86_rtx_costs): Adjust cost for
> CONST_VECTOR_DUPLICATE_P in constant_pool.
>
> gcc/testsuite/ChangeLog:
>
> * gcc.target/i386/pr114428.c: New test.

LGTM.

Thanks,
Uros.

> ---
>  gcc/config/i386/i386-expand.cc   |  2 +-
>  gcc/config/i386/i386-protos.h|  1 +
>  gcc/config/i386/i386.cc  | 13 +
>  gcc/testsuite/gcc.target/i386/pr114428.c | 18 ++
>  4 files changed, 33 insertions(+), 1 deletion(-)
>  create mode 100644 gcc/testsuite/gcc.target/i386/pr114428.c
>
> diff --git a/gcc/config/i386/i386-expand.cc b/gcc/config/i386/i386-expand.cc
> index 4e16aedc5c1..d96c365e144 100644
> --- a/gcc/config/i386/i386-expand.cc
> +++ b/gcc/config/i386/i386-expand.cc
> @@ -588,7 +588,7 @@ ix86_expand_move (machine_mode mode, rtx operands[])
>
>  /* OP is a memref of CONST_VECTOR, return scalar constant mem
> if CONST_VECTOR is a vec_duplicate, else return NULL.  */
> -static rtx
> +rtx
>  ix86_broadcast_from_constant (machine_mode mode, rtx op)
>  {
>int nunits = GET_MODE_NUNITS (mode);
> diff --git a/gcc/config/i386/i386-protos.h b/gcc/config/i386/i386-protos.h
> index dbc861fb1ea..90712769200 100644
> --- a/gcc/config/i386/i386-protos.h
> +++ b/gcc/config/i386/i386-protos.h
> @@ -107,6 +107,7 @@ extern void ix86_expand_clear (rtx);
>  extern void ix86_expand_move (machine_mode, rtx[]);
>  extern void ix86_expand_vector_move (machine_mode, rtx[]);
>  extern void ix86_expand_vector_move_misalign (machine_mode, rtx[]);
> +extern rtx ix86_broadcast_from_constant (machine_mode, rtx);
>  extern rtx ix86_fixup_binary_operands (enum rtx_code, machine_mode,
>rtx[], bool = false);
>  extern void ix86_fixup_binary_operands_no_copy (enum rtx_code, machine_mode,
> diff --git a/gcc/config/i386/i386.cc b/gcc/config/i386/i386.cc
> index b4838b7939e..fdd9343e47a 100644
> --- a/gcc/config/i386/i386.cc
> +++ b/gcc/config/i386/i386.cc
> @@ -22197,6 +22197,19 @@ ix86_rtx_costs (rtx x, machine_mode mode, int outer_code_i, int opno,
>return true;
>
>  case MEM:
> +  /* CONST_VECTOR_DUPLICATE_P in constant_pool is just broadcast.
> +or variants in ix86_vector_duplicate_simode_const.  */
> +
> +  if (GET_MODE_SIZE (mode) >= 16
> + && VECTOR_MODE_P (mode)
> + && SYMBOL_REF_P (XEXP (x, 0))
> + && CONSTANT_POOL_ADDRESS_P (XEXP (x, 0))
> + && ix86_broadcast_from_constant (mode, x))
> +   {
> + *total = COSTS_N_INSNS (2) + speed;
> + return true;
> +   }
> +
>/* An insn that accesses memory is slightly more expensive
>   than one that does not.  */
>if (speed)
> diff --git a/gcc/testsuite/gcc.target/i386/pr114428.c b/gcc/testsuite/gcc.target/i386/pr114428.c
> new file mode 100644
> index 000..bbbc5a080f6
> --- /dev/null
> +++ b/gcc/testsuite/gcc.target/i386/pr114428.c
> @@ -0,0 +1,18 @@
> +/* { dg-do compile } */
> +/* { dg-options "-march=x86-64-v3 -mno-avx512f -O2" } */
> +/* { dg-final { scan-assembler-not "vpsra[dw]" } } */
> +
> +void
> +foo2 (char* __restrict a, short* b)
> +{
> +  for (int i = 0; i != 32; i++)
> +a[i] = b[i] >> (short)8;
> +}
> +
> +void
> +foo3 (char* __restrict a, short* b)
> +{
> +  for (int i = 0; i != 16; i++)
> +a[i] = b[i] >> (short)8;
> +}
> +
> --
> 2.31.1
>


Re: [PATCH] i386: Disable ix86_expand_vecop_qihi2 when !TARGET_AVX512BW

2024-05-21 Thread Uros Bizjak
On Tue, May 21, 2024 at 8:16 AM Haochen Jiang  wrote:
>
> Hi all,
>
> Since vpermq is really slow, we should avoid using it when it is
> the only instruction that could be used for ix86_expand_vecop_qihi2.
>
> Bootstrapped and regtested on x86_64-pc-linux-gnu. Ok for trunk?
>
> Thx,
> Haochen
>
> gcc/ChangeLog:
>
> PR target/115069
> * config/i386/i386-expand.cc (ix86_expand_vecop_qihi2):
> Do not enable the optimization when AVX512BW is not enabled.
> ---
>  gcc/config/i386/i386-expand.cc | 5 +
>  1 file changed, 5 insertions(+)
>
> diff --git a/gcc/config/i386/i386-expand.cc b/gcc/config/i386/i386-expand.cc
> index a6132911e6a..f24c800bb4f 100644
> --- a/gcc/config/i386/i386-expand.cc
> +++ b/gcc/config/i386/i386-expand.cc
> @@ -24323,6 +24323,11 @@ ix86_expand_vecop_qihi2 (enum rtx_code code, rtx dest, rtx op1, rtx op2)
>bool op2vec = GET_MODE_CLASS (GET_MODE (op2)) == MODE_VECTOR_INT;
>bool uns_p = code != ASHIFTRT;
>
> +  /* vpermq is slow and we should not fall into the optimization when
> + it is the only instruction to be selected.  */

Please rather say something like:

/* Without VPMOVWB (provided by AVX512BW ISA), the expansion uses the generic
   permutation to merge the data back into the right place.  This permutation
   results in VPERMQ, which is slow, so better fall back to expand_vecop_qihi.  */

Uros.

> +  if (!TARGET_AVX512BW)
> +return false;
> +
>if ((qimode == V16QImode && !TARGET_AVX2)
>|| (qimode == V32QImode && (!TARGET_AVX512BW || !TARGET_EVEX512))
>/* There are no V64HImode instructions.  */
> --
> 2.31.1
>


[PATCH] i386: Rename sat_plusminus expanders to standard names [PR112600]

2024-05-17 Thread Uros Bizjak
Rename the _3 expander to the standard ssadd, usadd, sssub and ussub
names to enable the corresponding optab expansion.

Also add a named expander for MMX modes.

PR middle-end/112600

gcc/ChangeLog:

* config/i386/mmx.md (3): New expander.
* config/i386/sse.md
(_3):
Rename expander to 3.
(3): Update for rename.
* config/i386/i386-builtin.def (BDESC): Update for rename.

gcc/testsuite/ChangeLog:

* gcc.target/i386/pr112600-1a.c: New test.
* gcc.target/i386/pr112600-1b.c: New test.

Bootstrapped and regression tested on x86_64-linux-gnu {,-m32}.

Uros.
diff --git a/gcc/config/i386/i386-builtin.def b/gcc/config/i386/i386-builtin.def
index ab73e20121a..927a79bb825 100644
--- a/gcc/config/i386/i386-builtin.def
+++ b/gcc/config/i386/i386-builtin.def
@@ -800,14 +800,14 @@ BDESC (OPTION_MASK_ISA_SSE2, 0, CODE_FOR_subv8hi3, 
"__builtin_ia32_psubw128", IX
 BDESC (OPTION_MASK_ISA_SSE2, 0, CODE_FOR_subv4si3, "__builtin_ia32_psubd128", 
IX86_BUILTIN_PSUBD128, UNKNOWN, (int) V4SI_FTYPE_V4SI_V4SI)
 BDESC (OPTION_MASK_ISA_SSE2, 0, CODE_FOR_subv2di3, "__builtin_ia32_psubq128", IX86_BUILTIN_PSUBQ128, UNKNOWN, (int) V2DI_FTYPE_V2DI_V2DI)
 
-BDESC (OPTION_MASK_ISA_SSE2, 0, CODE_FOR_sse2_ssaddv16qi3, "__builtin_ia32_paddsb128", IX86_BUILTIN_PADDSB128, UNKNOWN, (int) V16QI_FTYPE_V16QI_V16QI)
-BDESC (OPTION_MASK_ISA_SSE2, 0, CODE_FOR_sse2_ssaddv8hi3, "__builtin_ia32_paddsw128", IX86_BUILTIN_PADDSW128, UNKNOWN, (int) V8HI_FTYPE_V8HI_V8HI)
-BDESC (OPTION_MASK_ISA_SSE2, 0, CODE_FOR_sse2_sssubv16qi3, "__builtin_ia32_psubsb128", IX86_BUILTIN_PSUBSB128, UNKNOWN, (int) V16QI_FTYPE_V16QI_V16QI)
-BDESC (OPTION_MASK_ISA_SSE2, 0, CODE_FOR_sse2_sssubv8hi3, "__builtin_ia32_psubsw128", IX86_BUILTIN_PSUBSW128, UNKNOWN, (int) V8HI_FTYPE_V8HI_V8HI)
-BDESC (OPTION_MASK_ISA_SSE2, 0, CODE_FOR_sse2_usaddv16qi3, "__builtin_ia32_paddusb128", IX86_BUILTIN_PADDUSB128, UNKNOWN, (int) V16QI_FTYPE_V16QI_V16QI)
-BDESC (OPTION_MASK_ISA_SSE2, 0, CODE_FOR_sse2_usaddv8hi3, "__builtin_ia32_paddusw128", IX86_BUILTIN_PADDUSW128, UNKNOWN, (int) V8HI_FTYPE_V8HI_V8HI)
-BDESC (OPTION_MASK_ISA_SSE2, 0, CODE_FOR_sse2_ussubv16qi3, "__builtin_ia32_psubusb128", IX86_BUILTIN_PSUBUSB128, UNKNOWN, (int) V16QI_FTYPE_V16QI_V16QI)
-BDESC (OPTION_MASK_ISA_SSE2, 0, CODE_FOR_sse2_ussubv8hi3, "__builtin_ia32_psubusw128", IX86_BUILTIN_PSUBUSW128, UNKNOWN, (int) V8HI_FTYPE_V8HI_V8HI)
+BDESC (OPTION_MASK_ISA_SSE2, 0, CODE_FOR_ssaddv16qi3, "__builtin_ia32_paddsb128", IX86_BUILTIN_PADDSB128, UNKNOWN, (int) V16QI_FTYPE_V16QI_V16QI)
+BDESC (OPTION_MASK_ISA_SSE2, 0, CODE_FOR_ssaddv8hi3, "__builtin_ia32_paddsw128", IX86_BUILTIN_PADDSW128, UNKNOWN, (int) V8HI_FTYPE_V8HI_V8HI)
+BDESC (OPTION_MASK_ISA_SSE2, 0, CODE_FOR_sssubv16qi3, "__builtin_ia32_psubsb128", IX86_BUILTIN_PSUBSB128, UNKNOWN, (int) V16QI_FTYPE_V16QI_V16QI)
+BDESC (OPTION_MASK_ISA_SSE2, 0, CODE_FOR_sssubv8hi3, "__builtin_ia32_psubsw128", IX86_BUILTIN_PSUBSW128, UNKNOWN, (int) V8HI_FTYPE_V8HI_V8HI)
+BDESC (OPTION_MASK_ISA_SSE2, 0, CODE_FOR_usaddv16qi3, "__builtin_ia32_paddusb128", IX86_BUILTIN_PADDUSB128, UNKNOWN, (int) V16QI_FTYPE_V16QI_V16QI)
+BDESC (OPTION_MASK_ISA_SSE2, 0, CODE_FOR_usaddv8hi3, "__builtin_ia32_paddusw128", IX86_BUILTIN_PADDUSW128, UNKNOWN, (int) V8HI_FTYPE_V8HI_V8HI)
+BDESC (OPTION_MASK_ISA_SSE2, 0, CODE_FOR_ussubv16qi3, "__builtin_ia32_psubusb128", IX86_BUILTIN_PSUBUSB128, UNKNOWN, (int) V16QI_FTYPE_V16QI_V16QI)
+BDESC (OPTION_MASK_ISA_SSE2, 0, CODE_FOR_ussubv8hi3, "__builtin_ia32_psubusw128", IX86_BUILTIN_PSUBUSW128, UNKNOWN, (int) V8HI_FTYPE_V8HI_V8HI)
 
 BDESC (OPTION_MASK_ISA_SSE2, 0, CODE_FOR_mulv8hi3, "__builtin_ia32_pmullw128", IX86_BUILTIN_PMULLW128, UNKNOWN, (int) V8HI_FTYPE_V8HI_V8HI)
 BDESC (OPTION_MASK_ISA_SSE2, 0, CODE_FOR_smulv8hi3_highpart, "__builtin_ia32_pmulhw128", IX86_BUILTIN_PMULHW128, UNKNOWN,(int) V8HI_FTYPE_V8HI_V8HI)
@@ -1193,10 +1193,10 @@ BDESC (OPTION_MASK_ISA_AVX2, 0, CODE_FOR_addv32qi3, "__builtin_ia32_paddb256", I
 BDESC (OPTION_MASK_ISA_AVX2, 0, CODE_FOR_addv16hi3, "__builtin_ia32_paddw256", IX86_BUILTIN_PADDW256, UNKNOWN, (int) V16HI_FTYPE_V16HI_V16HI)
 BDESC (OPTION_MASK_ISA_AVX2, 0, CODE_FOR_addv8si3, "__builtin_ia32_paddd256", IX86_BUILTIN_PADDD256, UNKNOWN, (int) V8SI_FTYPE_V8SI_V8SI)
 BDESC (OPTION_MASK_ISA_AVX2, 0, CODE_FOR_addv4di3, "__builtin_ia32_paddq256", IX86_BUILTIN_PADDQ256, UNKNOWN, (int) V4DI_FTYPE_V4DI_V4DI)
-BDESC (OPTION_MASK_ISA_AVX2, 0, CODE_FOR_avx2_ssaddv32qi3, "__builtin_ia32_paddsb256", IX86_BUILTIN_PADDSB256, UNKNOWN, (int) V32QI_FTYPE_V32QI_V32QI)
-BDESC (OPTION_MASK_ISA_AVX2, 0, CODE_FOR_avx2_ssaddv16hi3, "__builtin_ia32_paddsw256", IX86_BUILTIN_PADDSW256, UNKNOWN, (int) V16HI_FTYPE_V16HI_V16HI)
-BDESC (OPTION_MASK_ISA_AVX2, 0, CODE_FOR_avx2_usaddv32qi3, "__builtin_ia32_paddusb256", IX86_BUILTIN_PADDUSB256, UNKNOWN, (int) V32QI_FTYPE_V32QI_V32QI)
-BDESC (OPTION_MASK_ISA_AVX2, 0, CODE_FOR_avx2_usaddv16hi3, 
"__builtin_ia32_paddusw256", IX86_BUILTIN_PADDUSW256, UNKNOWN, (int) 

Re: [PATCH] [x86] Set d.one_operand_p to true when TARGET_SSSE3 in ix86_expand_vecop_qihi_partial.

2024-05-15 Thread Uros Bizjak
On Wed, May 15, 2024 at 12:05 PM liuhongt  wrote:
>
> pshufb is available under TARGET_SSSE3, so
> ix86_expand_vec_perm_const_1 must return true when TARGET_SSSE3.
> Without TARGET_SSSE3, setting one_operand_p to true could make
> ix86_expand_vec_perm_const_1 return false.
>
> With the patch under -march=x86-64-v2
>
> v8qi
> foo (v8qi a)
> {
>   return a >> 5;
> }
>
> <   pmovsxbw %xmm0, %xmm0
> <   psraw   $5, %xmm0
> <   pshufb  .LC0(%rip), %xmm0
> ---
> >   movdqa  %xmm0, %xmm1
> >   pcmpeqd %xmm0, %xmm0
> >   pmovsxbw %xmm1, %xmm1
> >   psrlw   $8, %xmm0
> >   psraw   $5, %xmm1
> >   pand    %xmm1, %xmm0
> >   packuswb %xmm0, %xmm0
>
> Although there's a memory load from the constant pool, this should be
> better inside a loop, where the constant-pool load can be hoisted out:
> it's 1 instruction vs. 4 instructions.
>
> <   pshufb  .LC0(%rip), %xmm0
>
> vs.
>
> >   pcmpeqd %xmm0, %xmm0
> >   psrlw   $8, %xmm0
> >   pand    %xmm1, %xmm0
> >   packuswb %xmm0, %xmm0
>
>
> Bootstrapped and regtested on x86_64-pc-linux-gnu{-m32,}.
> Ok for trunk.
>
> gcc/ChangeLog:
>
> PR target/114514
> * config/i386/i386-expand.cc (ix86_expand_vecop_qihi_partial):
> Set d.one_operand_p to true when TARGET_SSSE3.
>
> gcc/testsuite/ChangeLog:
>
> * gcc.target/i386/pr114514-shufb.c: New test.

LGTM.

Thanks,
Uros.
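
In the sequence above, each signed byte is sign-extended to 16 bits (pmovsxbw) and shifted arithmetically as a word (psraw), and then only the low byte of every word is kept. With TARGET_SSSE3 that last step is a single pshufb. Without SSSE3, the low bytes are isolated with the pcmpeqd/psrlw/pand mask and then packed with packuswb. The following is a minimal scalar C model of that data flow, assuming 8 lanes as in v8qi. `sar_bytes_via_words` is a hypothetical name, not a GCC function, and the code relies on GCC's documented sign-extending right shift of negative values:

```c
#include <stdint.h>

/* Hypothetical scalar model of the SSSE3 sequence for `v8qi >> count`.
   Not GCC code: it only mirrors the per-lane data flow.  */
static void
sar_bytes_via_words (const int8_t in[8], int8_t out[8], int count)
{
  for (int i = 0; i < 8; i++)
    {
      /* pmovsxbw: sign-extend the byte to a 16-bit word.  */
      int16_t w = in[i];
      /* psraw: arithmetic right shift of the word (GCC shifts
         negative values with sign extension).  */
      w = (int16_t) (w >> count);
      /* pshufb (or the pand + packuswb fallback): keep the low byte.  */
      out[i] = (int8_t) (w & 0xff);
    }
}
```

Because a shifted, sign-extended byte always fits back into a signed byte, keeping the low byte yields exactly `in[i] >> count`.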

> ---
>  gcc/config/i386/i386-expand.cc       |  2 +-
>  .../gcc.target/i386/pr114514-shufb.c | 35 +++
>  2 files changed, 36 insertions(+), 1 deletion(-)
>  create mode 100644 gcc/testsuite/gcc.target/i386/pr114514-shufb.c
>
> diff --git a/gcc/config/i386/i386-expand.cc b/gcc/config/i386/i386-expand.cc
> index ab6631f51e3..ae2e9ab4e05 100644
> --- a/gcc/config/i386/i386-expand.cc
> +++ b/gcc/config/i386/i386-expand.cc
> @@ -24394,7 +24394,7 @@ ix86_expand_vecop_qihi_partial (enum rtx_code code, rtx dest, rtx op1, rtx op2)
>    d.op0 = d.op1 = qres;
>    d.vmode = V16QImode;
>    d.nelt = 16;
> -  d.one_operand_p = false;
> +  d.one_operand_p = TARGET_SSSE3;
>    d.testing_p = false;
>
>    for (i = 0; i < d.nelt; ++i)
> diff --git a/gcc/testsuite/gcc.target/i386/pr114514-shufb.c b/gcc/testsuite/gcc.target/i386/pr114514-shufb.c
> new file mode 100644
> index 000..71fdc9d8daf
> --- /dev/null
> +++ b/gcc/testsuite/gcc.target/i386/pr114514-shufb.c
> @@ -0,0 +1,35 @@
> +/* { dg-do compile } */
> +/* { dg-options "-msse4.1 -O2 -mno-avx512f" } */
> +/* { dg-final { scan-assembler-not "packuswb" } }  */
> +/* { dg-final { scan-assembler-times "pshufb" 4 { target { ! ia32 } } } }  */
> +/* { dg-final { scan-assembler-times "pshufb" 6 { target  ia32 } } }  */
> +
> +typedef unsigned char v8uqi __attribute__((vector_size(8)));
> +typedef  char v8qi __attribute__((vector_size(8)));
> +typedef unsigned char v4uqi __attribute__((vector_size(4)));
> +typedef  char v4qi __attribute__((vector_size(4)));
> +
> +v8qi
> +foo (v8qi a)
> +{
> +  return a >> 5;
> +}
> +
> +v8uqi
> +foo1 (v8uqi a)
> +{
> +  return a >> 5;
> +}
> +
> +v4qi
> +foo2 (v4qi a)
> +{
> +  return a >> 5;
> +}
> +
> +v4uqi
> +foo3 (v4uqi a)
> +{
> +  return a >> 5;
> +}
> +
> --
> 2.31.1
>


Re: [PATCH 1/8] [APX NF]: Support APX NF add

2024-05-15 Thread Uros Bizjak
On Wed, May 15, 2024 at 9:43 AM Kong, Lingling  wrote:
>
> From: Hongyu Wang 
>
> The APX NF (no flags) feature suppresses the update of status flags for
> arithmetic operations.
>
> For NF add, it is not clear whether NF add can be faster than lea. If so, the 
> pattern needs to be adjusted to prefer LEA generation.
>
> gcc/ChangeLog:
>
> * config/i386/i386-opts.h (enum apx_features): Add nf
> enumeration.
> * config/i386/i386.h (TARGET_APX_NF): New.
> * config/i386/i386.md (*add_1_nf): New define_insn.
> * config/i386/i386.opt: Add apx_nf enumeration.
>
> gcc/testsuite/ChangeLog:
>
> * gcc.target/i386/apx-ndd.c: Fixed test.
> * gcc.target/i386/apx-nf.c: New test.
>
> Co-authored-by: Lingling Kong 
>
> Bootstrapped and regtested on x86_64-linux-gnu. SPEC 2017 also runs
> normally on the Intel Software Development Emulator.
> Ok for trunk?
>
> ---
>  gcc/config/i386/i386-opts.h |  3 +-
>  gcc/config/i386/i386.h  |  1 +
>  gcc/config/i386/i386.md | 42 +
>  gcc/config/i386/i386.opt|  3 ++
>  gcc/testsuite/gcc.target/i386/apx-ndd.c |  2 +-
>  gcc/testsuite/gcc.target/i386/apx-nf.c  |  6 
>  6 files changed, 55 insertions(+), 2 deletions(-)
>  create mode 100644 gcc/testsuite/gcc.target/i386/apx-nf.c
>
> diff --git a/gcc/config/i386/i386-opts.h b/gcc/config/i386/i386-opts.h
> index ef2825803b3..60176ce609f 100644
> --- a/gcc/config/i386/i386-opts.h
> +++ b/gcc/config/i386/i386-opts.h
> @@ -140,7 +140,8 @@ enum apx_features {
>apx_push2pop2 = 1 << 1,
>apx_ndd = 1 << 2,
>apx_ppx = 1 << 3,
> -  apx_all = apx_egpr | apx_push2pop2 | apx_ndd | apx_ppx,
> +  apx_nf = 1 << 4,
> +  apx_all = apx_egpr | apx_push2pop2 | apx_ndd | apx_ppx | apx_nf,
>  };
>
>  #endif
> diff --git a/gcc/config/i386/i386.h b/gcc/config/i386/i386.h
> index 529edff93a4..f20ae4726da 100644
> --- a/gcc/config/i386/i386.h
> +++ b/gcc/config/i386/i386.h
> @@ -55,6 +55,7 @@ see the files COPYING3 and COPYING.RUNTIME respectively.  If not, see
>  #define TARGET_APX_PUSH2POP2 (ix86_apx_features & apx_push2pop2)
>  #define TARGET_APX_NDD (ix86_apx_features & apx_ndd)
>  #define TARGET_APX_PPX (ix86_apx_features & apx_ppx)
> +#define TARGET_APX_NF (ix86_apx_features & apx_nf)
>
>  #include "config/vxworks-dummy.h"
>
> diff --git a/gcc/config/i386/i386.md b/gcc/config/i386/i386.md
> index 764bfe20ff2..4a9e35c4990 100644
> --- a/gcc/config/i386/i386.md
> +++ b/gcc/config/i386/i386.md
> @@ -6233,6 +6233,48 @@
>  }
>  })
>
>
> +;; NF instructions.
> +
> +(define_insn "*add_1_nf"
> +  [(set (match_operand:SWI 0 "nonimmediate_operand" "=rm,rje,r,r,r,r,r,r")
> +   (plus:SWI
> + (match_operand:SWI 1 "nonimmediate_operand" "%0,0,0,r,r,rje,jM,r")
> + (match_operand:SWI 2 "x86_64_general_operand"
> +"r,e,BM,0,le,r,e,BM")))]
> +  "TARGET_APX_NF &&
> +   ix86_binary_operator_ok (PLUS, mode, operands,
> +   TARGET_APX_NDD)"

I wonder if we can use "define_subst" to conditionally add flags
clobber for !TARGET_APX_NF targets. Even the example for "Define
Subst" uses the insn w/ and w/o the clobber, so I think it is worth
considering this approach.

Uros.


Re: [PATCH 1/8] [APX NF]: Support APX NF add

2024-05-15 Thread Uros Bizjak
On Wed, May 15, 2024 at 9:43 AM Kong, Lingling  wrote:
>
> From: Hongyu Wang 
>
> The APX NF (no flags) feature suppresses the update of status flags for
> arithmetic operations.
>
> For NF add, it is not clear whether NF add can be faster than lea. If so, the 
> pattern needs to be adjusted to prefer LEA generation.

> diff --git a/gcc/testsuite/gcc.target/i386/apx-ndd.c b/gcc/testsuite/gcc.target/i386/apx-ndd.c
> index 0eb751ad225..0ff4df0780c 100644
> --- a/gcc/testsuite/gcc.target/i386/apx-ndd.c
> +++ b/gcc/testsuite/gcc.target/i386/apx-ndd.c
> @@ -1,5 +1,5 @@
>  /* { dg-do compile { target { ! ia32 } } } */
> -/* { dg-options "-mapxf -march=x86-64 -O2" } */
> +/* { dg-options "-mapx-features=egpr,push2pop2,ndd,ppx -march=x86-64
> +-O2" } */

Please do not split options to a separate line; here and in other places.

Uros.


Re: [PATCH 1/1] [PATCH] x86:Add 3-instruction subroutine vector shift for V16QI in ix86_expand_vec_perm_const_1 [PR107563]

2024-05-14 Thread Uros Bizjak
On Thu, May 9, 2024 at 11:12 AM Levy Hsu  wrote:
>
> Hi All
>
> We've introduced a new subroutine in ix86_expand_vec_perm_const_1
> to optimize vector shifting for the V16QI type on x86.
> This patch uses a three-instruction sequence psrlw, psllw, and por
> to handle specific vector shuffle operations more efficiently.
> The change aims to improve assembly code generation for configurations
> supporting SSE2.
>
> Bootstrapped and tested on x86_64-linux-gnu, OK for trunk?
>
> Best
> Levy
>
> gcc/ChangeLog:
>
> PR target/107563
> * config/i386/i386-expand.cc (expand_vec_perm_psrlw_psllw_por): New
> subroutine.
> (ix86_expand_vec_perm_const_1): New Entry.

Please say (ix86_expand_vec_perm_const_1): Call expand_vec_perm_psrlw_psllw_por.

>
> gcc/testsuite/ChangeLog:
>
> PR target/107563
> * g++.target/i386/pr107563-a.C: New test.
> * g++.target/i386/pr107563-b.C: New test.

OK with the above adjustment.

Thanks,
Uros.
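
Viewed as 16-bit lanes, the permutation 1,0,3,2,... simply swaps the two bytes of every lane, which is why a right shift by 8, a left shift by 8 and an OR are enough. The following is a minimal scalar C model of that idea on a byte buffer; both function names are hypothetical. The right shift has to be logical (psrlw). With an arithmetic shift (psraw), a lane whose upper byte has its top bit set would come back with 0xff in its upper byte instead of the original low byte. The sketch therefore works on uint16_t, and the test data deliberately uses bytes at or above 0x80:

```c
#include <stdint.h>
#include <string.h>

/* Hypothetical scalar model (not GCC code) of the psrlw/psllw/por idea:
   view the bytes as 16-bit lanes and rotate every lane by 8 bits, which
   swaps each pair of adjacent bytes, i.e. the permutation
   __builtin_shufflevector (v, v, 1, 0, 3, 2, 5, 4, ...).  */
static void
swap_pairs_via_words (uint8_t *bytes, size_t n)
{
  for (size_t i = 0; i + 1 < n; i += 2)
    {
      uint16_t w;
      memcpy (&w, bytes + i, sizeof w);      /* reinterpret two bytes as a word */
      w = (uint16_t) ((w >> 8) | (w << 8));  /* logical shifts, then OR */
      memcpy (bytes + i, &w, sizeof w);
    }
}

/* Reference implementation of the same permutation, byte by byte.  */
static void
swap_pairs_reference (uint8_t *bytes, size_t n)
{
  for (size_t i = 0; i + 1 < n; i += 2)
    {
      uint8_t t = bytes[i];
      bytes[i] = bytes[i + 1];
      bytes[i + 1] = t;
    }
}
```

Rotating a 16-bit word by 8 swaps its two bytes in memory regardless of endianness, so both functions agree on any input.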

> ---
>  gcc/config/i386/i386-expand.cc | 64 ++
>  gcc/testsuite/g++.target/i386/pr107563-a.C | 13 +
>  gcc/testsuite/g++.target/i386/pr107563-b.C | 12 
>  3 files changed, 89 insertions(+)
>  create mode 100755 gcc/testsuite/g++.target/i386/pr107563-a.C
>  create mode 100755 gcc/testsuite/g++.target/i386/pr107563-b.C
>
> diff --git a/gcc/config/i386/i386-expand.cc b/gcc/config/i386/i386-expand.cc
> index 2f27bfb484c..5098d2886bb 100644
> --- a/gcc/config/i386/i386-expand.cc
> +++ b/gcc/config/i386/i386-expand.cc
> @@ -22362,6 +22362,67 @@ expand_vec_perm_2perm_pblendv (struct expand_vec_perm_d *d, bool two_insn)
>    return true;
>  }
>
> +/* A subroutine of ix86_expand_vec_perm_const_1.
> +   Implement a permutation with psrlw, psllw and por.
> +   It handles case:
> +   __builtin_shufflevector (v,v,1,0,3,2,5,4,7,6,9,8,11,10,13,12,15,14);
> +   __builtin_shufflevector (v,v,1,0,3,2,5,4,7,6); */
> +
> +static bool
> +expand_vec_perm_psrlw_psllw_por (struct expand_vec_perm_d *d)
> +{
> +  unsigned i;
> +  rtx (*gen_shr) (rtx, rtx, rtx);
> +  rtx (*gen_shl) (rtx, rtx, rtx);
> +  rtx (*gen_or) (rtx, rtx, rtx);
> +  machine_mode mode = VOIDmode;
> +
> +  if (!TARGET_SSE2 || !d->one_operand_p)
> +    return false;
> +
> +  switch (d->vmode)
> +    {
> +    case E_V8QImode:
> +      if (!TARGET_MMX_WITH_SSE)
> +        return false;
> +      mode = V4HImode;
> +      gen_shr = gen_ashrv4hi3;
> +      gen_shl = gen_ashlv4hi3;
> +      gen_or = gen_iorv4hi3;
> +      break;
> +    case E_V16QImode:
> +      mode = V8HImode;
> +      gen_shr = gen_vlshrv8hi3;
> +      gen_shl = gen_vashlv8hi3;
> +      gen_or = gen_iorv8hi3;
> +      break;
> +    default: return false;
> +    }
> +
> +  if (!rtx_equal_p (d->op0, d->op1))
> +    return false;
> +
> +  for (i = 0; i < d->nelt; i += 2)
> +    if (d->perm[i] != i + 1 || d->perm[i + 1] != i)
> +      return false;
> +
> +  if (d->testing_p)
> +    return true;
> +
> +  rtx tmp1 = gen_reg_rtx (mode);
> +  rtx tmp2 = gen_reg_rtx (mode);
> +  rtx op0 = force_reg (d->vmode, d->op0);
> +
> +  emit_move_insn (tmp1, lowpart_subreg (mode, op0, d->vmode));
> +  emit_move_insn (tmp2, lowpart_subreg (mode, op0, d->vmode));
> +  emit_insn (gen_shr (tmp1, tmp1, GEN_INT (8)));
> +  emit_insn (gen_shl (tmp2, tmp2, GEN_INT (8)));
> +  emit_insn (gen_or (tmp1, tmp1, tmp2));
> +  emit_move_insn (d->target, lowpart_subreg (d->vmode, tmp1, mode));
> +
> +  return true;
> +}
> +
>  /* A subroutine of ix86_expand_vec_perm_const_1.  Implement a V4DF
>     permutation using two vperm2f128, followed by a vshufpd insn blending
>     the two vectors together.  */
> @@ -23782,6 +23843,9 @@ ix86_expand_vec_perm_const_1 (struct expand_vec_perm_d *d)
>    if (expand_vec_perm_2perm_pblendv (d, false))
>      return true;
>
> +  if (expand_vec_perm_psrlw_psllw_por (d))
> +    return true;
> +
>    /* Try sequences of four instructions.  */
>
>    if (expand_vec_perm_even_odd_trunc (d))
> diff --git a/gcc/testsuite/g++.target/i386/pr107563-a.C b/gcc/testsuite/g++.target/i386/pr107563-a.C
> new file mode 100755
> index 000..605c1bdf814
> --- /dev/null
> +++ b/gcc/testsuite/g++.target/i386/pr107563-a.C
> @@ -0,0 +1,13 @@
> +/* PR target/107563.C */
> +/* { dg-do compile { target { ! ia32 } } } */
> +/* { dg-options "-std=c++2b -O3 -msse2" } */
> +/* { dg-final { scan-assembler-times "psllw" 1 } } */
> +/* { dg-final { scan-assembler-times "psraw" 1 } } */
> +/* { dg-final { scan-assembler-times "por" 1 } } */
> +
> +using temp_vec_type2 [[__gnu__::__vector_size__(8)]] = char;
> +
> +void foo2(temp_vec_type2& v) noexcept
> +{
> +  v = __builtin_shufflevector(v, v, 1, 0, 3, 2, 5, 4, 7, 6);
> +}
> diff --git a/gcc/testsuite/g++.target/i386/pr107563-b.C b/gcc/testsuite/g++.target/i386/pr107563-b.C
> new file mode 100755
> index 000..0ce3e8263bb
> --- /dev/null
> +++ b/gcc/testsuite/g++.target/i386/pr107563-b.C
> @@ -0,0 +1,12 @@
> +/* PR target/107563.C */
> +/* { dg-options 

Re: [PATCH] x86:Add 3-instruction subroutine vector shift for V16QI in ix86_expand_vec_perm_const_1 [PR107563]

2024-05-08 Thread Uros Bizjak
On Wed, May 8, 2024 at 4:44 AM Levy Hsu  wrote:
>
> PR target/107563
>
> gcc/ChangeLog:
>
> * config/i386/i386-expand.cc (expand_vec_perm_psrlw_psllw_por): New
> subroutine.
> (ix86_expand_vec_perm_const_1): New Entry.
>
> gcc/testsuite/ChangeLog:
>
> * g++.target/i386/pr107563.C: New test.
> ---
>  gcc/config/i386/i386-expand.cc   | 64 
>  gcc/testsuite/g++.target/i386/pr107563.C | 23 +
>  2 files changed, 87 insertions(+)
>  create mode 100755 gcc/testsuite/g++.target/i386/pr107563.C
>
> diff --git a/gcc/config/i386/i386-expand.cc b/gcc/config/i386/i386-expand.cc
> index 2f27bfb484c..2718b0acb87 100644
> --- a/gcc/config/i386/i386-expand.cc
> +++ b/gcc/config/i386/i386-expand.cc
> @@ -22362,6 +22362,67 @@ expand_vec_perm_2perm_pblendv (struct expand_vec_perm_d *d, bool two_insn)
>    return true;
>  }
>
> +/* A subroutine of ix86_expand_vec_perm_const_1.
> +   Implement a permutation with psrlw, psllw and por.
> +   It handles case:
> +   __builtin_shufflevector (v,v,1,0,3,2,5,4,7,6,9,8,11,10,13,12,15,14);
> +   __builtin_shufflevector (v,v,1,0,3,2,5,4,7,6); */
> +
> +static bool
> +expand_vec_perm_psrlw_psllw_por (struct expand_vec_perm_d *d)
> +{
> +  unsigned i;
> +  rtx (*gen_shr) (rtx, rtx, rtx);
> +  rtx (*gen_shl) (rtx, rtx, rtx);
> +  rtx (*gen_or) (rtx, rtx, rtx);
> +  machine_mode mode = VOIDmode;
> +
> +  if (!TARGET_SSE2 || !d->one_operand_p)
> +    return false;
> +
> +  switch (d->vmode)
> +    {
> +    case E_V8QImode:
> +      if (!TARGET_MMX_WITH_SSE)
> +        return false;
> +      mode = V4HImode;
> +      gen_shr = gen_ashrv4hi3;
> +      gen_shl = gen_ashlv4hi3;
> +      gen_or = gen_iorv4hi3;
> +      break;
> +    case E_V16QImode:
> +      mode = V8HImode;
> +      gen_shr = gen_vlshrv8hi3;
> +      gen_shl = gen_vashlv8hi3;
> +      gen_or = gen_iorv8hi3;
> +      break;
> +    default: return false;
> +    }
> +
> +  if (!rtx_equal_p (d->op0, d->op1))
> +    return false;
> +
> +  for (i = 0; i < d->nelt; i += 2)
> +    if (d->perm[i] != i + 1 || d->perm[i + 1] != i)
> +      return false;
> +
> +  if (d->testing_p)
> +    return true;
> +
> +  rtx tmp1 = gen_reg_rtx (mode);
> +  rtx tmp2 = gen_reg_rtx (mode);
> +  rtx op0 = force_reg (d->vmode, d->op0);
> +
> +  emit_move_insn (tmp1, lowpart_subreg (mode, op0, d->vmode));
> +  emit_move_insn (tmp2, lowpart_subreg (mode, op0, d->vmode));
> +  emit_insn (gen_shr (tmp1, tmp1, GEN_INT (8)));
> +  emit_insn (gen_shl (tmp2, tmp2, GEN_INT (8)));
> +  emit_insn (gen_or (tmp1, tmp1, tmp2));
> +  emit_move_insn (d->target, lowpart_subreg (d->vmode, tmp1, mode));
> +
> +  return true;
> +}
> +
>  /* A subroutine of ix86_expand_vec_perm_const_1.  Implement a V4DF
>     permutation using two vperm2f128, followed by a vshufpd insn blending
>     the two vectors together.  */
> @@ -23781,6 +23842,9 @@ ix86_expand_vec_perm_const_1 (struct expand_vec_perm_d *d)
>
>    if (expand_vec_perm_2perm_pblendv (d, false))
>      return true;
> +
> +  if (expand_vec_perm_psrlw_psllw_por (d))
> +    return true;
>
>    /* Try sequences of four instructions.  */
>
> diff --git a/gcc/testsuite/g++.target/i386/pr107563.C b/gcc/testsuite/g++.target/i386/pr107563.C
> new file mode 100755
> index 000..5b0c648e8f1
> --- /dev/null
> +++ b/gcc/testsuite/g++.target/i386/pr107563.C
> @@ -0,0 +1,23 @@
> +/* PR target/107563.C */
> +/* { dg-do compile { target { ! ia32 } } } */

Please split the testcase to two files, one (e.g. pr107563-a.C)
testing 8-byte vectors and the other (e.g. pr107563-b.C) using 16-byte
vectors. The latter can also be tested with 32-bit targets.

Uros.

> +/* { dg-options "-std=c++2b -O3 -msse2" } */
> +/* { dg-final { scan-assembler-not "movzbl" } } */
> +/* { dg-final { scan-assembler-not "salq" } } */
> +/* { dg-final { scan-assembler-not "orq" } } */
> +/* { dg-final { scan-assembler-not "punpcklqdq" } } */
> +/* { dg-final { scan-assembler-times "psllw" 2 } } */
> +/* { dg-final { scan-assembler-times "psrlw" 1 } } */
> +/* { dg-final { scan-assembler-times "psraw" 1 } } */
> +/* { dg-final { scan-assembler-times "por" 2 } } */
> +
> +using temp_vec_type [[__gnu__::__vector_size__ (16)]] = char;
> +void foo (temp_vec_type& v) noexcept
> +{
> +  v = __builtin_shufflevector(v,v,1,0,3,2,5,4,7,6,9,8,11,10,13,12,15,14);
> +}
> +
> +using temp_vec_type2 [[__gnu__::__vector_size__ (8)]] = char;
> +void foo2 (temp_vec_type2& v) noexcept
> +{
> +  v=__builtin_shufflevector(v,v,1,0,3,2,5,4,7,6);
> +}
> --
> 2.31.1
>


Re: [PATCH] x86: Fix cmov cost model issue [PR109549]

2024-05-06 Thread Uros Bizjak
On Mon, May 6, 2024 at 5:20 AM Hongtao Liu  wrote:
>
> CC uros.
>
> On Mon, May 6, 2024 at 11:03 AM Kong, Lingling  
> wrote:
> >
> > Hi,
> > (if_then_else:SI (eq (reg:CCZ 17 flags)
> > (const_int 0 [0]))
> > (reg/v:SI 101 [ e ])
> > (reg:SI 102))
> > The cost of this rtx is 8, of which the cost of
> > (eq (reg:CCZ 17 flags) (const_int 0 [0])) accounts for 4, but that is
> > just the comparison operator of the cmov and its cost need not be counted.
> It looks like a reasonable change to me: for cmov, the first operand
> of if_then_else is not a mask.
> >
> > Bootstrapped and regtested on x86_64-pc-linux-gnu.
> > OK for trunk?
> >
> > gcc/ChangeLog:
> >
> > PR target/109549
> > * config/i386/i386.cc (ix86_rtx_costs): Do not add the cost of
> > XEXP (x, 0) for cmov, since it is a comparison operator.
> >
> > gcc/testsuite/ChangeLog:
> >
> > * gcc.target/i386/cmov6.c: Fixed.

OK.

BTW: I'd like to point out PR85559 [1] that collects some persistent
issues with x86 CMOV insn, especially [2].

[1] https://gcc.gnu.org/bugzilla/show_bug.cgi?id=cmov
[2] https://gcc.gnu.org/bugzilla/show_bug.cgi?id=56309

Uros.
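
The cost arithmetic in the report can be reproduced with a toy model in C. The types and `toy_cmov_cost` below are hypothetical. Only the COSTS_N_INSNS definition (N * 4) matches GCC's rtl.h. The handling of the third operand is an assumption, because the hunk only shows operands 0 and 1. With the figures quoted above, a comparison operand costing 4 gives a total of 8 under the old rule and 4 under the new one:

```c
#include <stdbool.h>

/* As in GCC's rtl.h, one instruction costs 4 units.  */
#define COSTS_N_INSNS(N) ((N) * 4)

/* Toy stand-ins for the operands of (if_then_else COND A B); these types
   are hypothetical and only model what the cost hook inspects.  */
enum toy_kind { TOY_REG, TOY_COMPARISON, TOY_OTHER };

struct toy_operand
{
  enum toy_kind kind;
  int cost;  /* what rtx_cost would return for this operand */
};

/* Toy model of the cmov branch of ix86_rtx_costs.  OLD_RULE selects the
   behaviour before the patch, where a comparison in operand 0 was costed.  */
static int
toy_cmov_cost (const struct toy_operand ops[3], bool old_rule)
{
  int total = COSTS_N_INSNS (1);

  bool skip_cond = ops[0].kind == TOY_REG
                   || (!old_rule && ops[0].kind == TOY_COMPARISON);
  if (!skip_cond)
    total += ops[0].cost;

  /* Non-register arms are costed in both rules (operand 2 is assumed
     to be handled like operand 1).  */
  for (int i = 1; i < 3; i++)
    if (ops[i].kind != TOY_REG)
      total += ops[i].cost;

  return total;
}
```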

> > ---
> >  gcc/config/i386/i386.cc   | 2 +-
> >  gcc/testsuite/gcc.target/i386/cmov6.c | 5 +
> >  2 files changed, 2 insertions(+), 5 deletions(-)
> >
> > diff --git a/gcc/config/i386/i386.cc b/gcc/config/i386/i386.cc
> > index 4d6b2b98761..59b4ce3bfbf 100644
> > --- a/gcc/config/i386/i386.cc
> > +++ b/gcc/config/i386/i386.cc
> > @@ -22237,7 +22237,7 @@ ix86_rtx_costs (rtx x, machine_mode mode, int outer_code_i, int opno,
> > {
> >   /* cmov.  */
> >   *total = COSTS_N_INSNS (1);
> > - if (!REG_P (XEXP (x, 0)))
> > + if (!COMPARISON_P (XEXP (x, 0)) && !REG_P (XEXP (x, 0)))
> > *total += rtx_cost (XEXP (x, 0), mode, code, 0, speed);
> >   if (!REG_P (XEXP (x, 1)))
> > *total += rtx_cost (XEXP (x, 1), mode, code, 1, speed);
> > diff --git a/gcc/testsuite/gcc.target/i386/cmov6.c b/gcc/testsuite/gcc.target/i386/cmov6.c
> > index 5111c8a9099..535326e4c2a 100644
> > --- a/gcc/testsuite/gcc.target/i386/cmov6.c
> > +++ b/gcc/testsuite/gcc.target/i386/cmov6.c
> > @@ -1,9 +1,6 @@
> >  /* { dg-do compile } */
> >  /* { dg-options "-O2 -march=k8" } */
> > -/* if-converting this sequence would require two cmov
> > -   instructions and seems to always cost more independent
> > -   of the TUNE_ONE_IF_CONV setting.  */
> > -/* { dg-final { scan-assembler-not "cmov\[^6\]" } } */
> > +/* { dg-final { scan-assembler "cmov\[^6\]" } } */
> >
> >  /* Verify that blocks are converted to conditional moves.  */
> >  extern int bar (int, int);
> > --
> > 2.31.1
> >
>
>
> --
> BR,
> Hongtao


Re: [PATCH] [x86] Adjust alternative *k to ?k for avx512 mask in zero_extend patterns

2024-04-28 Thread Uros Bizjak
On Sun, Apr 28, 2024 at 7:47 AM liuhongt  wrote:
>
> When both the source and destination operands require AVX512 MASK_REGS,
> this lets RA allocate a MASK_REGS register instead of a GPR and avoid
> reloading the value from a GPR into MASK_REGS.
> This is similar to what was done for the logic patterns.
>
> Bootstrapped and regtested on x86_64-pc-linux-gnu{-m32,}.
> Ok for trunk?
>
> gcc/ChangeLog:
>
> * config/i386/i386.md (zero_extendsidi2): Adjust
> alternative *k to ?k.
> (zero_extenddi2): Ditto.
> (*zero_extendsi2): Ditto.
> (*zero_extendqihi2): Ditto.

OK.

Thanks,
Uros.

> ---
>  gcc/config/i386/i386.md   | 16 +++
>  .../gcc.target/i386/zero_extendkmask.c| 43 +++
>  2 files changed, 51 insertions(+), 8 deletions(-)
>  create mode 100644 gcc/testsuite/gcc.target/i386/zero_extendkmask.c
>
> diff --git a/gcc/config/i386/i386.md b/gcc/config/i386/i386.md
> index d4ce3809e6d..f2ab7fdcd58 100644
> --- a/gcc/config/i386/i386.md
> +++ b/gcc/config/i386/i386.md
> @@ -4567,10 +4567,10 @@ (define_expand "zero_extendsidi2"
>
>  (define_insn "*zero_extendsidi2"
>[(set (match_operand:DI 0 "nonimmediate_operand"
> -   "=r,?r,?o,r   ,o,?*y,?!*y,$r,$v,$x,*x,*v,*r,*k")
> +   "=r,?r,?o,r   ,o,?*y,?!*y,$r,$v,$x,*x,*v,?r,?k")
> (zero_extend:DI
>  (match_operand:SI 1 "x86_64_zext_operand"
> -   "0 ,rm,r ,rmWz,0,r  ,m   ,v ,r ,m ,*x,*v,*k,*km")))]
> +   "0 ,rm,r ,rmWz,0,r  ,m   ,v ,r ,m ,*x,*v,?k,?km")))]
>""
>  {
>switch (get_attr_type (insn))
> @@ -4703,9 +4703,9 @@ (define_mode_attr kmov_isa
>[(QI "avx512dq") (HI "avx512f") (SI "avx512bw") (DI "avx512bw")])
>
>  (define_insn "zero_extenddi2"
> -  [(set (match_operand:DI 0 "register_operand" "=r,*r,*k")
> +  [(set (match_operand:DI 0 "register_operand" "=r,?r,?k")
> (zero_extend:DI
> -(match_operand:SWI12 1 "nonimmediate_operand" "m,*k,*km")))]
> +(match_operand:SWI12 1 "nonimmediate_operand" "m,?k,?km")))]
>"TARGET_64BIT"
>"@
> movz{l|x}\t{%1, %k0|%k0, %1}
> @@ -4758,9 +4758,9 @@ (define_insn_and_split "zero_extendsi2_and"
> (set_attr "mode" "SI")])
>
>  (define_insn "*zero_extendsi2"
> -  [(set (match_operand:SI 0 "register_operand" "=r,*r,*k")
> +  [(set (match_operand:SI 0 "register_operand" "=r,?r,?k")
> (zero_extend:SI
> - (match_operand:SWI12 1 "nonimmediate_operand" "m,*k,*km")))]
> + (match_operand:SWI12 1 "nonimmediate_operand" "m,?k,?km")))]
>"!(TARGET_ZERO_EXTEND_WITH_AND && optimize_function_for_speed_p (cfun))"
>"@
> movz{l|x}\t{%1, %0|%0, %1}
> @@ -4813,8 +4813,8 @@ (define_insn_and_split "zero_extendqihi2_and"
>
>  ; zero extend to SImode to avoid partial register stalls
>  (define_insn "*zero_extendqihi2"
> -  [(set (match_operand:HI 0 "register_operand" "=r,*r,*k")
> -   (zero_extend:HI (match_operand:QI 1 "nonimmediate_operand" "qm,*k,*km")))]
> +  [(set (match_operand:HI 0 "register_operand" "=r,?r,?k")
> +   (zero_extend:HI (match_operand:QI 1 "nonimmediate_operand" "qm,?k,?km")))]
>"!(TARGET_ZERO_EXTEND_WITH_AND && optimize_function_for_speed_p (cfun))"
>"@
> movz{bl|x}\t{%1, %k0|%k0, %1}
> diff --git a/gcc/testsuite/gcc.target/i386/zero_extendkmask.c b/gcc/testsuite/gcc.target/i386/zero_extendkmask.c
> new file mode 100644
> index 000..6b18980bbd1
> --- /dev/null
> +++ b/gcc/testsuite/gcc.target/i386/zero_extendkmask.c
> @@ -0,0 +1,43 @@
> +/* { dg-do compile { target { ! ia32 } } } */
> +/* { dg-options "-march=x86-64-v4 -O2" } */
> +/* { dg-final { scan-assembler-not {(?n)shr[bwl]} } } */
> +/* { dg-final { scan-assembler-not {(?n)movz[bw]} } } */
> +
> +#include
> +
> +__m512
> +foo (__m512d a, __m512d b, __m512 c, __m512 d)
> +{
> +  return _mm512_mask_mov_ps (c, (__mmask16) (_mm512_cmpeq_pd_mask (a, b) >> 1), d);
> +}
> +
> +
> +__m512i
> +foo1 (__m512d a, __m512d b, __m512i c, __m512i d)
> +{
> +  return _mm512_mask_mov_epi16 (c, (__mmask32) (_mm512_cmpeq_pd_mask (a, b) >> 1), d);
> +}
> +
> +__m512i
> +foo2 (__m512d a, __m512d b, __m512i c, __m512i d)
> +{
> +  return _mm512_mask_mov_epi8 (c, (__mmask64) (_mm512_cmpeq_pd_mask (a, b) >> 1), d);
> +}
> +
> +__m512i
> +foo3 (__m512 a, __m512 b, __m512i c, __m512i d)
> +{
> +  return _mm512_mask_mov_epi16 (c, (__mmask32) (_mm512_cmpeq_ps_mask (a, b) >> 1), d);
> +}
> +
> +__m512i
> +foo4 (__m512 a, __m512 b, __m512i c, __m512i d)
> +{
> +  return _mm512_mask_mov_epi8 (c, (__mmask64) (_mm512_cmpeq_ps_mask (a, b) >> 1), d);
> +}
> +
> +__m512i
> +foo5 (__m512i a, __m512i b, __m512i c, __m512i d)
> +{
> +  return _mm512_mask_mov_epi8 (c, (__mmask64) (_mm512_cmp_epi16_mask (a, b, 5) >> 1), d);
> +}
> --
> 2.31.1
>


Re: [PATCH] i386: Fix array index overflow in pr105354-2.c

2024-04-26 Thread Uros Bizjak
On Fri, Apr 26, 2024 at 11:03 AM Haochen Jiang  wrote:
>
> Hi all,
>
> The array index must stay below 8 for v8hi; otherwise the test fails
> at -O0 or with -fstack-protector.
>
> This patch aims to fix that, which is mentioned in PR110621.
>
> Commit as obvious and backport to GCC13.
>
> Thx,
> Haochen
>
> gcc/testsuite/ChangeLog:
>
> PR target/110621
> * gcc.target/i386/pr105354-2.c: As mentioned.

Please note that the ChangeLog entry gets copied into the relevant
ChangeLog file independently of the commit message. So, the above
entry will be copied to gcc/testsuite/ChangeLog without any reference
to what was mentioned.

Uros.
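
A v8hi holds 16 bytes of 2-byte shorts, which is 8 elements, so the valid indices are 0 to 7. A guard of `i <= 8` therefore writes one element past the end, and an -O0 build or -fstack-protector can expose that. The following is a minimal C sketch using GCC's vector extension. The union and `fill_v8hi` are hypothetical. The assumption that the surrounding loop runs past 8 is inferred from the presence of the guard:

```c
/* A v8hi is 16 bytes of 2-byte shorts: 8 elements, indices 0..7.  */
typedef short v8hi __attribute__ ((vector_size (16)));

/* Hypothetical union giving element access to the vector.  */
union v8hi_view
{
  short a[8];
  v8hi x;
};

/* Fill the elements from a loop that may run longer than 8 iterations.
   The guard must be `i < 8`; `i <= 8` would also write c->a[8].  */
static void
fill_v8hi (union v8hi_view *c, int loop_bound)
{
  for (int i = 0; i < loop_bound; i++)
    if (i < 8)
      c->a[i] = (short) i;
}
```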

> ---
>  gcc/testsuite/gcc.target/i386/pr105354-2.c | 2 +-
>  1 file changed, 1 insertion(+), 1 deletion(-)
>
> diff --git a/gcc/testsuite/gcc.target/i386/pr105354-2.c b/gcc/testsuite/gcc.target/i386/pr105354-2.c
> index b78b62e1e7e..1c592e84860 100644
> --- a/gcc/testsuite/gcc.target/i386/pr105354-2.c
> +++ b/gcc/testsuite/gcc.target/i386/pr105354-2.c
> @@ -17,7 +17,7 @@ sse2_test (void)
>b.a[i] = i + 16;
>res_ab.a[i] = 0;
>exp_ab.a[i] = -1;
> -  if (i <= 8)
> +  if (i < 8)
> {
>   c.a[i] = i;
>   d.a[i] = i + 8;
> --
> 2.31.1
>


Re: [PATCH] i386: Avoid =, r, r andn double-word alternative for ia32 [PR114810]

2024-04-23 Thread Uros Bizjak
On Tue, Apr 23, 2024 at 5:50 PM Jakub Jelinek  wrote:
>
> Hi!
>
> As discussed in the PR, on ia32, with its 8 GPRs, of which 1 is always
> fixed and another 2 often are as well, having an alternative which needs
> 3 double-word registers is just too much for RA.
> The following patch splits that alternative into two: the one with o is
> used even on ia32, while the one with 3x r is used only for -m64/-mx32.
> I tried to reduce the testcase further, but that wasn't easily possible.
>
> Bootstrapped/regtested on x86_64-linux and i686-linux, ok for trunk?
>
> 2024-04-23  Jakub Jelinek  
>
> PR target/114810
> * config/i386/i386.md (*andn3_doubleword_bmi): Split the =,r,ro
> alternative into =,r,r enabled only for x64 and =,r,o.
>
> * g++.target/i386/pr114810.C: New test.

OK.

Thanks,
Uros.
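
With BMI, the double-word andn is split into two word-sized ANDN instructions, one per half. On ia32 each double-word operand therefore occupies a pair of 32-bit GPRs, so the all-register alternative ties up 3 pairs, or 6 registers. Of the 8 GPRs, 1 is always fixed and 2 more often are, which leaves at most 8 - 3 = 5. The following is a minimal C sketch of the split semantics, assuming ANDN's definition dest = ~src1 & src2. The helper names are hypothetical:

```c
#include <stdint.h>

/* BMI ANDN semantics: dest = ~src1 & src2.  */
static uint32_t
andn32 (uint32_t src1, uint32_t src2)
{
  return ~src1 & src2;
}

/* A 64-bit andn on a 32-bit target, done as two independent 32-bit
   ANDNs on the low and high halves (mirroring the insn splitter in the
   patch).  Each 64-bit operand corresponds to a pair of 32-bit GPRs.  */
static uint64_t
andn64_via_halves (uint64_t src1, uint64_t src2)
{
  uint32_t lo = andn32 ((uint32_t) src1, (uint32_t) src2);
  uint32_t hi = andn32 ((uint32_t) (src1 >> 32), (uint32_t) (src2 >> 32));
  return ((uint64_t) hi << 32) | lo;
}
```

Since NOT and AND act bitwise, the halves never interact, which is what makes the word-by-word split valid.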

>
> --- gcc/config/i386/i386.md.jj  2024-04-15 14:25:58.203322878 +0200
> +++ gcc/config/i386/i386.md 2024-04-23 12:15:47.171956091 +0200
> @@ -12482,10 +12482,10 @@ (define_split
>  })
>
>  (define_insn_and_split "*andn3_doubleword_bmi"
> -  [(set (match_operand: 0 "register_operand" "=,r,r")
> +  [(set (match_operand: 0 "register_operand" "=,,r,r")
> (and:
> - (not: (match_operand: 1 "register_operand" "r,0,r"))
> - (match_operand: 2 "nonimmediate_operand" "ro,ro,0")))
> + (not: (match_operand: 1 "register_operand" "r,r,0,r"))
> + (match_operand: 2 "nonimmediate_operand" "r,o,ro,0")))
> (clobber (reg:CC FLAGS_REG))]
>"TARGET_BMI"
>"#"
> @@ -12496,7 +12496,8 @@ (define_insn_and_split "*andn3_doub
> (parallel [(set (match_dup 3)
>(and:DWIH (not:DWIH (match_dup 4)) (match_dup 5)))
>   (clobber (reg:CC FLAGS_REG))])]
> -  "split_double_mode (mode, [0], 3, [0], [3]);")
> +  "split_double_mode (mode, [0], 3, [0], [3]);"
> +  [(set_attr "isa" "x64,*,*,*")])
>
>  (define_insn_and_split "*andn3_doubleword"
>[(set (match_operand:DWI 0 "register_operand")
> --- gcc/testsuite/g++.target/i386/pr114810.C.jj 2024-04-23 14:21:19.202613799 +0200
> +++ gcc/testsuite/g++.target/i386/pr114810.C    2024-04-23 14:24:22.813116589 +0200
> @@ -0,0 +1,861 @@
> +// PR target/114810
> +// { dg-do compile { target { { { *-*-linux* } && ia32 } && c++17 } } }
> +// { dg-options "-mstackrealign -O2 -mbmi -fno-exceptions -fno-plt -march=x86-64 -w" }
> +// { dg-additional-options "-fpie" { target pie } }
> +
> +enum E1 { a, dp, b, jm, c, dq, d, mj, e, dr, f, jn, h, dt, j, nt, l, du, m, jo, n, dv, o, mk, p, dw, q, jp, s, dx, t, ol, u, dy, v, jq, w };
> +enum dz { x, ml, y };
> +struct ea { short g; } z, jr;
> +long long aa;
> +struct eb { ea ab; ea dp[]; };
> +enum ac { };
> +typedef enum { } nu;
> +struct ad { ac k; };
> +unsigned ec (long);
> +struct ae;
> +int js (ae);
> +unsigned af ();
> +struct ed;
> +template < int ag > struct ee { using ah = ed[ag]; };
> +template < int ag > struct array { typename ee < ag >::ah ai; ed & operator[] (int aj) { return ai[aj]; } };
> +struct { void dp (...); } ak;
> +void ef (int);
> +template < typename al > struct jt { al & operator[] (short); };
> +struct am { void operator= (bool); };
> +struct an { am operator[] (unsigned); };
> +template < typename, unsigned, unsigned >using eg = an;
> +struct ao;
> +struct ae { ae (ao *); };
> +struct mm { mm (); mm (int); };
> +enum ap { };
> +enum eh { };
> +bool aq, ju, ar, ei, nv, as, ej, at;
> +struct jv
> +{
> +  jv (eh au):dp (au) {}
> +  jv ();
> +  operator eh ();
> +  unsigned av ()
> +  {
> +aq = dp & 7;
> +return dp * (aq ? : 4);
> +  }
> +  unsigned ek ()
> +  {
> +int aw;
> +bool mn = dp & 7;
> +aw = dp * (mn ? : 4);
> +return aw + 3 >> 2;
> +  }
> +  eh dp;
> +} ax, el, ay, jw, az, em, ba, om;
> +struct ed
> +{
> +  ed ():bb (), dp () {}
> +  int bc () { return bb; }
> +  jv en () { return (eh) dp; }
> +  unsigned ek ()
> +  {
> +jv bd;
> +bd = (eh) dp;
> +return bd.ek ();
> +  }
> +  ap jx ();
> +  unsigned bb:24;
> +  int dp:8;
> +};
> +struct be { short dp = 0; } bf, eo;
> +struct bg
> +{
> +  bg ();
> +  bg (ed r)
> +  {
> +dp.bh = r;
> +if (r.bc ())
> +  mo = true;
> +else
> +  bi = true;
> +  }
> +  static bg ep (int);
> +  bg (be);
> +  struct { ed bh; } dp;
> +  union { char mo:1; char bi:1; short bj = 0; };
> +} jy, bk, eq, bl, mp, bm, er;
> +struct bn
> +{
> +  explicit bn (ed bo):bh (bo) {}
> +  ed dp ();
> +  ed bh;
> +  be es;
> +  char bj = 0;
> +};
> +struct bp
> +{
> +  eg < int, 6, 4 > dp;
> +};
> +jt < bg > bq;
> +jt < bn > definitions;
> +struct ao
> +{
> +  bp & br ();
> +};
> +enum jz:short;
> +template < typename > using bs = ae;
> +ao *et ();
> +short bt, nw;
> +struct bu
> +{
> +  int dp;
> +};
> +dz bv;
> +unsigned eu;
> +struct bw
> +{
> +  ac k;
> +  unsigned dp;
> +} *bx;
> +bool ka ();
> +struct by
> +{
> +  bool dp;
> +};
> +typedef enum
> +{ bz, ev } ca;
> +typedef enum
> +{
> +  mq, cb, ew, cc, kb, cd, ex, ce
> +} on;
> +typedef struct cf
> +{

Re: [PATCH] [testsuite] [i386] add -msse2 to tests that require it

2024-04-17 Thread Uros Bizjak
On Tue, Apr 16, 2024 at 5:52 AM Alexandre Oliva  wrote:
>
>
> Without -msse2, an i586-targeting toolchain fails bf16_short_warn.c
> because neither the type __m128bh nor the intrinsic _mm_cvtneps_pbh
> gets declared.
>
> Regstrapped on x86_64-linux-gnu.  Also tested with gcc-13 on arm-,
> aarch64-, x86- and x86_64-vxworks7r2.  Ok to install?
>
>
> for  gcc/testsuite/ChangeLog
>
> * gcc.target/i386/bf16_short_warn.c: Add -msse2.

OK.

Thanks,
Uros.

> ---
>  gcc/testsuite/gcc.target/i386/bf16_short_warn.c |2 +-
>  1 file changed, 1 insertion(+), 1 deletion(-)
>
> diff --git a/gcc/testsuite/gcc.target/i386/bf16_short_warn.c b/gcc/testsuite/gcc.target/i386/bf16_short_warn.c
> index 3e47a815200c2..2e05624bc26f6 100644
> --- a/gcc/testsuite/gcc.target/i386/bf16_short_warn.c
> +++ b/gcc/testsuite/gcc.target/i386/bf16_short_warn.c
> @@ -1,5 +1,5 @@
>  /* { dg-do compile } */
> -/* { dg-options "-O2" } */
> +/* { dg-options "-O2 -msse2" } */
>
>  #include
>  typedef struct {
>
> --
> Alexandre Oliva, happy hacker    https://FSFLA.org/blogs/lxo/
>    Free Software Activist                   GNU Toolchain Engineer
> More tolerance and less prejudice are key for inclusion and diversity
> Excluding neuro-others for not behaving "normal" is *not* inclusive


Re: [PATCH] [testsuite] [i386] work around fails with --enable-frame-pointer

2024-04-17 Thread Uros Bizjak
On Tue, Apr 16, 2024 at 5:51 AM Alexandre Oliva  wrote:
>
>
> A few x86 tests get unexpected insn counts if the toolchain is
> configured with --enable-frame-pointer.  Add explicit
> -fomit-frame-pointer so that the expected insn sequences are output.
>
> Regstrapped on x86_64-linux-gnu.  Also tested with gcc-13 on arm-,
> aarch64-, x86- and x86_64-vxworks7r2.  Ok to install?
>
>
> for  gcc/testsuite/ChangeLog
>
> * gcc.target/i386/pr107261.c: Add -fomit-frame-pointer.
> * gcc.target/i386/pr69482-1.c: Likewise.
> * gcc.target/i386/pr69482-2.c: Likewise.

OK.

Thanks,
Uros.

> ---
>  gcc/testsuite/gcc.target/i386/pr107261.c  |2 +-
>  gcc/testsuite/gcc.target/i386/pr69482-1.c |2 +-
>  gcc/testsuite/gcc.target/i386/pr69482-2.c |2 +-
>  3 files changed, 3 insertions(+), 3 deletions(-)
>
> diff --git a/gcc/testsuite/gcc.target/i386/pr107261.c b/gcc/testsuite/gcc.target/i386/pr107261.c
> index eb1d232fbfc4b..b422af9cbf9a2 100644
> --- a/gcc/testsuite/gcc.target/i386/pr107261.c
> +++ b/gcc/testsuite/gcc.target/i386/pr107261.c
> @@ -1,5 +1,5 @@
>  /* { dg-do compile } */
> -/* { dg-options "-O2 -msse2" } */
> +/* { dg-options "-O2 -msse2 -fomit-frame-pointer" } */
>
>  typedef __bf16 v4bf __attribute__ ((vector_size (8)));
>  typedef __bf16 v2bf __attribute__ ((vector_size (4)));
> diff --git a/gcc/testsuite/gcc.target/i386/pr69482-1.c b/gcc/testsuite/gcc.target/i386/pr69482-1.c
> index 99bb6ad5a377b..7ef0e71b17c8e 100644
> --- a/gcc/testsuite/gcc.target/i386/pr69482-1.c
> +++ b/gcc/testsuite/gcc.target/i386/pr69482-1.c
> @@ -1,5 +1,5 @@
>  /* { dg-do compile } */
> -/* { dg-options "-O3 -fno-stack-protector" } */
> +/* { dg-options "-O3 -fno-stack-protector -fomit-frame-pointer" } */
>
>  static inline void memset_s(void* s, int n) {
>volatile unsigned char * p = s;
> diff --git a/gcc/testsuite/gcc.target/i386/pr69482-2.c b/gcc/testsuite/gcc.target/i386/pr69482-2.c
> index 58e89a7933364..6aabe4fb39399 100644
> --- a/gcc/testsuite/gcc.target/i386/pr69482-2.c
> +++ b/gcc/testsuite/gcc.target/i386/pr69482-2.c
> @@ -1,5 +1,5 @@
>  /* { dg-do compile } */
> -/* { dg-options "-O2" } */
> +/* { dg-options "-O2 -fomit-frame-pointer" } */
>
>  void bar ()
>  {
>


Re: Combine patch ping

2024-04-11 Thread Uros Bizjak
On Thu, Apr 11, 2024 at 4:02 PM Segher Boessenkool
 wrote:
>
> On Wed, Apr 10, 2024 at 08:32:39PM +0200, Uros Bizjak wrote:
> > On Wed, Apr 10, 2024 at 7:56 PM Segher Boessenkool
> >  wrote:
> > > This is never okay.  You cannot commit a patch without approval, *ever*.
>
> This is the biggest issue, to start with.  It is fundamental.
>
> > > That patch is also obvious -- obviously *wrong*, that is.  There are
> > > big assumptions everywhere in the compiler how a CC reg can be used.
> > > This violates that, as explained elsewhere.
> >
> > Can you please elaborate what is wrong with this concrete patch.
>
> The explanation of the patch is contradictory to how RTL works at all,
> so it is just wrong.  It might even do something sane, but I didn't get
> that far at all!

The commit message explains the problem; the solution is explained in
the last couple of lines. Please see [1] for a more thorough
explanation of the problem.

[1] https://gcc.gnu.org/bugzilla/show_bug.cgi?id=112560#c13

> Write good email explanations, and a good proposed commit message.
> Please.  It is the only way people can judge a patch.  Well, apart
> from doing everything myself from first principles, ignoring everything
> you said, just looking at the patch itself, but that is a hundred times
> more work.  I don't do that.
>
> > The
> > part that the patch touches has several wrong assumptions, and the
> > fixed "???" comment just emphasizes that. I don't see what is wrong
> > with:
> >
> > (define_insn "@pushfl2"
> >   [(set (match_operand:W 0 "push_operand" "=<")
> > (unspec:W [(match_operand 1 "flags_reg_operand")]
> >   UNSPEC_PUSHFL))]
> >   "GET_MODE_CLASS (GET_MODE (operands[1])) == MODE_CC"
> >   "pushf{}"
> >   [(set_attr "type" "push")
> >(set_attr "mode" "")])
>
> What does it even mean?  What is a flags:CC?  You always always always
> need to say what is *in* the flags, if you want to use it as input
> (which is what unspec does).  CC is weird like this.  Most targets do
> not have distinct physical flags for every condition, only a few
> conditions are "alive" at any point in the program!

From our previous discussion, we concluded that "use" means
cc-compared-to-0, but we also need a "copy" operation, to be able to
move the CC reg around as a physical register (e.g. sahf, lahf, pushfl,
popfl instructions). This is a register that contains the state of the
CPU, described in [2], not some RTL concept. The register is even
listed in i386.md:

(FLAGS_REG   17)

with the "mode" that defines the value in the register more precisely.

[2] https://en.wikipedia.org/wiki/FLAGS_register
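To illustrate the "copy" versus "use" distinction: pushf copies the whole FLAGS word to the stack without looking at it, and only a later consumer interprets individual bits. The minimal C sketch below decodes such a saved word using the architectural bit positions of CF, PF, ZF, SF and OF. The names `decode_flags`, `cc_view` and `cc_equal` are hypothetical illustrations, not GCC code.

```c
#include <assert.h>
#include <stdint.h>

/* Architectural bit positions in the x86 FLAGS/EFLAGS/RFLAGS register. */
enum { CF_BIT = 0, PF_BIT = 2, ZF_BIT = 6, SF_BIT = 7, OF_BIT = 11 };

/* Hypothetical decoded view of a flags word saved by pushf. */
struct cc_view { int cf, pf, zf, sf, of; };

/* Copying the word (pushf/popf) needs no interpretation; decoding is
   only needed when a consumer actually tests a condition.  */
static struct cc_view decode_flags (uint64_t saved)
{
  struct cc_view v;
  v.cf = (int) ((saved >> CF_BIT) & 1);
  v.pf = (int) ((saved >> PF_BIT) & 1);
  v.zf = (int) ((saved >> ZF_BIT) & 1);
  v.sf = (int) ((saved >> SF_BIT) & 1);
  v.of = (int) ((saved >> OF_BIT) & 1);
  return v;
}

/* An "equal" use, as a comparison consumer would see it: ZF is set.  */
static int cc_equal (uint64_t saved)
{
  return decode_flags (saved).zf;
}
```

For example, 0x246 (PF, ZF and IF set, plus the always-set reserved bit 1) decodes with `zf == 1`, while 0x202 decodes with `zf == 0`.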

>
> > it is just a push of the flags reg to the stack. If the push can't be
> > described in this way, then it is the middle end at fault, we can't
> > just change modes at will.
>
> But that is not what this describes: it operates on the flags register
> in some unspecified way, and pushes the result of *that* to the stack.

No, the "use" is defined as cc-compared-to-0. The above is a "copy"
operation, the register that holds the state of the CPU is pushed on
the stack (and can be later popped from the stack to reload the saved
state). The pushfl instruction does not use the register in the sense
that it examines its contents.

> (Stack pointer modification is not described here btw, should it be?  Is
> that magically implemented by the backend some way, via type=push
> perhaps?)

Please see gen_pushfl() in i386.cc that emits the pattern:

#(insn:TI 5 2 6 2 (set (mem:DI (pre_dec:DI (reg/f:DI 7 sp)) [0  S8 A8])
#(unspec:DI [
#(reg:CC 17 flags)
#] UNSPEC_PUSHFL)) "flags.c":3:10 70 {pushfldi2}
# (expr_list:REG_DEAD (reg:CC 17 flags)
#(nil)))
   pushfq  # 5 [c=4 l=1]  pushfldi2

Uros.


Re: Combine patch ping

2024-04-10 Thread Uros Bizjak
On Wed, Apr 10, 2024 at 7:56 PM Segher Boessenkool
 wrote:
>
> On Sun, Apr 07, 2024 at 08:31:38AM +0200, Uros Bizjak wrote:
> > If there are no further comments, I plan to commit the referred patch
> > to the mainline on Wednesday. The latest version can be considered an
> > obvious patch that solves a certain oversight in the original
> > implementation.
>
> This is never okay.  You cannot commit a patch without approval, *ever*.
>
> That patch is also obvious -- obviously *wrong*, that is.  There are
> big assumptions everywhere in the compiler how a CC reg can be used.
> This violates that, as explained elsewhere.

Can you please elaborate what is wrong with this concrete patch. The
part that the patch touches has several wrong assumptions, and the
fixed "???" comment just emphasizes that. I don't see what is wrong
with:

(define_insn "@pushfl2"
  [(set (match_operand:W 0 "push_operand" "=<")
(unspec:W [(match_operand 1 "flags_reg_operand")]
  UNSPEC_PUSHFL))]
  "GET_MODE_CLASS (GET_MODE (operands[1])) == MODE_CC"
  "pushf{}"
  [(set_attr "type" "push")
   (set_attr "mode" "")])

it is just a push of the flags reg to the stack. If the push can't be
described in this way, then it is the middle end at fault, we can't
just change modes at will.

Feel free to revert the patch; I will unassign myself from the PR.

Uros.


Re: Combine patch ping

2024-04-07 Thread Uros Bizjak
On Mon, Apr 1, 2024 at 9:28 PM Uros Bizjak  wrote:

> I'd like to ping the
> https://gcc.gnu.org/pipermail/gcc-patches/2024-March/647634.html
> PR112560 P1 patch.

If there are no further comments, I plan to commit the referred patch
to the mainline on Wednesday. The latest version can be considered an
obvious patch that solves a certain oversight in the original
implementation.

Thanks,
Uros.


Re: [PATCH] x86: Use explicit shift count in double-precision shifts

2024-04-06 Thread Uros Bizjak
On Fri, Apr 5, 2024 at 5:56 PM H.J. Lu  wrote:
>
> Don't use implicit shift count in double-precision shifts in AT&T syntax
> since they aren't in the Intel SDM.  Keep the 's' modifier for backward
> compatibility with inline asm statements.
>
> PR target/114590
> * config/i386/i386.md (x86_64_shld): Use explicit shift count in
> AT&T syntax.
> (x86_64_shld_ndd): Likewise.
> (x86_shld): Likewise.
> (x86_shld_ndd): Likewise.
> (x86_64_shrd): Likewise.
> (x86_64_shrd_ndd): Likewise.
> (x86_shrd): Likewise.
> (x86_shrd_ndd): Likewise.
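For reference, the operation whose operands are being printed: in Intel order it is `shld dst, src, count` (AT&T: `shld count, src, dst`), and for 64-bit operands the count is masked to 6 bits, matching the `(and:QI ... (const_int 63))` in the patterns. Below is a minimal C model of the 64-bit forms. It assumes, as the hardware does, that a masked count of 0 leaves the destination unchanged, and it ignores flag effects. The helper names are hypothetical.

```c
#include <assert.h>
#include <stdint.h>

/* Model of 64-bit SHLD: shift DST left by COUNT, filling the vacated
   low bits from the high bits of SRC.  The count is masked to 0..63.  */
static uint64_t shld64 (uint64_t dst, uint64_t src, unsigned count)
{
  count &= 63;
  if (count == 0)
    return dst;  /* also avoids the undefined shift by 64 below */
  return (dst << count) | (src >> (64 - count));
}

/* Model of 64-bit SHRD: shift DST right by COUNT, filling the vacated
   high bits from the low bits of SRC.  */
static uint64_t shrd64 (uint64_t dst, uint64_t src, unsigned count)
{
  count &= 63;
  if (count == 0)
    return dst;
  return (dst >> count) | (src << (64 - count));
}
```

With this model, AT&T `shld %cl, %rsi, %rdi` corresponds to `rdi = shld64 (rdi, rsi, cl)`, which is exactly the `{%2, %1, %0|%0, %1, %2}` operand order the patch now prints.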

OK.

Thanks,
Uros.

> ---
>  gcc/config/i386/i386.md | 16 
>  1 file changed, 8 insertions(+), 8 deletions(-)
>
> diff --git a/gcc/config/i386/i386.md b/gcc/config/i386/i386.md
> index 6ac401154e4..bb2c72f3473 100644
> --- a/gcc/config/i386/i386.md
> +++ b/gcc/config/i386/i386.md
> @@ -14503,7 +14503,7 @@ (define_insn "x86_64_shld"
>   (and:QI (match_dup 2) (const_int 63 0)))
> (clobber (reg:CC FLAGS_REG))]
>"TARGET_64BIT"
> -  "shld{q}\t{%s2%1, %0|%0, %1, %2}"
> +  "shld{q}\t{%2, %1, %0|%0, %1, %2}"
>[(set_attr "type" "ishift")
> (set_attr "prefix_0f" "1")
> (set_attr "mode" "DI")
> @@ -14524,7 +14524,7 @@ (define_insn "x86_64_shld_ndd"
>   (and:QI (match_dup 3) (const_int 63 0)))
> (clobber (reg:CC FLAGS_REG))]
>"TARGET_APX_NDD"
> -  "shld{q}\t{%s3%2, %1, %0|%0, %1, %2, %3}"
> +  "shld{q}\t{%3, %2, %1, %0|%0, %1, %2, %3}"
>[(set_attr "type" "ishift")
> (set_attr "mode" "DI")])
>
> @@ -14681,7 +14681,7 @@ (define_insn "x86_shld"
>   (and:QI (match_dup 2) (const_int 31 0)))
> (clobber (reg:CC FLAGS_REG))]
>""
> -  "shld{l}\t{%s2%1, %0|%0, %1, %2}"
> +  "shld{l}\t{%2, %1, %0|%0, %1, %2}"
>[(set_attr "type" "ishift")
> (set_attr "prefix_0f" "1")
> (set_attr "mode" "SI")
> @@ -14703,7 +14703,7 @@ (define_insn "x86_shld_ndd"
>   (and:QI (match_dup 3) (const_int 31 0)))
> (clobber (reg:CC FLAGS_REG))]
>"TARGET_APX_NDD"
> -  "shld{l}\t{%s3%2, %1, %0|%0, %1, %2, %3}"
> +  "shld{l}\t{%3, %2, %1, %0|%0, %1, %2, %3}"
>[(set_attr "type" "ishift")
> (set_attr "mode" "SI")])
>
> @@ -15792,7 +15792,7 @@ (define_insn "x86_64_shrd"
>   (and:QI (match_dup 2) (const_int 63 0)))
> (clobber (reg:CC FLAGS_REG))]
>"TARGET_64BIT"
> -  "shrd{q}\t{%s2%1, %0|%0, %1, %2}"
> +  "shrd{q}\t{%2, %1, %0|%0, %1, %2}"
>[(set_attr "type" "ishift")
> (set_attr "prefix_0f" "1")
> (set_attr "mode" "DI")
> @@ -15813,7 +15813,7 @@ (define_insn "x86_64_shrd_ndd"
>   (and:QI (match_dup 3) (const_int 63 0)))
> (clobber (reg:CC FLAGS_REG))]
>"TARGET_APX_NDD"
> -  "shrd{q}\t{%s3%2, %1, %0|%0, %1, %2, %3}"
> +  "shrd{q}\t{%3, %2, %1, %0|%0, %1, %2, %3}"
>[(set_attr "type" "ishift")
> (set_attr "mode" "DI")])
>
> @@ -15971,7 +15971,7 @@ (define_insn "x86_shrd"
>   (and:QI (match_dup 2) (const_int 31 0)))
> (clobber (reg:CC FLAGS_REG))]
>""
> -  "shrd{l}\t{%s2%1, %0|%0, %1, %2}"
> +  "shrd{l}\t{%2, %1, %0|%0, %1, %2}"
>[(set_attr "type" "ishift")
> (set_attr "prefix_0f" "1")
> (set_attr "mode" "SI")
> @@ -15993,7 +15993,7 @@ (define_insn "x86_shrd_ndd"
>   (and:QI (match_dup 3) (const_int 31 0)))
> (clobber (reg:CC FLAGS_REG))]
>"TARGET_APX_NDD"
> -  "shrd{l}\t{%s3%2, %1, %0|%0, %1, %2, %3}"
> +  "shrd{l}\t{%3, %2, %1, %0|%0, %1, %2, %3}"
>[(set_attr "type" "ishift")
> (set_attr "mode" "SI")])
>
> --
> 2.44.0
>


Re: [PATCH] x86: Define __APX_F__ for -mapxf

2024-04-04 Thread Uros Bizjak
On Thu, Apr 4, 2024 at 5:08 PM H.J. Lu  wrote:
>
> Define __APX_F__ when APX is enabled.
>
> gcc/
>
> PR target/114587
> * config/i386/i386-c.cc (ix86_target_macros_internal): Define
> __APX_F__ when APX is enabled.
>
> gcc/testsuite/
>
> PR target/114587
> * gcc.target/i386/apx-2.c: New test.
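The following is a minimal sketch of how user code might key off the new predefined macro, mirroring what apx-2.c checks. It assumes a GCC that contains this patch, where building with -mapxf should take the first branch. The function name is hypothetical.

```c
#include <assert.h>

/* Returns 1 when this translation unit was compiled with APX enabled
   (e.g. -mapxf on a GCC that predefines __APX_F__), 0 otherwise.  */
static int apx_compiled_in (void)
{
#ifdef __APX_F__
  return 1;
#else
  return 0;
#endif
}
```

Because the macro is a compile-time property, a binary built this way still needs a runtime CPU check before executing APX instructions on an unknown machine.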

OK.

Thanks,
Uros.

> ---
>  gcc/config/i386/i386-c.cc | 2 ++
>  gcc/testsuite/gcc.target/i386/apx-2.c | 6 ++
>  2 files changed, 8 insertions(+)
>  create mode 100644 gcc/testsuite/gcc.target/i386/apx-2.c
>
> diff --git a/gcc/config/i386/i386-c.cc b/gcc/config/i386/i386-c.cc
> index 114908c7ec0..226d277676c 100644
> --- a/gcc/config/i386/i386-c.cc
> +++ b/gcc/config/i386/i386-c.cc
> @@ -749,6 +749,8 @@ ix86_target_macros_internal (HOST_WIDE_INT isa_flag,
>  }
>if (isa_flag2 & OPTION_MASK_ISA2_AVX10_1_512)
>  def_or_undef (parse_in, "__AVX10_1_512__");
> +  if (isa_flag2 & OPTION_MASK_ISA2_APX_F)
> +def_or_undef (parse_in, "__APX_F__");
>if (TARGET_IAMCU)
>  {
>def_or_undef (parse_in, "__iamcu");
> diff --git a/gcc/testsuite/gcc.target/i386/apx-2.c b/gcc/testsuite/gcc.target/i386/apx-2.c
> new file mode 100644
> index 000..2f6439e4b23
> --- /dev/null
> +++ b/gcc/testsuite/gcc.target/i386/apx-2.c
> @@ -0,0 +1,6 @@
> +/* { dg-do compile { target { ! ia32 } } } */
> +/* { dg-options "-mapxf" } */
> +
> +#ifndef __APX_F__
> +# error __APX_F__ not defined
> +#endif
> --
> 2.44.0
>


Combine patch ping

2024-04-01 Thread Uros Bizjak
Hello!

I'd like to ping the
https://gcc.gnu.org/pipermail/gcc-patches/2024-March/647634.html
PR112560 P1 patch.

Thanks,
Uros.


Re: [PATCH] testsuite: Fix up ext-floating{3,12}.C on i686-linux

2024-03-27 Thread Uros Bizjak
On Wed, Mar 27, 2024 at 11:48 AM Jakub Jelinek  wrote:
>
> Hi!
>
> These tests have been FAILing on i686-linux since July last year,
> likely since r14-2628.  Since that patch gcc claims _Float16 and __bf16
> support even without -msse2 because some functions could be using
> target attribute.
> Later r14-2691 added -msse2 to add_options_for_float16, but didn't do that
> for bfloat16, plus ext-floating{3,12}.C tests need the added dg-add-options,
> so that float16 and bfloat16 effective targets match the __STDCPP_FLOAT16_T__
> or __STDCPP_BFLOAT16_T__ macros.
>
> Bootstrapped/regtested on x86_64-linux and i686-linux, fixes
> -FAIL: g++.dg/cpp23/ext-floating12.C  -std=gnu++23  (test for errors, line 144)
> -FAIL: g++.dg/cpp23/ext-floating12.C  -std=gnu++23  (test for errors, line 146)
> -FAIL: g++.dg/cpp23/ext-floating12.C  -std=gnu++23  (test for errors, line 148)
> -FAIL: g++.dg/cpp23/ext-floating12.C  -std=gnu++23  (test for errors, line 150)
> -FAIL: g++.dg/cpp23/ext-floating12.C  -std=gnu++23  (test for errors, line 152)
> -FAIL: g++.dg/cpp23/ext-floating12.C  -std=gnu++23  (test for errors, line 154)
> -FAIL: g++.dg/cpp23/ext-floating12.C  -std=gnu++26  (test for errors, line 144)
> -FAIL: g++.dg/cpp23/ext-floating12.C  -std=gnu++26  (test for errors, line 146)
> -FAIL: g++.dg/cpp23/ext-floating12.C  -std=gnu++26  (test for errors, line 148)
> -FAIL: g++.dg/cpp23/ext-floating12.C  -std=gnu++26  (test for errors, line 150)
> -FAIL: g++.dg/cpp23/ext-floating12.C  -std=gnu++26  (test for errors, line 152)
> -FAIL: g++.dg/cpp23/ext-floating12.C  -std=gnu++26  (test for errors, line 154)
> -FAIL: g++.dg/cpp23/ext-floating3.C  -std=gnu++23  (test for errors, line 107)
> -FAIL: g++.dg/cpp23/ext-floating3.C  -std=gnu++23  (test for errors, line 114)
> -FAIL: g++.dg/cpp23/ext-floating3.C  -std=gnu++23  (test for errors, line 126)
> -FAIL: g++.dg/cpp23/ext-floating3.C  -std=gnu++23  (test for errors, line 79)
> -FAIL: g++.dg/cpp23/ext-floating3.C  -std=gnu++23  (test for errors, line 86)
> -FAIL: g++.dg/cpp23/ext-floating3.C  -std=gnu++23  (test for errors, line 98)
> -FAIL: g++.dg/cpp23/ext-floating3.C  -std=gnu++23  (test for warnings, line 22)
> -FAIL: g++.dg/cpp23/ext-floating3.C  -std=gnu++23  (test for warnings, line 23)
> -FAIL: g++.dg/cpp23/ext-floating3.C  -std=gnu++23  (test for warnings, line 24)
> -FAIL: g++.dg/cpp23/ext-floating3.C  -std=gnu++23  (test for warnings, line 25)
> -FAIL: g++.dg/cpp23/ext-floating3.C  -std=gnu++26  (test for errors, line 107)
> -FAIL: g++.dg/cpp23/ext-floating3.C  -std=gnu++26  (test for errors, line 114)
> -FAIL: g++.dg/cpp23/ext-floating3.C  -std=gnu++26  (test for errors, line 126)
> -FAIL: g++.dg/cpp23/ext-floating3.C  -std=gnu++26  (test for errors, line 79)
> -FAIL: g++.dg/cpp23/ext-floating3.C  -std=gnu++26  (test for errors, line 86)
> -FAIL: g++.dg/cpp23/ext-floating3.C  -std=gnu++26  (test for errors, line 98)
> -FAIL: g++.dg/cpp23/ext-floating3.C  -std=gnu++26  (test for warnings, line 22)
> -FAIL: g++.dg/cpp23/ext-floating3.C  -std=gnu++26  (test for warnings, line 23)
> -FAIL: g++.dg/cpp23/ext-floating3.C  -std=gnu++26  (test for warnings, line 24)
> -FAIL: g++.dg/cpp23/ext-floating3.C  -std=gnu++26  (test for warnings, line 25)
> on the latter and changes nothing on the former, ok for trunk?
>
> 2024-03-26  Jakub Jelinek  
>
> * lib/target-supports.exp (add_options_for_bfloat16): Add -msse2 on
> i?86/x86_64.
> * g++.dg/cpp23/ext-floating3.C: Add dg-add-options float16.
> * g++.dg/cpp23/ext-floating12.C: Add dg-add-options float16 and
> bfloat16.

OK.

Thanks,
Uros.

> --- gcc/testsuite/lib/target-supports.exp.jj	2024-03-19 08:55:18.500791497 +0100
> +++ gcc/testsuite/lib/target-supports.exp	2024-03-26 20:30:41.963222438 +0100
> @@ -3829,6 +3829,9 @@ proc check_effective_target_bfloat16_run
>  }
>
>  proc add_options_for_bfloat16 { flags } {
> +if { [istarget i?86-*-*] || [istarget x86_64-*-*] } {
> +   return "$flags -msse2"
> +}
>  return "$flags"
>  }
>
> --- gcc/testsuite/g++.dg/cpp23/ext-floating3.C.jj	2022-09-27 08:03:27.0 +0200
> +++ gcc/testsuite/g++.dg/cpp23/ext-floating3.C	2024-03-26 20:26:41.921609624 +0100
> @@ -4,6 +4,7 @@
>  // And some further tests.
>  // { dg-do compile { target { c++23 && { i?86-*-linux* x86_64-*-linux* } } } }
>  // { dg-options "" }
> +// { dg-add-options float16 }
>
>  #include "ext-floating.h"
>
> --- gcc/testsuite/g++.dg/cpp23/ext-floating12.C.jj	2022-10-31 20:15:49.72032 +0100
> +++ gcc/testsuite/g++.dg/cpp23/ext-floating12.C	2024-03-26 20:31:29.876546341 +0100
> @@ -1,6 +1,8 @@
>  // P1467R9 - Extended floating-point types and standard names.
>  // { dg-do compile { target { c++23 && { i?86-*-linux* x86_64-*-linux* } } } }
>  // { dg-options "" }
> +// { dg-add-options float16 }
> +// { dg-add-options bfloat16 }
>
>  #include 
>  #include 
>
>   

Re: [PATCH] testsuite: i386: Skip gcc.target/i386/avx512cd-vpbroadcastmb2q-2.c etc. with Solaris as [PR114150]

2024-03-21 Thread Uros Bizjak
On Thu, Mar 21, 2024 at 10:26 AM Rainer Orth
 wrote:
>
> Two avx512cd tests FAIL to assemble with the Solaris/x86 assembler:
>
> FAIL: gcc.target/i386/avx512cd-vpbroadcastmb2q-2.c (test for excess errors)
> UNRESOLVED: gcc.target/i386/avx512cd-vpbroadcastmb2q-2.c compilation failed to produce executable
> FAIL: gcc.target/i386/avx512cd-vpbroadcastmw2d-2.c (test for excess errors)
> UNRESOLVED: gcc.target/i386/avx512cd-vpbroadcastmw2d-2.c compilation failed to produce executable
>
> Excess errors:
> Assembler: avx512cd-vpbroadcastmb2q-2.c
> "/var/tmp//ccs_9lod.s", line 42 : Invalid instruction argument
> Near line: "vpbroadcastmb2q %k0, %zmm0"
>
> Assembler: avx512cd-vpbroadcastmw2d-2.c
> "/var/tmp//ccevT6Rd.s", line 35 : Invalid instruction argument
> Near line: "vpbroadcastmw2d %k0, %zmm0"
>
> This seems to be an as bug, but given that this rarely if ever gets any
> fixes these days, this patch just skips the affected tests.
>
> Adjusting check_effective_target_avx512cd instead doesn't seem
> sensible since it would disable quite a number of working tests.
>
> Tested on i386-pc-solaris2.11 (as and gas) and x86_64-pc-linux-gnu.
>
> Ok for trunk?

OK, looks obvious to me.

Thanks,
Uros.

>
> Rainer
>
> --
> -
> Rainer Orth, Center for Biotechnology, Bielefeld University
>
>
> 2024-03-19  Rainer Orth  
>
> gcc/testsuite:
> PR target/114150
> * gcc.target/i386/avx512cd-vpbroadcastmb2q-2.c: Skip on
> Solaris/x86 with as.
> * gcc.target/i386/avx512cd-vpbroadcastmw2d-2.c: Likewise.
>


[PATCH] i386: Unify {general, timode}_scalar_chain::convert_op [PR111822]

2024-03-18 Thread Uros Bizjak
Recent PR111822 fix implemented REG_EH_REGION note copying to a STV converted
preload instruction in general_scalar_chain::convert_op.  However, the same
issue remains in timode_scalar_chain::convert_op.  Instead of copying the
newly introduced code to timode_scalar_chain::convert_op, the patch unifies
both functions to a common function.

PR target/111822

gcc/ChangeLog:

* config/i386/i386-features.cc (smode_convert_cst): New function
to handle SImode, DImode and TImode immediates, generalized from
timode_convert_cst.
(timode_convert_cst): Remove.
(scalar_chain::convert_op): Unify from
general_scalar_chain::convert_op and timode_scalar_chain::convert_op.
(general_scalar_chain::convert_op): Remove.
(timode_scalar_chain::convert_op): Remove.
(timode_scalar_chain::convert_insn): Update the call to
renamed timode_convert_cst.
* config/i386/i386-features.h (class scalar_chain):
Redeclare convert_op as protected class member.
(class general_scalar_chain): Remove convert_op.
(class timode_scalar_chain): Ditto.

gcc/testsuite/ChangeLog:

* g++.target/i386/pr111822.C (dg-do): Compile only for ia32 targets.
(dg-options): Add -march=x86-64.
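The lane layout produced by the new smode_convert_cst helper can be modelled in plain C as below. This is only a sketch of the lane values, not of RTL construction, and it uses 64-bit lanes even though the real helper also handles TImode immediates. Lane 0 holds the scalar and the other lanes are zero, except that -1 becomes an all-ones vector. Presumably the upper lanes are don't-care for the scalar chain, so the all-ones form is preferred because it is a cheap standard SSE constant (compare the standard_sse_constant_p check that follows in convert_op). That rationale is an assumption, not something stated in the patch. `model_convert_cst` is a hypothetical name.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Fill NUNITS lanes the way smode_convert_cst lays out its vector
   constant: all lanes -1 for a -1 input (CONSTM1_RTX), otherwise
   X in lane 0 and zeros in every other lane.  */
static void model_convert_cst (int64_t x, int64_t *lanes, size_t nunits)
{
  for (size_t i = 0; i < nunits; ++i)
    {
      if (x == -1)
        lanes[i] = -1;                /* all-ones vector */
      else
        lanes[i] = (i == 0) ? x : 0;  /* { x, 0, 0, ... } */
    }
}
```

For a V2DImode-like layout, an input of 5 gives `{ 5, 0 }` and an input of -1 gives `{ -1, -1 }`.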

Bootstrapped and regression tested on x86_64-linux-gnu {,-m32}.

Pushed to master, will be pushed to gcc-12 and gcc-13 branch.

Uros.
diff --git a/gcc/config/i386/i386-features.cc b/gcc/config/i386/i386-features.cc
index c7d7a965901..e3e004d5526 100644
--- a/gcc/config/i386/i386-features.cc
+++ b/gcc/config/i386/i386-features.cc
@@ -980,14 +980,35 @@ scalar_chain::convert_reg (rtx_insn *insn, rtx dst, rtx src)
 REGNO (src), REGNO (dst), INSN_UID (insn));
 }
 
+/* Helper function to convert immediate constant X to vmode.  */
+static rtx
+smode_convert_cst (rtx x, enum machine_mode vmode)
+{
+  /* Prefer all ones vector in case of -1.  */
+  if (constm1_operand (x, GET_MODE (x)))
+return CONSTM1_RTX (vmode);
+
+  unsigned n = GET_MODE_NUNITS (vmode);
+  rtx *v = XALLOCAVEC (rtx, n);
+  v[0] = x;
+  for (unsigned i = 1; i < n; ++i)
+v[i] = const0_rtx;
+  return gen_rtx_CONST_VECTOR (vmode, gen_rtvec_v (n, v));
+}
+
 /* Convert operand OP in INSN.  We should handle
memory operands and uninitialized registers.
All other register uses are converted during
registers conversion.  */
 
 void
-general_scalar_chain::convert_op (rtx *op, rtx_insn *insn)
+scalar_chain::convert_op (rtx *op, rtx_insn *insn)
 {
+  rtx tmp;
+
+  if (GET_MODE (*op) == V1TImode)
+return;
+
   *op = copy_rtx_if_shared (*op);
 
   if (GET_CODE (*op) == NOT
@@ -998,20 +1019,21 @@ general_scalar_chain::convert_op (rtx *op, rtx_insn *insn)
 }
   else if (MEM_P (*op))
 {
-  rtx_insn* eh_insn, *movabs = NULL;
-  rtx tmp = gen_reg_rtx (GET_MODE (*op));
+  rtx_insn *movabs = NULL;
 
   /* Emit MOVABS to load from a 64-bit absolute address to a GPR.  */
   if (!memory_operand (*op, GET_MODE (*op)))
{
- rtx tmp2 = gen_reg_rtx (GET_MODE (*op));
- movabs = emit_insn_before (gen_rtx_SET (tmp2, *op), insn);
+ tmp = gen_reg_rtx (GET_MODE (*op));
+ movabs = emit_insn_before (gen_rtx_SET (tmp, *op), insn);
 
- *op = tmp2;
+ *op = tmp;
}
 
-  eh_insn
-   = emit_insn_before (gen_rtx_SET (gen_rtx_SUBREG (vmode, tmp, 0),
+  tmp = gen_rtx_SUBREG (vmode, gen_reg_rtx (GET_MODE (*op)), 0);
+
+  rtx_insn *eh_insn
+   = emit_insn_before (gen_rtx_SET (copy_rtx (tmp),
 gen_gpr_to_xmm_move_src (vmode, *op)),
insn);
 
@@ -1028,33 +1050,17 @@ general_scalar_chain::convert_op (rtx *op, rtx_insn *insn)
}
}
 
-  *op = gen_rtx_SUBREG (vmode, tmp, 0);
+  *op = tmp;
 
   if (dump_file)
fprintf (dump_file, "  Preloading operand for insn %d into r%d\n",
 INSN_UID (insn), REGNO (tmp));
 }
   else if (REG_P (*op))
+*op = gen_rtx_SUBREG (vmode, *op, 0);
+  else if (CONST_SCALAR_INT_P (*op))
 {
-  *op = gen_rtx_SUBREG (vmode, *op, 0);
-}
-  else if (CONST_INT_P (*op))
-{
-  rtx vec_cst;
-  rtx tmp = gen_rtx_SUBREG (vmode, gen_reg_rtx (smode), 0);
-
-  /* Prefer all ones vector in case of -1.  */
-  if (constm1_operand (*op, GET_MODE (*op)))
-   vec_cst = CONSTM1_RTX (vmode);
-  else
-   {
- unsigned n = GET_MODE_NUNITS (vmode);
- rtx *v = XALLOCAVEC (rtx, n);
- v[0] = *op;
- for (unsigned i = 1; i < n; ++i)
-   v[i] = const0_rtx;
- vec_cst = gen_rtx_CONST_VECTOR (vmode, gen_rtvec_v (n, v));
-   }
+  rtx vec_cst = smode_convert_cst (*op, vmode);
 
   if (!standard_sse_constant_p (vec_cst, vmode))
{
@@ -1065,6 +1071,8 @@ general_scalar_chain::convert_op (rtx *op, rtx_insn *insn)
  emit_insn_before (seq, insn);
}
 
+  tmp = gen_rtx_SUBREG (vmode, gen_reg_rtx (smode), 0);
+
   

Re: [PATCH v2] combine: Fix ICE in try_combine on pr112494.c [PR112560]

2024-03-18 Thread Uros Bizjak
On Mon, Mar 18, 2024 at 3:51 PM Segher Boessenkool
 wrote:
>
> On Thu, Mar 07, 2024 at 11:46:54PM +0100, Uros Bizjak wrote:
> > > Can't you just describe the dataflow then, without an unspec?  An unspec
> > > by definition does some (unspecified) operation on the data.
> >
> > Previously, it was defined as:
> >
> >  (define_insn "*pushfl2"
> >[(set (match_operand:W 0 "push_operand" "=<")
> >  (match_operand:W 1 "flags_reg_operand"))]
> >
> > But Wmode, AKA SI/DImode is not CCmode. And as said in my last
> > message, nothing prevents current source code to try to update the CC
> > register here.
>
> So you can use an unspec just to convert the flags reg to an integer?
> To make an integer representation of flags reg contents.

Yes, this is correct. But please note the v3 patch, where the mode
update is made at the correct location. Quote from the patch:

Replace cc_use_loc with the entire new RTX only in case cc_use_loc satisfies
COMPARISON_P predicate.  Otherwise scan the entire cc_use_loc RTX for CC reg
to be updated with a new mode.

> Or is that what we started with here?

The reason for the patch is that when the CC reg is used outside a
comparison RTX, combine tries to update the CC reg mode where it is
used after the combined instruction. This happens on extremely rare
occasions, but when it happens, combine assumes that it is used
exclusively in the comparison RTX and uses "SUBST (XEXP (*cc_use_loc,
0), ...);". XEXP (*cc_use_loc, 0) will segfault when CC reg is
referred in a simple SET assignment, not only when it is referred in
an UNSPEC. Please note that the comparison RTX is handled a few lines
above, where my patch also fixes the "???" issue.
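The failure mode described above can be illustrated with a toy expression tree in C. This is a drastically simplified stand-in for GCC's rtx, and every name in it (`toy_rtx`, `toy_update_cc_use`, the `TOY_` codes and modes) is hypothetical. In this sketch, a comparison may be assumed to carry the flags register as operand 0. Any other use, such as an UNSPEC or a bare REG used as a SET source, is scanned for the flags register instead. In the toy, blindly taking operand 0 of a bare REG would dereference NULL, which is only an analogue of the XEXP (*cc_use_loc, 0) problem. Combine itself substitutes a whole new comparison in the comparison case, which the toy reduces to patching the mode.

```c
#include <assert.h>
#include <stddef.h>

/* Toy stand-ins for rtx codes and machine modes (not GCC's types).  */
enum toy_code { TOY_REG, TOY_CONST0, TOY_COMPARE_EQ, TOY_UNSPEC };
enum toy_mode { TOY_VOIDmode, TOY_CCmode, TOY_CCZmode, TOY_DImode };

#define TOY_FLAGS_REGNO 17

struct toy_rtx
{
  enum toy_code code;
  enum toy_mode mode;
  int regno;               /* meaningful for TOY_REG only */
  struct toy_rtx *op[2];   /* NULL when the operand does not exist */
};

/* Walk X and give every flags-register reference NEW_MODE.  */
static void toy_update_flags_mode (struct toy_rtx *x, enum toy_mode new_mode)
{
  if (x == NULL)
    return;
  if (x->code == TOY_REG && x->regno == TOY_FLAGS_REGNO)
    {
      x->mode = new_mode;
      return;
    }
  for (int i = 0; i < 2; ++i)
    toy_update_flags_mode (x->op[i], new_mode);
}

/* Update a CC use: patch operand 0 only for a comparison, otherwise
   scan the whole use for the flags register.  */
static void toy_update_cc_use (struct toy_rtx *use, enum toy_mode new_mode)
{
  if (use->code == TOY_COMPARE_EQ)
    use->op[0]->mode = new_mode;
  else
    toy_update_flags_mode (use, new_mode);
}
```

Note that the UNSPEC node itself keeps its integer mode; only the flags register inside it changes mode, which is what the v3 approach does for the pushfl pattern.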

Uros.


Re: [PATCH v2] combine: Fix ICE in try_combine on pr112494.c [PR112560]

2024-03-18 Thread Uros Bizjak
On Mon, Mar 18, 2024 at 3:46 PM Segher Boessenkool
 wrote:
>
> On Thu, Mar 07, 2024 at 11:27:28PM +0100, Uros Bizjak wrote:
> > On Thu, Mar 7, 2024 at 11:07 PM Uros Bizjak  wrote:
> > > > >  (unspec:DI [
> > > > >  (reg:CC 17 flags)
> > > > >  ] UNSPEC_PUSHFL)
> > > >
> > > > But that is invalid RTL?  The only valid use of a CC is written as
> > > > cc-compared-to-0.  It means "the result of something compared to 0,
> > > > stored in that cc reg".
> > > >
> > > > (And you can copy a CC reg around, but that is not a use ;-) )
> >
> > Hm... under this premise, we can also say that any form that is not
> > cc-compared-to-0 is not a use.
>
> Well, no.  All uses of CC are written as comparisons to 0, or are pure
> dataflow.  Anything else is not "not a use" but just invalid.
>
> > Consequently, if it is not a use, then
> > the CC reg should not be updated at its use location, so my v1 patch,
> > where we simply skip the update (but retain the combined insn),
> > actually makes sense.
>
> With more asserts, perhaps.
>
> > In this concrete situation, we don't care about CC register mode in
> > the PUSHFL insn. And we should not change CC reg mode of the use,
> > because any other mode than the generic CCmode won't be recognized by
> > the insn pattern.
>
> You always care about the CC mode, that is why you always write it as
> comparison, so the backend can choose a mode based on what the flag bits
> mean in this context.  For an extreme example look at 390, but on pretty
> much any target both signed and unsigned comparisons use the same flag
> bits, and maybe fp comparisons as well.

After some more thoughts, I came up with v3 [1], where the mode is
updated also in non-comparison use. But instead of a blind SUBST, we
find the correct use location and update the mode accordingly. If the
new mode is not recognized in the use insn, then the whole combination
is cancelled. Since x86 pushfl does not care about CCmode, I have
changed it to handle all CCmodes. Please note that with this
approach, the backend can choose the mode.

[1] https://gcc.gnu.org/pipermail/gcc-patches/2024-March/647634.html

> But pushfl does sound like pure dataflow.  Why is this a builtin, is
> it ever a good idea if the user writes stuff the compiler can do a
> better job doing itself, not to mention it is way easier for the
> compiler anyway?  :-)

The mode of the pushfl has to be correct, because it operates on the
stack. Unfortunately, a plain move can't do this due to the
WORDmode/CCmode mismatch in the rtx, so UNSPEC is necessary. But it IS
a pure dataflow instruction - it is a PUSH.

Uros.


Re: [PATCH] i386 [stv]: Handle REG_EH_REGION note [pr111822].

2024-03-18 Thread Uros Bizjak
On Mon, Mar 18, 2024 at 11:52 AM liuhongt  wrote:
>
> Commit r14-9459-g618e34d56cc38e only handles
> general_scalar_chain::convert_op. The patch also handles
> timode_scalar_chain::convert_op to avoid a potential similar bug.
>
> Bootstrapped and regtested on x86_64-pc-linux-gnu{-m32,}.
> Ok for trunk and backport to releases/gcc-13 branch?

I have the following patch in testing that merges
{general,timode}_scalar_chain::convert_op, so in addition to less code
duplication, it will fix the issue for both chains. WDYT?

Uros.

>
> gcc/ChangeLog:
>
> PR target/111822
> * config/i386/i386-features.cc
> (timode_scalar_chain::convert_op): Handle REG_EH_REGION note.
> ---
>  gcc/config/i386/i386-features.cc | 20 +---
>  1 file changed, 17 insertions(+), 3 deletions(-)
>
> diff --git a/gcc/config/i386/i386-features.cc b/gcc/config/i386/i386-features.cc
> index c7d7a965901..38f57d96df5 100644
> --- a/gcc/config/i386/i386-features.cc
> +++ b/gcc/config/i386/i386-features.cc
> @@ -1794,12 +1794,26 @@ timode_scalar_chain::convert_op (rtx *op, rtx_insn *insn)
>  *op = gen_rtx_SUBREG (V1TImode, *op, 0);
>else if (MEM_P (*op))
>  {
> +  rtx_insn* eh_insn;
>rtx tmp = gen_reg_rtx (V1TImode);
> -  emit_insn_before (gen_rtx_SET (tmp,
> -gen_gpr_to_xmm_move_src (V1TImode, *op)),
> -   insn);
> +  eh_insn
> +   = emit_insn_before (gen_rtx_SET (tmp,
> +gen_gpr_to_xmm_move_src (V1TImode,
> + *op)),
> +   insn);
>*op = tmp;
>
> +  if (cfun->can_throw_non_call_exceptions)
> +   {
> + /* Handle REG_EH_REGION note.  */
> + rtx note = find_reg_note (insn, REG_EH_REGION, NULL_RTX);
> + if (note)
> +   {
> + control_flow_insns.safe_push (eh_insn);
> + add_reg_note (eh_insn, REG_EH_REGION, XEXP (note, 0));
> +   }
> +   }
> +
>if (dump_file)
> fprintf (dump_file, "  Preloading operand for insn %d into r%d\n",
>  INSN_UID (insn), REGNO (tmp));
> --
> 2.31.1
>
diff --git a/gcc/config/i386/i386-features.cc b/gcc/config/i386/i386-features.cc
index c7d7a965901..6d7ef28e4b1 100644
--- a/gcc/config/i386/i386-features.cc
+++ b/gcc/config/i386/i386-features.cc
@@ -980,14 +980,36 @@ scalar_chain::convert_reg (rtx_insn *insn, rtx dst, rtx src)
 REGNO (src), REGNO (dst), INSN_UID (insn));
 }
 
+
+/* Helper function to convert immediate constant X to vmode.  */
+static rtx
+smode_convert_cst (rtx x, enum machine_mode vmode)
+{
+  /* Prefer all ones vector in case of -1.  */
+  if (constm1_operand (x, GET_MODE (x)))
+return  CONSTM1_RTX (vmode);
+
+  unsigned n = GET_MODE_NUNITS (vmode);
+  rtx *v = XALLOCAVEC (rtx, n);
+  v[0] = x;
+  for (unsigned i = 1; i < n; ++i)
+v[i] = const0_rtx;
+  return gen_rtx_CONST_VECTOR (vmode, gen_rtvec_v (n, v));
+}
+
 /* Convert operand OP in INSN.  We should handle
memory operands and uninitialized registers.
All other register uses are converted during
registers conversion.  */
 
 void
-general_scalar_chain::convert_op (rtx *op, rtx_insn *insn)
+scalar_chain::convert_op (rtx *op, rtx_insn *insn)
 {
+  rtx tmp;
+
+  if (GET_MODE (*op) == V1TImode)
+return;
+
   *op = copy_rtx_if_shared (*op);
 
   if (GET_CODE (*op) == NOT
@@ -998,20 +1020,21 @@ general_scalar_chain::convert_op (rtx *op, rtx_insn *insn)
 }
   else if (MEM_P (*op))
 {
-  rtx_insn* eh_insn, *movabs = NULL;
-  rtx tmp = gen_reg_rtx (GET_MODE (*op));
+  rtx_insn *movabs = NULL;
 
   /* Emit MOVABS to load from a 64-bit absolute address to a GPR.  */
   if (!memory_operand (*op, GET_MODE (*op)))
{
- rtx tmp2 = gen_reg_rtx (GET_MODE (*op));
- movabs = emit_insn_before (gen_rtx_SET (tmp2, *op), insn);
+ tmp = gen_reg_rtx (GET_MODE (*op));
+ movabs = emit_insn_before (gen_rtx_SET (tmp, *op), insn);
 
- *op = tmp2;
+ *op = tmp;
}
 
-  eh_insn
-   = emit_insn_before (gen_rtx_SET (gen_rtx_SUBREG (vmode, tmp, 0),
+  tmp = gen_rtx_SUBREG (vmode, gen_reg_rtx (GET_MODE (*op)), 0);
+
+  rtx_insn *eh_insn
+   = emit_insn_before (gen_rtx_SET (copy_rtx (tmp),
 gen_gpr_to_xmm_move_src (vmode, *op)),
insn);
 
@@ -1028,33 +1051,18 @@ general_scalar_chain::convert_op (rtx *op, rtx_insn *insn)
}
}
 
-  *op = gen_rtx_SUBREG (vmode, tmp, 0);
+  *op = tmp;
 
   if (dump_file)
fprintf (dump_file, "  Preloading operand for insn %d into r%d\n",
 INSN_UID (insn), REGNO (tmp));
 }
   else if (REG_P (*op))
+*op = gen_rtx_SUBREG (vmode, *op, 0);
+  else if (CONST_SCALAR_INT_P (*op))
 {
-  *op = gen_rtx_SUBREG (vmode, *op, 

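The smode_convert_cst helper in the patch above turns a scalar immediate into a vector constant: -1 becomes an all-ones vector (presumably because that constant is cheap to materialize), and any other value goes into element 0 with the remaining elements zero. Below is a minimal C model of the resulting lane layout, assuming 64-bit lanes and a caller-supplied lane count; model_convert_cst is a hypothetical name, not a GCC function.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical model of the lane layout produced by smode_convert_cst:
   fill LANES[0..N-1] for the scalar immediate X and return N.  */
static size_t
model_convert_cst (int64_t x, int64_t *lanes, size_t n)
{
  /* A vector mode always has at least one element.  */
  assert (n > 0);

  /* Mirrors the CONSTM1_RTX special case: -1 in every lane.  */
  if (x == -1)
    {
      for (size_t i = 0; i < n; ++i)
        lanes[i] = -1;
      return n;
    }

  /* Otherwise the scalar goes into element 0 and the rest are zero,
     like the CONST_VECTOR built from { x, 0, ..., 0 }.  */
  lanes[0] = x;
  for (size_t i = 1; i < n; ++i)
    lanes[i] = 0;
  return n;
}
```

For a two-lane (V2DI-like) mode, 5 maps to {5, 0} and -1 maps to {-1, -1}.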
Re: [PATCH] i386: Fix a pasto in ix86_expand_int_sse_cmp [PR114339]

2024-03-15 Thread Uros Bizjak
On Fri, Mar 15, 2024 at 9:50 AM Jakub Jelinek  wrote:
>
> Hi!
>
> In r13-3803-gfa271afb58 I've added an optimization for LE/LEU/GE/GEU
> comparison against CONST_VECTOR.  As the comments say:
>  /* x <= cst can be handled as x < cst + 1 unless there is
> wrap around in cst + 1.  */
> ...
>  /* For LE punt if some element is signed maximum.  */
> ...
>  /* For LEU punt if some element is unsigned maximum.  */
> and
>  /* x >= cst can be handled as x > cst - 1 unless there is
> wrap around in cst - 1.  */
> ...
>  /* For GE punt if some element is signed minimum.  */
> ...
>  /* For GEU punt if some element is zero.  */
> Apparently I wrote the GE/GEU (second case) first and then
> copied/adjusted it for LE/LEU, most of the adjustments look correct, but
> I've left if (code == GE) comparison when testing if it should punt for
> signed maximum.  That condition is never true, because this is in
> switch (code) { ... case LE: case LEU: block and we really meant to
> be what the comment says, for LE punt if some element is signed maximum,
> as then cst + 1 wraps around.
>
> The following patch fixes the pasto.
>
> Bootstrapped/regtested on x86_64-linux and i686-linux, ok for trunk?
>
> 2024-03-15  Jakub Jelinek  
>
> PR target/114339
> * config/i386/i386-expand.cc (ix86_expand_int_sse_cmp) : Fix
> a pasto, compare code against LE rather than GE.
>
> * gcc.target/i386/pr114339.c: New test.

OK.

Thanks,
Uros.

>
> --- gcc/config/i386/i386-expand.cc.jj   2024-03-07 08:34:21.043802912 +0100
> +++ gcc/config/i386/i386-expand.cc  2024-03-14 22:55:57.321842686 +0100
> @@ -4690,7 +4690,7 @@ ix86_expand_int_sse_cmp (rtx dest, enum
>   rtx elt = CONST_VECTOR_ELT (cop1, i);
>   if (!CONST_INT_P (elt))
> break;
> - if (code == GE)
> + if (code == LE)
> {
>   /* For LE punt if some element is signed maximum.  */
>   if ((INTVAL (elt) & (GET_MODE_MASK (eltmode) >> 1))
> --- gcc/testsuite/gcc.target/i386/pr114339.c.jj 2024-03-14 22:58:04.739076025 +0100
> +++ gcc/testsuite/gcc.target/i386/pr114339.c    2024-03-14 22:38:59.736972124 +0100
> @@ -0,0 +1,20 @@
> +/* PR target/114339 */
> +/* { dg-do run } */
> +/* { dg-options "-O2 -Wno-psabi" } */
> +/* { dg-additional-options "-mavx" { target avx_runtime } } */
> +
> +typedef long long V __attribute__((vector_size (16)));
> +
> +__attribute__((noipa)) V
> +foo (V a)
> +{
> +  return a <= (V) {0, __LONG_LONG_MAX__ };
> +}
> +
> +int
> +main ()
> +{
> +  V t = foo ((V) { 0, 0 });
> +  if (t[0] != -1LL || t[1] != -1LL)
> +__builtin_abort ();
> +}
>
> Jakub
>

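The rule described in the quoted mail (rewrite x <= cst as x < cst + 1 and x >= cst as x > cst - 1, punting when some element would wrap around) can be modeled per element in C. This is a sketch assuming 64-bit lanes held as raw bits, with hypothetical names (enum cmp_code, can_adjust_lane, can_adjust_vector). It is not the GCC implementation, which inspects CONST_VECTOR elements using GET_MODE_MASK. The "buggy" variant assumes that the else branch of the pasto'd test, which is not shown in the quoted hunk, holds the LEU check.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical comparison codes for the four cases discussed.  */
enum cmp_code { CMP_LE, CMP_LEU, CMP_GE, CMP_GEU };

/* True if lane constant C can be turned into C + 1 (LE/LEU) or C - 1
   (GE/GEU) without wrapping around.  */
static bool
can_adjust_lane (enum cmp_code code, uint64_t c)
{
  switch (code)
    {
    case CMP_LE:
      return c != (uint64_t) INT64_MAX;   /* punt on signed maximum */
    case CMP_LEU:
      return c != UINT64_MAX;             /* punt on unsigned maximum */
    case CMP_GE:
      return c != (uint64_t) INT64_MIN;   /* punt on signed minimum */
    case CMP_GEU:
      return c != 0;                      /* punt on zero */
    }
  return false;
}

/* The rewrite is valid only if every lane passes.  */
static bool
can_adjust_vector (enum cmp_code code, const uint64_t *cst, size_t n)
{
  assert (cst != NULL || n == 0);
  for (size_t i = 0; i < n; ++i)
    if (!can_adjust_lane (code, cst[i]))
      return false;
  return true;
}

/* Model of the pre-fix LE/LEU check: the signed-maximum test was guarded
   by code == GE, which never holds inside the LE/LEU case, so LE fell
   through to the unsigned-maximum test.  Only meaningful for LE/LEU.  */
static bool
can_adjust_lane_buggy (enum cmp_code code, uint64_t c)
{
  if (code == CMP_GE)                     /* the pasto; should be CMP_LE */
    return c != (uint64_t) INT64_MAX;
  return c != UINT64_MAX;
}
```

With the buggy guard, the INT64_MAX lane from pr114339.c is accepted. Then cst + 1 wraps to the signed minimum, so lane 1 of a < cst + 1 evaluates 0 < INT64_MIN (false) instead of 0 <= INT64_MAX (true). That is the mismatch the new test's __builtin_abort detects.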

Re: [PATCH] i386[stv]: Handle REG_EH_REGION note

2024-03-14 Thread Uros Bizjak
On Thu, Mar 14, 2024 at 8:42 AM Uros Bizjak  wrote:
>
> On Thu, Mar 14, 2024 at 8:32 AM Hongtao Liu  wrote:
> >
> > On Thu, Mar 14, 2024 at 3:22 PM Uros Bizjak  wrote:
> > >
> > > On Thu, Mar 14, 2024 at 2:33 AM liuhongt  wrote:
> > > >
> > > > When we split
> > > > (insn 37 36 38 10 (set (reg:DI 104 [ _18 ])
> > > > (mem:DI (reg/f:SI 98 [ CallNative_nclosure.0_1 ]) [6 
> > > > MEM[(struct SQRefCounted *)CallNative_nclosure.0_1]._uiRef+0 S8 A32])) 
> > > > "test.C":22:42 84 {*movdi_internal}
> > > >  (expr_list:REG_EH_REGION (const_int -11 [0xfff5])
> > > >
> > > > into
> > > >
> > > > (insn 104 36 37 10 (set (subreg:V2DI (reg:DI 124) 0)
> > > > (vec_concat:V2DI (mem:DI (reg/f:SI 98 [ CallNative_nclosure.0_1 
> > > > ]) [6 MEM[(struct SQRefCounted *)CallNative_nclosure.0_1]._uiRef+0 S8 
> > > > A32])
> > > > (const_int 0 [0]))) "test.C":22:42 -1
> > > > (nil)))
> > > > (insn 37 104 105 10 (set (subreg:V2DI (reg:DI 104 [ _18 ]) 0)
> > > > (subreg:V2DI (reg:DI 124) 0)) "test.C":22:42 2024 
> > > > {movv2di_internal}
> > > >  (expr_list:REG_EH_REGION (const_int -11 [0xfff5])
> > > > (nil)))
> > > >
> > > > we must copy the REG_EH_REGION note to the first insn and split the
> > > > block after the newly added insn.  The REG_EH_REGION on the second
> > > > insn will be removed later since it no longer traps.
> > > >
> > > > Currently we only handle memory_operand, are there any other insns
> > > > need to be handled???
> > >
> > > I think memory access is the only thing that can trap.
> > >
> > > > Bootstrapped and regtested on x86_64-pc-linux-gnu{-m32,} for trunk and 
> > > > gcc-13/gcc-12 release branch.
> > > > Ok for trunk and backport?
> > > >
> > > > gcc/ChangeLog:
> > > >
> > > > * config/i386/i386-features.cc
> > > > (general_scalar_chain::convert_op): Handle REG_EH_REGION note.
> > > > (convert_scalars_to_vector): Ditto.
> > > > * config/i386/i386-features.h (class scalar_chain): New
> > > > member control_flow_insns.
> > > >
> > > > gcc/testsuite/ChangeLog:
> > > >
> > > > * g++.target/i386/pr111822.C: New test.
> > > > ---
> > > >  gcc/config/i386/i386-features.cc | 48 ++--
> > > >  gcc/config/i386/i386-features.h  |  1 +
> > > >  gcc/testsuite/g++.target/i386/pr111822.C | 45 ++
> > > >  3 files changed, 90 insertions(+), 4 deletions(-)
> > > >  create mode 100644 gcc/testsuite/g++.target/i386/pr111822.C
> > > >
> > > > diff --git a/gcc/config/i386/i386-features.cc b/gcc/config/i386/i386-features.cc
> > > > index 1de2a07ed75..2ed27a9ebdd 100644
> > > > --- a/gcc/config/i386/i386-features.cc
> > > > +++ b/gcc/config/i386/i386-features.cc
> > > > @@ -998,20 +998,36 @@ general_scalar_chain::convert_op (rtx *op, rtx_insn *insn)
> > > >  }
> > > >else if (MEM_P (*op))
> > > >  {
> > > > +  rtx_insn* eh_insn, *movabs = NULL;
> > > >rtx tmp = gen_reg_rtx (GET_MODE (*op));
> > > >
> > > >/* Handle movabs.  */
> > > >if (!memory_operand (*op, GET_MODE (*op)))
> > > > {
> > > >   rtx tmp2 = gen_reg_rtx (GET_MODE (*op));
> > > > + movabs = emit_insn_before (gen_rtx_SET (tmp2, *op), insn);
> > > >
> > > > - emit_insn_before (gen_rtx_SET (tmp2, *op), insn);
> > > >   *op = tmp2;
> > > > }
> > >
> > > I may be missing something, but isn't the above dead code? We have
> > > if (MEM_P (*op)) and then if (!memory_operand (*op, ...)).
> > It's PR91814 #c1: memory_operand also checks for invalid memory addresses.
>
> Oh, it is even my comment ;)
>
> Perhaps the comment should be improved to something like:
>
> "Emit MOVABS to load from a 64-bit absolute address to a GPR."
>
> LGTM then.

BTW: Do we need to also fix timode_scalar_chain::convert_op ? There we
also preload operand, so a similar fix should be applied there.

Uros.

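The rule under discussion can be illustrated with a toy instruction list. After a trapping load is split, the REG_EH_REGION note must move to the new insn that actually accesses memory, which is the MOVABS insn when the address needed one, and the block must end after that insn. Everything below is hypothetical: struct toy_insn, split_load, purge_dead_eh_notes and first_block_end model the idea only and are not GCC's rtx_insn API.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-in for an insn.  */
struct toy_insn
{
  const char *what;   /* label for readability only */
  bool may_trap;      /* can this insn raise an exception? */
  int eh_region;      /* 0 means "no REG_EH_REGION note" */
};

/* Split the trapping load ORIG into an optional MOVABS, a load into an
   XMM register, and the original insn (now a register move).  Writes the
   new sequence to OUT and returns its length.  */
static size_t
split_load (struct toy_insn orig, bool needs_movabs, struct toy_insn *out)
{
  size_t n = 0;
  assert (out != NULL);
  if (needs_movabs)
    /* MOVABS loads from the 64-bit absolute address into a GPR.  */
    out[n++] = (struct toy_insn) { "movabs", true, 0 };
  /* After a MOVABS the vector load reads a register, so it cannot trap.  */
  out[n++] = (struct toy_insn) { "load-to-xmm", !needs_movabs, 0 };
  /* The original insn keeps its now-stale note.  */
  out[n++] = (struct toy_insn) { "move", false, orig.eh_region };
  /* Copy the note to the first new insn, the only one touching memory.  */
  out[0].eh_region = orig.eh_region;
  return n;
}

/* Drop EH notes from insns that can no longer trap.  */
static void
purge_dead_eh_notes (struct toy_insn *seq, size_t n)
{
  for (size_t i = 0; i < n; ++i)
    if (!seq[i].may_trap)
      seq[i].eh_region = 0;
}

/* A block has to end after an insn carrying an EH note; return the index
   of the first such insn, or N if there is none.  */
static size_t
first_block_end (const struct toy_insn *seq, size_t n)
{
  for (size_t i = 0; i < n; ++i)
    if (seq[i].eh_region != 0)
      return i;
  return n;
}
```

In this model the note always lands on out[0], because the first emitted insn is the only one that dereferences memory, whether or not a MOVABS was needed.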

Re: [PATCH] i386[stv]: Handle REG_EH_REGION note

2024-03-14 Thread Uros Bizjak
On Thu, Mar 14, 2024 at 8:32 AM Hongtao Liu  wrote:
>
> On Thu, Mar 14, 2024 at 3:22 PM Uros Bizjak  wrote:
> >
> > On Thu, Mar 14, 2024 at 2:33 AM liuhongt  wrote:
> > >
> > > When we split
> > > (insn 37 36 38 10 (set (reg:DI 104 [ _18 ])
> > > (mem:DI (reg/f:SI 98 [ CallNative_nclosure.0_1 ]) [6 MEM[(struct 
> > > SQRefCounted *)CallNative_nclosure.0_1]._uiRef+0 S8 A32])) "test.C":22:42 
> > > 84 {*movdi_internal}
> > >  (expr_list:REG_EH_REGION (const_int -11 [0xfff5])
> > >
> > > into
> > >
> > > (insn 104 36 37 10 (set (subreg:V2DI (reg:DI 124) 0)
> > > (vec_concat:V2DI (mem:DI (reg/f:SI 98 [ CallNative_nclosure.0_1 
> > > ]) [6 MEM[(struct SQRefCounted *)CallNative_nclosure.0_1]._uiRef+0 S8 
> > > A32])
> > > (const_int 0 [0]))) "test.C":22:42 -1
> > > (nil)))
> > > (insn 37 104 105 10 (set (subreg:V2DI (reg:DI 104 [ _18 ]) 0)
> > > (subreg:V2DI (reg:DI 124) 0)) "test.C":22:42 2024 
> > > {movv2di_internal}
> > >  (expr_list:REG_EH_REGION (const_int -11 [0xfff5])
> > > (nil)))
> > >
> > > we must copy the REG_EH_REGION note to the first insn and split the block
> > > after the newly added insn.  The REG_EH_REGION on the second insn will be
> > > removed later since it no longer traps.
> > >
> > > Currently we only handle memory_operand, are there any other insns
> > > need to be handled???
> >
> > I think memory access is the only thing that can trap.
> >
> > > Bootstrapped and regtested on x86_64-pc-linux-gnu{-m32,} for trunk and 
> > > gcc-13/gcc-12 release branch.
> > > Ok for trunk and backport?
> > >
> > > gcc/ChangeLog:
> > >
> > > * config/i386/i386-features.cc
> > > (general_scalar_chain::convert_op): Handle REG_EH_REGION note.
> > > (convert_scalars_to_vector): Ditto.
> > > * config/i386/i386-features.h (class scalar_chain): New
> > > member control_flow_insns.
> > >
> > > gcc/testsuite/ChangeLog:
> > >
> > > * g++.target/i386/pr111822.C: New test.
> > > ---
> > >  gcc/config/i386/i386-features.cc | 48 ++--
> > >  gcc/config/i386/i386-features.h  |  1 +
> > >  gcc/testsuite/g++.target/i386/pr111822.C | 45 ++
> > >  3 files changed, 90 insertions(+), 4 deletions(-)
> > >  create mode 100644 gcc/testsuite/g++.target/i386/pr111822.C
> > >
> > > diff --git a/gcc/config/i386/i386-features.cc b/gcc/config/i386/i386-features.cc
> > > index 1de2a07ed75..2ed27a9ebdd 100644
> > > --- a/gcc/config/i386/i386-features.cc
> > > +++ b/gcc/config/i386/i386-features.cc
> > > @@ -998,20 +998,36 @@ general_scalar_chain::convert_op (rtx *op, rtx_insn *insn)
> > >  }
> > >else if (MEM_P (*op))
> > >  {
> > > +  rtx_insn* eh_insn, *movabs = NULL;
> > >rtx tmp = gen_reg_rtx (GET_MODE (*op));
> > >
> > >/* Handle movabs.  */
> > >if (!memory_operand (*op, GET_MODE (*op)))
> > > {
> > >   rtx tmp2 = gen_reg_rtx (GET_MODE (*op));
> > > + movabs = emit_insn_before (gen_rtx_SET (tmp2, *op), insn);
> > >
> > > - emit_insn_before (gen_rtx_SET (tmp2, *op), insn);
> > >   *op = tmp2;
> > > }
> >
> > I may be missing something, but isn't the above dead code? We have
> > if (MEM_P (*op)) and then if (!memory_operand (*op, ...)).
> It's PR91814 #c1: memory_operand also checks for invalid memory addresses.

Oh, it is even my comment ;)

Perhaps the comment should be improved to something like:

"Emit MOVABS to load from a 64-bit absolute address to a GPR."

LGTM then.

Thanks,
Uros.

> >
> > Uros.
> >
> > >
> > > -  emit_insn_before (gen_rtx_SET (gen_rtx_SUBREG (vmode, tmp, 0),
> > > -gen_gpr_to_xmm_move_src (vmode, *op)),
> > > -   insn);
> > > +  eh_insn
> > > +   = emit_insn_before (gen_rtx_SET (gen_rtx_SUBREG (vmode, tmp, 0),
> > > +gen_gpr_to_xmm_move_src (vmode, *op)),
> > > +   insn);
> > > +
> > > +  if (cfun->

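Why can an operand satisfy MEM_P and still fail memory_operand? One case, per the suggested "Emit MOVABS to load from a 64-bit absolute address to a GPR" comment, is an absolute address that no ordinary x86-64 addressing mode can encode. Those modes carry at most a sign-extended 32-bit displacement. Larger addresses need MOVABS, whose 64-bit absolute (moffs64) form only works with the accumulator. Below is a sketch of that range test in C. The function name is hypothetical, and GCC's real check goes through its address-legitimacy hooks, which consider more than this.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* True if the absolute address ADDR survives sign extension from 32 bits,
   i.e. it can be used directly as the displacement of an x86-64 memory
   operand.  Anything else has to be loaded with MOVABS first.  */
static bool
fits_sign_extended_disp32 (uint64_t addr)
{
  return addr <= UINT64_C (0x7fffffff)
         || addr >= UINT64_C (0xffffffff80000000);
}
```

For example, 0x1000 and 0xffffffff80000000 fit, while 0x80000000 and 0x123456789 do not.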
Re: [PATCH] i386[stv]: Handle REG_EH_REGION note

2024-03-14 Thread Uros Bizjak
On Thu, Mar 14, 2024 at 2:33 AM liuhongt  wrote:
>
> When we split
> (insn 37 36 38 10 (set (reg:DI 104 [ _18 ])
> (mem:DI (reg/f:SI 98 [ CallNative_nclosure.0_1 ]) [6 MEM[(struct SQRefCounted *)CallNative_nclosure.0_1]._uiRef+0 S8 A32])) "test.C":22:42 84 {*movdi_internal}
>  (expr_list:REG_EH_REGION (const_int -11 [0xfff5])
>
> into
>
> (insn 104 36 37 10 (set (subreg:V2DI (reg:DI 124) 0)
> (vec_concat:V2DI (mem:DI (reg/f:SI 98 [ CallNative_nclosure.0_1 ]) [6 MEM[(struct SQRefCounted *)CallNative_nclosure.0_1]._uiRef+0 S8 A32])
> (const_int 0 [0]))) "test.C":22:42 -1
> (nil)))
> (insn 37 104 105 10 (set (subreg:V2DI (reg:DI 104 [ _18 ]) 0)
> (subreg:V2DI (reg:DI 124) 0)) "test.C":22:42 2024 {movv2di_internal}
>  (expr_list:REG_EH_REGION (const_int -11 [0xfff5])
> (nil)))
>
> we must copy the REG_EH_REGION note to the first insn and split the block
> after the newly added insn.  The REG_EH_REGION on the second insn will be
> removed later since it no longer traps.
>
> Currently we only handle memory_operand, are there any other insns
> need to be handled???

I think memory access is the only thing that can trap.

> Bootstrapped and regtested on x86_64-pc-linux-gnu{-m32,} for trunk and 
> gcc-13/gcc-12 release branch.
> Ok for trunk and backport?
>
> gcc/ChangeLog:
>
> * config/i386/i386-features.cc
> (general_scalar_chain::convert_op): Handle REG_EH_REGION note.
> (convert_scalars_to_vector): Ditto.
> * config/i386/i386-features.h (class scalar_chain): New
> member control_flow_insns.
>
> gcc/testsuite/ChangeLog:
>
> * g++.target/i386/pr111822.C: New test.
> ---
>  gcc/config/i386/i386-features.cc | 48 ++--
>  gcc/config/i386/i386-features.h  |  1 +
>  gcc/testsuite/g++.target/i386/pr111822.C | 45 ++
>  3 files changed, 90 insertions(+), 4 deletions(-)
>  create mode 100644 gcc/testsuite/g++.target/i386/pr111822.C
>
> diff --git a/gcc/config/i386/i386-features.cc b/gcc/config/i386/i386-features.cc
> index 1de2a07ed75..2ed27a9ebdd 100644
> --- a/gcc/config/i386/i386-features.cc
> +++ b/gcc/config/i386/i386-features.cc
> @@ -998,20 +998,36 @@ general_scalar_chain::convert_op (rtx *op, rtx_insn *insn)
>  }
>else if (MEM_P (*op))
>  {
> +  rtx_insn* eh_insn, *movabs = NULL;
>rtx tmp = gen_reg_rtx (GET_MODE (*op));
>
>/* Handle movabs.  */
>if (!memory_operand (*op, GET_MODE (*op)))
> {
>   rtx tmp2 = gen_reg_rtx (GET_MODE (*op));
> + movabs = emit_insn_before (gen_rtx_SET (tmp2, *op), insn);
>
> - emit_insn_before (gen_rtx_SET (tmp2, *op), insn);
>   *op = tmp2;
> }

I may be missing something, but isn't the above dead code? We have
if (MEM_P (*op)) and then if (!memory_operand (*op, ...)).

Uros.

>
> -  emit_insn_before (gen_rtx_SET (gen_rtx_SUBREG (vmode, tmp, 0),
> -gen_gpr_to_xmm_move_src (vmode, *op)),
> -   insn);
> +  eh_insn
> +   = emit_insn_before (gen_rtx_SET (gen_rtx_SUBREG (vmode, tmp, 0),
> +gen_gpr_to_xmm_move_src (vmode, *op)),
> +   insn);
> +
> +  if (cfun->can_throw_non_call_exceptions)
> +   {
> + /* Handle REG_EH_REGION note.  */
> + rtx note = find_reg_note (insn, REG_EH_REGION, NULL_RTX);
> + if (note)
> +   {
> + if (movabs)
> +   eh_insn = movabs;
> + control_flow_insns.safe_push (eh_insn);
> + add_reg_note (eh_insn, REG_EH_REGION, XEXP (note, 0));
> +   }
> +   }
> +
>*op = gen_rtx_SUBREG (vmode, tmp, 0);
>
>if (dump_file)
> @@ -2494,6 +2510,7 @@ convert_scalars_to_vector (bool timode_p)
>  {
>basic_block bb;
>int converted_insns = 0;
> +  auto_vec control_flow_insns;
>
>bitmap_obstack_initialize (NULL);
>const machine_mode cand_mode[3] = { SImode, DImode, TImode };
> @@ -2575,6 +2592,11 @@ convert_scalars_to_vector (bool timode_p)
>  chain->chain_id);
> }
>
> + rtx_insn* iter_insn;
> + unsigned int ii;
> + FOR_EACH_VEC_ELT (chain->control_flow_insns, ii, iter_insn)
> +   control_flow_insns.safe_push (iter_insn);
> +
>   delete chain;
> }
>  }
> @@ -2643,6 +2665,24 @@ convert_scalars_to_vector (bool timode_p)
>   DECL_INCOMING_RTL (parm) = gen_rtx_SUBREG (TImode, r, 0);
>   }
>   }
> +
> +  if (!control_flow_insns.is_empty ())
> +   {
> + free_dominance_info (CDI_DOMINATORS);
> +
> + unsigned int i;
> + rtx_insn* insn;
> + FOR_EACH_VEC_ELT (control_flow_insns, i, insn)
> +   if (control_flow_insn_p (insn))
> + {
> +   /* Split the block 

Fwd: [PATCH v3] combine: Fix ICE in try_combine on pr112494.c [PR112560]

2024-03-12 Thread Uros Bizjak
Forgot to CC gcc-patches@ ML... sorry for the duplicate...

The compiler, configured with --enable-checking=yes,rtl,extra ICEs with:

internal compiler error: RTL check: expected elt 0 type 'e' or 'u',
have 'E' (rtx unspec) in try_combine, at combine.cc:3237

This is

3236  /* Just replace the CC reg with a new mode.  */
3237  SUBST (XEXP (*cc_use_loc, 0), newpat_dest);
3238  undobuf.other_insn = cc_use_insn;

in combine.cc, where *cc_use_loc is

(unspec:DI [
(reg:CC 17 flags)
] UNSPEC_PUSHFL)

combine assumes CC must be used inside of a comparison and uses XEXP (..., 0)
without checking the RTX code of the argument.

Replace cc_use_loc with the entire new RTX only in case cc_use_loc satisfies
COMPARISON_P predicate.  Otherwise scan the entire cc_use_loc RTX for CC reg
to be updated with a new mode.

PR rtl-optimization/112560

gcc/ChangeLog:

* combine.cc (try_combine): Replace cc_use_loc with the entire
new RTX only in case cc_use_loc satisfies COMPARISON_P predicate.
Otherwise scan the entire cc_use_loc RTX for CC reg to be updated
with a new mode.
* config/i386/i386.md (@pushf2): Allow all CC modes for
operand 1.

Bootstrapped and regression tested on x86_64-linux-gnu {,-m32}.

OK for trunk?

Uros.
diff --git a/gcc/combine.cc b/gcc/combine.cc
index a4479f8d836..92b8d98e6c1 100644
--- a/gcc/combine.cc
+++ b/gcc/combine.cc
@@ -3222,8 +3222,7 @@ try_combine (rtx_insn *i3, rtx_insn *i2, rtx_insn *i1, rtx_insn *i0,
 #endif
  /* Cases for modifying the CC-using comparison.  */
  if (compare_code != orig_compare_code
- /* ??? Do we need to verify the zero rtx?  */
- && XEXP (*cc_use_loc, 1) == const0_rtx)
+ && COMPARISON_P (*cc_use_loc))
{
  /* Replace cc_use_loc with entire new RTX.  */
  SUBST (*cc_use_loc,
@@ -3233,8 +3232,19 @@ try_combine (rtx_insn *i3, rtx_insn *i2, rtx_insn *i1, rtx_insn *i0,
}
  else if (compare_mode != orig_compare_mode)
{
+ subrtx_ptr_iterator::array_type array;
+
  /* Just replace the CC reg with a new mode.  */
- SUBST (XEXP (*cc_use_loc, 0), newpat_dest);
+ FOR_EACH_SUBRTX_PTR (iter, array, cc_use_loc, NONCONST)
+   {
+ rtx *loc = *iter;
+ if (REG_P (*loc)
+ && REGNO (*loc) == REGNO (newpat_dest))
+   {
+ SUBST (*loc, newpat_dest);
+ iter.skip_subrtxes ();
+   }
+   }
  undobuf.other_insn = cc_use_insn;
}
}
diff --git a/gcc/config/i386/i386.md b/gcc/config/i386/i386.md
index df97a2d6270..9dc33fd239a 100644
--- a/gcc/config/i386/i386.md
+++ b/gcc/config/i386/i386.md
@@ -2213,9 +2213,9 @@ (define_insn "*pop1_epilogue"
 
 (define_insn "@pushfl2"
   [(set (match_operand:W 0 "push_operand" "=<")
-   (unspec:W [(match_operand:CC 1 "flags_reg_operand")]
+   (unspec:W [(match_operand 1 "flags_reg_operand")]
  UNSPEC_PUSHFL))]
-  ""
+  "GET_MODE_CLASS (GET_MODE (operands[1])) == MODE_CC"
   "pushf{}"
   [(set_attr "type" "push")
(set_attr "mode" "")])

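The core of the v3 fix is to stop assuming that the CC register sits at XEXP (*cc_use_loc, 0). Instead, the patch walks every sub-expression of the use and replaces each matching register, which it does with FOR_EACH_SUBRTX_PTR plus skip_subrtxes. Below is a toy C model of that walk. It uses a hypothetical struct toy_rtx with a variable-length operand vector so that an UNSPEC-like node can be represented. None of these names are GCC APIs.

```c
#include <assert.h>
#include <stddef.h>

enum toy_code { TOY_REG, TOY_CONST, TOY_LT, TOY_UNSPEC };

/* Hypothetical expression node.  */
struct toy_rtx
{
  enum toy_code code;
  int regno;              /* meaningful for TOY_REG only */
  const char *mode;       /* e.g. "CC", "CCGC", "DI" */
  size_t nops;
  struct toy_rtx **ops;   /* operands of TOY_LT / TOY_UNSPEC */
};

/* Replace every register REGNO reachable from *LOC with NEW_REG, wherever
   it appears, and return the number of replacements.  A replaced register
   is not descended into, which mirrors iter.skip_subrtxes ().  */
static int
replace_reg_uses (struct toy_rtx **loc, int regno, struct toy_rtx *new_reg)
{
  struct toy_rtx *x = *loc;
  assert (x != NULL);
  if (x->code == TOY_REG)
    {
      if (x->regno != regno)
        return 0;
      *loc = new_reg;
      return 1;
    }
  int count = 0;
  for (size_t i = 0; i < x->nops; ++i)
    count += replace_reg_uses (&x->ops[i], regno, new_reg);
  return count;
}
```

The same walk handles (lt (reg flags) (const_int 0)) and (unspec [(reg flags)] UNSPEC_PUSHFL). It also shows why the patch relaxes operand 1 of @pushfl to any MODE_CC mode: after the walk, the unspec may hold the flags register in a mode other than CCmode.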

Re: [PATCH v2] combine: Fix ICE in try_combine on pr112494.c [PR112560]

2024-03-07 Thread Uros Bizjak
On Thu, Mar 7, 2024 at 11:29 PM Segher Boessenkool
 wrote:
>
> On Thu, Mar 07, 2024 at 11:07:18PM +0100, Uros Bizjak wrote:
> > On Thu, Mar 7, 2024 at 10:37 PM Segher Boessenkool
> >  wrote:
> > > > but can be something else, such as the above noted
> > > >
> > > >  (unspec:DI [
> > > >  (reg:CC 17 flags)
> > > >  ] UNSPEC_PUSHFL)
> > >
> > > But that is invalid RTL?  The only valid use of a CC is written as
> > > cc-compared-to-0.  It means "the result of something compared to 0,
> > > stored in that cc reg".
> > >
> > > (And you can copy a CC reg around, but that is not a use ;-) )
> >
> > How can we describe a pushfl then?
>
> (unspec:DI [
> (compare:CC (reg:CC 17 flags) (const_int 0))
> ] UNSPEC_PUSHFL)
>
> or something like that?
>
> > It was changed to its current form
> > in [1], but I suspect that the code will try to update it even when
> > pushfl is implemented as a direct move from a register (as was defined
> > previously).
> >
> > BTW: Unspecs are used in a couple of other places for other targets [2].
> >
> > [1] https://gcc.gnu.org/bugzilla/show_bug.cgi?id=112494#c5
> > [2] https://gcc.gnu.org/pipermail/gcc-patches/2023-December/639743.html
>
> There is nothing wrong with unspecs.  You cannot use a CCmode value
> without comparing it to 0, though.  The exact kind of comparison
> determines what bits are valid (and have what meaning) in your CC reg,
> even!

The pushfl can be considered a transparent move; individual bits have
no meaning there. I don't see why using an unspec should be any different
from using a "naked" register (please also see below: the current source
code may update a "naked" reg as well). What constitutes a use is "(cmp:CC
(CC reg) (const_int 0))" around the register, and I think that without
this RTX around the CC reg, its use should not be updated in any way.

> > > > The source code that deals with the *user* of the CC register assumes
> > > > the former form, so it blindly tries to update the mode of the CC
> > > > register inside LT comparison RTX
> > >
> > > Would you like it better if there was an assert for this?  There are
> > > very many RTL requirements that aren't checked for currently :-/
> >
> > In this case - yes. Assert signals that something is unsupported (or
> > invalid), way better than silently corrupting some memory, reporting
> > the corruption only with checking enabled.
>
> Yeah.  The way RTL checking works makes this hard to do in most cases.
> Hrm.  (It cannot easily look at context, only inside of the current RTX).
>
> > > The unspec should have the CC compared with 0 as argument.
> >
> > But this does not do what pushfl does... It pushes the register to the 
> > stack.
>
> Can't you just describe the dataflow then, without an unspec?  An unspec
> by definition does some (unspecified) operation on the data.

Previously, it was defined as:

 (define_insn "*pushfl2"
   [(set (match_operand:W 0 "push_operand" "=<")
 (match_operand:W 1 "flags_reg_operand"))]

But Wmode, AKA SI/DImode, is not CCmode. And as I said in my last
message, nothing prevents the current source code from trying to update
the CC register here.

Uros.


Re: [PATCH v2] combine: Fix ICE in try_combine on pr112494.c [PR112560]

2024-03-07 Thread Uros Bizjak
On Thu, Mar 7, 2024 at 11:07 PM Uros Bizjak  wrote:
>
> On Thu, Mar 7, 2024 at 10:37 PM Segher Boessenkool
>  wrote:
> >
> > On Thu, Mar 07, 2024 at 10:04:32PM +0100, Uros Bizjak wrote:
> >
> > [snip]
> >
> > > The part we want to fix deals with the *user* of the CC register. It
> > > is not true that this is always COMPARISON_P, so EQ, NE, GE, LT, ...
> > > in the form of
> > >
> > > (LT:CCGC (reg:CCGC 17 flags) (const_int 0))
> > >
> > > but can be something else, such as the above noted
> > >
> > >  (unspec:DI [
> > >  (reg:CC 17 flags)
> > >  ] UNSPEC_PUSHFL)
> >
> > But that is invalid RTL?  The only valid use of a CC is written as
> > cc-compared-to-0.  It means "the result of something compared to 0,
> > stored in that cc reg".
> >
> > (And you can copy a CC reg around, but that is not a use ;-) )

Hm... under this premise, we can also say that any form that is not
cc-compared-to-0 is not a use. Consequently, if it is not a use, then
the CC reg should not be updated at its use location, so my v1 patch,
where we simply skip the update (but retain the combined insn),
actually makes sense.

In this concrete situation, we don't care about CC register mode in
the PUSHFL insn. And we should not change CC reg mode of the use,
because any other mode than the generic CCmode won't be recognized by
the insn pattern.

Uros.


Re: [PATCH v2] combine: Fix ICE in try_combine on pr112494.c [PR112560]

2024-03-07 Thread Uros Bizjak
On Thu, Mar 7, 2024 at 10:37 PM Segher Boessenkool
 wrote:
>
> On Thu, Mar 07, 2024 at 10:04:32PM +0100, Uros Bizjak wrote:
>
> [snip]
>
> > The part we want to fix deals with the *user* of the CC register. It
> > is not true that this is always COMPARISON_P, so EQ, NE, GE, LT, ...
> > in the form of
> >
> > (LT:CCGC (reg:CCGC 17 flags) (const_int 0))
> >
> > but can be something else, such as the above noted
> >
> >  (unspec:DI [
> >  (reg:CC 17 flags)
> >  ] UNSPEC_PUSHFL)
>
> But that is invalid RTL?  The only valid use of a CC is written as
> cc-compared-to-0.  It means "the result of something compared to 0,
> stored in that cc reg".
>
> (And you can copy a CC reg around, but that is not a use ;-) )

How can we describe a pushfl then? It was changed to its current form
in [1], but I suspect that the code will try to update it even when
pushfl is implemented as a direct move from a register (as was defined
previously).

BTW: Unspecs are used in a couple of other places for other targets [2].

[1] https://gcc.gnu.org/bugzilla/show_bug.cgi?id=112494#c5
[2] https://gcc.gnu.org/pipermail/gcc-patches/2023-December/639743.html

>
> > The source code that deals with the *user* of the CC register assumes
> > the former form, so it blindly tries to update the mode of the CC
> > register inside LT comparison RTX
>
> Would you like it better if there was an assert for this?  There are
> very many RTL requirements that aren't checked for currently :-/

In this case - yes. Assert signals that something is unsupported (or
invalid), way better than silently corrupting some memory, reporting
the corruption only with checking enabled.

>
> > (some other nearby source code even
> > checks for (const_int 0) RTX). Obviously, this is not the case with
> > the former form, where the update tries to:
> >
> > SUBST (XEXP (*cc_use_loc, 0), ...)
> >
> > on unspec, which has no XEXP (..., 0).
> >
> > And *this* is what triggers RTX checking assert.
>
> The unspec should have the CC compared with 0 as argument.

But this does not do what pushfl does... It pushes the register to the stack.

Uros.


Re: [PATCH v2] combine: Fix ICE in try_combine on pr112494.c [PR112560]

2024-03-07 Thread Uros Bizjak
On Thu, Mar 7, 2024 at 10:04 PM Uros Bizjak  wrote:

> The source code that deals with the *user* of the CC register assumes
> the former form, so it blindly tries to update the mode of the CC
> register inside LT comparison RTX (some other nearby source code even
> checks for (const_int 0) RTX). Obviously, this is not the case with
> the former form, where the update tries to:

Please read the above as:

... Obviously, this won't work with the former form, ...

Uros.


Re: [PATCH v2] combine: Fix ICE in try_combine on pr112494.c [PR112560]

2024-03-07 Thread Uros Bizjak
On Thu, Mar 7, 2024 at 6:39 PM Segher Boessenkool
 wrote:
>
> On Thu, Mar 07, 2024 at 10:55:12AM +0100, Richard Biener wrote:
> > On Thu, 7 Mar 2024, Uros Bizjak wrote:
> > > This is
> > >
> > > 3236  /* Just replace the CC reg with a new mode.  */
> > > 3237  SUBST (XEXP (*cc_use_loc, 0), newpat_dest);
> > > 3238  undobuf.other_insn = cc_use_insn;
> > >
> > > in combine.cc, where *cc_use_loc is
> > >
> > > (unspec:DI [
> > > (reg:CC 17 flags)
> > > ] UNSPEC_PUSHFL)
> > >
> > > combine assumes CC must be used inside of a comparison and uses XEXP 
> > > (..., 0)
>
> No.  It has established *this is the case* some time earlier.  Lines
> 3155 and on, what begins with
>   /* Many machines have insns that can both perform an
>  arithmetic operation and set the condition code.
>
> > > OK for trunk?
> >
> > Since you CCed me - looking at the code I wonder why we fatally fail.
>
> I did not get this email btw.  Some blip in email (on the sender's side)
> I guess?
>
> > The following might also fix the issue and preserve more of the
> > rest of the flow of the function.
>
> > --- a/gcc/combine.cc
> > +++ b/gcc/combine.cc
> > @@ -3182,7 +3182,8 @@ try_combine (rtx_insn *i3, rtx_insn *i2, rtx_insn
> > *i1, rtx_insn *i0,
> >
> >if (undobuf.other_insn == 0
> >   && (cc_use_loc = find_single_use (SET_DEST (newpat), i3,
> > -   &cc_use_insn)))
> > +   &cc_use_insn))
> > + && COMPARISON_P (*cc_use_loc))
>
> Line 3167 already is
>   && GET_CODE (SET_SRC (PATTERN (i3))) == COMPARE
> so what in your backend is unusual?

When combine tries to combine instructions involving COMPARE RTX, e.g.:

(define_insn "*add_2"
  [(set (reg FLAGS_REG)
(compare
  (plus:SWI
(match_operand:SWI 1 "nonimmediate_operand" "%0,0,,rm,r")
(match_operand:SWI 2 "" ",,0,r,"))
  (const_int 0)))
   (set (match_operand:SWI 0 "nonimmediate_operand" "=m,,,r,r")
(plus:SWI (match_dup 1) (match_dup 2)))]

it also updates the *user* of the CC register. The *user* is e.g.:

(define_insn "*setcc_qi"
  [(set (match_operand:QI 0 "nonimmediate_operand" "=qm")
(match_operator:QI 1 "ix86_comparison_operator"
  [(reg FLAGS_REG) (const_int 0)]))]

where "ix86_comparison_operator" is one of EQ, NE, GE, LT ... RTX codes.

The part we want to fix deals with the *user* of the CC register. It
is not true that this is always a COMPARISON_P RTX (EQ, NE, GE, LT, ...)
in the form of

(LT:CCGC (reg:CCGC 17 flags) (const_int 0))

but can be something else, such as the above noted

 (unspec:DI [
 (reg:CC 17 flags)
 ] UNSPEC_PUSHFL)

The source code that deals with the *user* of the CC register assumes
the former form, so it blindly tries to update the mode of the CC
register inside LT comparison RTX (some other nearby source code even
checks for (const_int 0) RTX). Obviously, this is not the case with
the former form, where the update tries to:

SUBST (XEXP (*cc_use_loc, 0), ...)

on unspec, which has no XEXP (..., 0).

And *this* is what triggers RTX checking assert.

Uros.

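Why does combine have to touch the CC *user* at all? On x86, ADD sets SF to the sign bit of the wrapped result and OF on signed overflow, and the LT condition (SETL/JL) tests SF != OF. After a separate CMP of the result against 0 (or TEST), OF is clear, so LT is just the sign of the result. Read straight off the ADD, LT instead reflects the sign of the exact, unwrapped sum. The two disagree exactly when the addition overflows, which is presumably one reason the user's CC mode may have to change when the COMPARE is folded into *add_2 (for instance, to a mode in which only SF is tested). Below is a minimal C model of that flag arithmetic, assuming 32-bit operands. The struct and function names are hypothetical.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical subset of EFLAGS: sign and overflow only.  */
struct toy_flags { bool sf, of; };

/* SF and OF as left by a 32-bit ADD of A and B.  */
static struct toy_flags
flags_after_add32 (int32_t a, int32_t b)
{
  int64_t exact = (int64_t) a + b;                  /* no overflow in 64 bits */
  uint32_t wrapped = (uint32_t) a + (uint32_t) b;   /* what the register holds */
  struct toy_flags f;
  f.sf = (wrapped >> 31) != 0;
  f.of = exact > INT32_MAX || exact < INT32_MIN;
  return f;
}

/* SF and OF as left by CMP R, 0 (or TEST R, R): OF is always clear.  */
static struct toy_flags
flags_after_cmp0 (uint32_t r)
{
  struct toy_flags f = { (r >> 31) != 0, false };
  return f;
}

/* The signed "less than" condition used by SETL/JL.  */
static bool
cond_lt (struct toy_flags f)
{
  return f.sf != f.of;
}
```

With a = INT32_MAX and b = 1, the ADD sets both SF and OF, so LT read from the ADD is false, while LT after comparing the wrapped result against 0 is true. Testing SF alone matches the latter.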

Re: [PATCH v2] combine: Fix ICE in try_combine on pr112494.c [PR112560]

2024-03-07 Thread Uros Bizjak
On Thu, Mar 7, 2024 at 12:11 PM Richard Biener  wrote:
>
> On Thu, 7 Mar 2024, Jakub Jelinek wrote:
>
> > On Thu, Mar 07, 2024 at 11:11:35AM +0100, Uros Bizjak wrote:
> > > > Since you CCed me - looking at the code I wonder why we fatally fail.
> > > > The following might also fix the issue and preserve more of the
> > > > rest of the flow of the function.
> > > >
> > > > If that works I'd prefer it.  But I'll defer approval to the combine
> > > > maintainer which is Segher.
> > >
> > > Your patch is basically what v1 did [1], but it was suggested (in a
> > > reply by you ;) ) that we should stop the attempt to combine if we
> > > can't handle the use. So, the v2 patch undoes the combine and records
> > > a nice message in this case.
> >
> > My understanding of Richi's patch is that it treats the non-COMPARISON_P
> > the same as if find_single_use fails, which is a common case that certainly
> > has to be handled right and it doesn't seem that we are giving up completely
> > for that case.  So, I think it is reasonable to treat the non-COMPARISON_P
> > *cc_use_loc as NULL cc_use_loc.
>
> The question is, whether a NULL cc_use_loc (find_single_use returning
> NULL) means "there is no use" or it can mean "huh, don't know, maybe
> more than one, maybe I was too stupid to identify the single use".
> The implementation suggests it's all broken ;)

As I understand find_single_use, it returns an RTX iff DEST is used
exactly once in the insn sequence following INSN.
find_single_use_1 returns an RTX iff DEST is used exactly once in its
argument. So, find_single_use returns an RTX only when DEST is used
exactly once in the sequence following INSN.
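A minimal C sketch of the "exactly once" contract described above. The insn representation and the function name are hypothetical; GCC's real find_single_use works on RTL locations and the insn chain, and this only models the counting rule that makes a non-NULL result imply a single use.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical insn model: each insn lists the register numbers it reads.  */
struct toy_insn
{
  int uses[4];
  int n_uses;
};

/* Return the index of the insn after START that reads REG, but only if
   REG is read exactly once in the whole following sequence; otherwise
   (zero uses or more than one use) return -1.  */
static int
toy_find_single_use (const struct toy_insn *insns, size_t n_insns,
                     size_t start, int reg)
{
  int found = -1;
  int count = 0;
  for (size_t i = start + 1; i < n_insns; i++)
    for (int j = 0; j < insns[i].n_uses; j++)
      if (insns[i].uses[j] == reg)
        {
          count++;
          found = (int) i;
        }
  return count == 1 ? found : -1;
}
```

Under this contract a failed lookup never hides a single use that could still be handled, which is the basis for rejecting the combination outright.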

So we can reject the combination without worrying about multiple uses.

Uros.


Re: [PATCH v2] combine: Fix ICE in try_combine on pr112494.c [PR112560]

2024-03-07 Thread Uros Bizjak
On Thu, Mar 7, 2024 at 11:37 AM Jakub Jelinek  wrote:
>
> On Thu, Mar 07, 2024 at 11:11:35AM +0100, Uros Bizjak wrote:
> > > Since you CCed me - looking at the code I wonder why we fatally fail.
> > > The following might also fix the issue and preserve more of the
> > > rest of the flow of the function.
> > >
> > > If that works I'd prefer it.  But I'll defer approval to the combine
> > > maintainer which is Segher.
> >
> > Your patch is basically what v1 did [1], but it was suggested (in a
> > reply by you ;) ) that we should stop the attempt to combine if we
> > can't handle the use. So, the v2 patch undoes the combine and records
> > a nice message in this case.
>
> My understanding of Richi's patch is that it treats the non-COMPARISON_P
> the same as if find_single_use fails, which is a common case that certainly
> has to be handled right and it doesn't seem that we are giving up completely
> for that case.  So, I think it is reasonable to treat the non-COMPARISON_P
> *cc_use_loc as NULL cc_use_loc.

Please see the logic in my v1 patch. For COMPARISON_P (*cc_use_loc),
we execute the same code as in the first hunk of the patch, but for
non-COMPARISON_P, my patch zeroes cc_use_loc. Since cc_use_loc is used
only in the "if (cc_use_loc)" protected part, clearing cc_use_loc
when !COMPARISON_P (*cc_use_loc) has exactly the same effect as adding
a COMPARISON_P check to the existing "if (cc_use_loc)": we can execute
the "if" part only when *cc_use_loc is a comparison.

The functionality of Richi's patch is exactly the same as my v1 patch,
which was rejected for the reason mentioned in my previous post.

Uros.


Re: [PATCH v2] combine: Fix ICE in try_combine on pr112494.c [PR112560]

2024-03-07 Thread Uros Bizjak
On Thu, Mar 7, 2024 at 10:56 AM Richard Biener  wrote:
>
> On Thu, 7 Mar 2024, Uros Bizjak wrote:
>
> > The compiler, configured with --enable-checking=yes,rtl,extra ICEs with:
> >
> > internal compiler error: RTL check: expected elt 0 type 'e' or 'u',
> > have 'E' (rtx unspec) in try_combine, at combine.cc:3237
> >
> > This is
> >
> > 3236  /* Just replace the CC reg with a new mode.  */
> > 3237  SUBST (XEXP (*cc_use_loc, 0), newpat_dest);
> > 3238  undobuf.other_insn = cc_use_insn;
> >
> > in combine.cc, where *cc_use_loc is
> >
> > (unspec:DI [
> > (reg:CC 17 flags)
> > ] UNSPEC_PUSHFL)
> >
> > combine assumes CC must be used inside of a comparison and uses XEXP (..., 0)
> > without checking on the RTX type of the argument.
> >
> > Undo the combination if *cc_use_loc is not COMPARISON_P.
> >
> > Also remove buggy and now redundant check for (const 0) RTX as part of
> > the comparison.
> >
> > PR rtl-optimization/112560
> >
> > gcc/ChangeLog:
> >
> > * combine.cc (try_combine): Reject the combination
> > if *cc_use_loc is not COMPARISON_P.
> >
> > Bootstrapped and regression tested on x86_64-pc-linux-gnu {,-m32}.
> >
> > OK for trunk?
>
> Since you CCed me - looking at the code I wonder why we fatally fail.
> The following might also fix the issue and preserve more of the
> rest of the flow of the function.
>
> If that works I'd prefer it.  But I'll defer approval to the combine
> maintainer which is Segher.

Your patch is basically what v1 did [1], but it was suggested (in a
reply by you ;) ) that we should stop the attempt to combine if we
can't handle the use. So, the v2 patch undoes the combine and records
a nice message in this case.

Also, please note the removal of an existing crude hack that tries to
reject non-comparison uses by looking for (const_int 0) in the use
RTX, which is wrong as well.

[1] https://gcc.gnu.org/pipermail/gcc-patches/2023-November/638589.html

Uros.

>
> Thanks,
> Richard.
>
> diff --git a/gcc/combine.cc b/gcc/combine.cc
> index a4479f8d836..e280cd72ec7 100644
> --- a/gcc/combine.cc
> +++ b/gcc/combine.cc
> @@ -3182,7 +3182,8 @@ try_combine (rtx_insn *i3, rtx_insn *i2, rtx_insn *i1, rtx_insn *i0,
>
>if (undobuf.other_insn == 0
>   && (cc_use_loc = find_single_use (SET_DEST (newpat), i3,
> -   _use_insn)))
> +   _use_insn))
> + && COMPARISON_P (*cc_use_loc))
> {
>   compare_code = orig_compare_code = GET_CODE (*cc_use_loc);
>   if (is_a  (GET_MODE (i2dest), ))
> @@ -3200,7 +3201,7 @@ try_combine (rtx_insn *i3, rtx_insn *i2, rtx_insn *i1, rtx_insn *i0,
>  the above simplify_compare_const() returned a new comparison
>  operator.  undobuf.other_insn is assigned the CC use insn
>  when modifying it.  */
> - if (cc_use_loc)
> + if (cc_use_loc && COMPARISON_P (*cc_use_loc))
> {
>  #ifdef SELECT_CC_MODE
>   machine_mode new_mode
>


[PATCH v2] combine: Fix ICE in try_combine on pr112494.c [PR112560]

2024-03-07 Thread Uros Bizjak
The compiler, configured with --enable-checking=yes,rtl,extra ICEs with:

internal compiler error: RTL check: expected elt 0 type 'e' or 'u',
have 'E' (rtx unspec) in try_combine, at combine.cc:3237

This is

3236  /* Just replace the CC reg with a new mode.  */
3237  SUBST (XEXP (*cc_use_loc, 0), newpat_dest);
3238  undobuf.other_insn = cc_use_insn;

in combine.cc, where *cc_use_loc is

(unspec:DI [
(reg:CC 17 flags)
] UNSPEC_PUSHFL)

combine assumes the CC register is always used inside a comparison and
applies XEXP (..., 0) without checking the RTX code of the use.

Undo the combination if *cc_use_loc is not COMPARISON_P.

Also remove the buggy and now redundant check for a (const_int 0) RTX
as part of the comparison.

PR rtl-optimization/112560

gcc/ChangeLog:

* combine.cc (try_combine): Reject the combination
if *cc_use_loc is not COMPARISON_P.

Bootstrapped and regression tested on x86_64-pc-linux-gnu {,-m32}.

OK for trunk?

Uros.
diff --git a/gcc/combine.cc b/gcc/combine.cc
index a4479f8d836..6dac9ffca85 100644
--- a/gcc/combine.cc
+++ b/gcc/combine.cc
@@ -3184,11 +3184,21 @@ try_combine (rtx_insn *i3, rtx_insn *i2, rtx_insn *i1, rtx_insn *i0,
  && (cc_use_loc = find_single_use (SET_DEST (newpat), i3,
_use_insn)))
{
- compare_code = orig_compare_code = GET_CODE (*cc_use_loc);
- if (is_a  (GET_MODE (i2dest), ))
-   compare_code = simplify_compare_const (compare_code, mode,
-  , );
- target_canonicalize_comparison (_code, , , 1);
+ if (COMPARISON_P (*cc_use_loc))
+   {
+ compare_code = orig_compare_code = GET_CODE (*cc_use_loc);
+ if (is_a  (GET_MODE (i2dest), ))
+   compare_code = simplify_compare_const (compare_code, mode,
+  , );
+ target_canonicalize_comparison (_code, , , 1);
+   }
+ else
+   {
+ if (dump_file && (dump_flags & TDF_DETAILS))
+   fprintf (dump_file, "CC register not used in comparison.\n");
+ undo_all ();
+ return 0;
+   }
}
 
   /* Do the rest only if op1 is const0_rtx, which may be the
@@ -3221,9 +3231,7 @@ try_combine (rtx_insn *i3, rtx_insn *i2, rtx_insn *i1, rtx_insn *i0,
}
 #endif
  /* Cases for modifying the CC-using comparison.  */
- if (compare_code != orig_compare_code
- /* ??? Do we need to verify the zero rtx?  */
- && XEXP (*cc_use_loc, 1) == const0_rtx)
+ if (compare_code != orig_compare_code)
{
  /* Replace cc_use_loc with entire new RTX.  */
  SUBST (*cc_use_loc,


[committed] i386: Fix and improve insn constraint for V2QI arithmetic/shift insns

2024-03-06 Thread Uros Bizjak
The optimize_function_for_size_p predicate is not stable during optab
selection, because it also depends on node->count/node->frequency of the
current function, which are updated during IPA, so they may change between
early opts and late opts.  Use optimize_size instead: optimize_size implies
optimize_function_for_size_p (cfun), so if a named pattern uses
"&& optimize_size" and the insn it splits into uses
optimize_function_for_size_p (cfun), the split shouldn't fail.

PR target/114232

gcc/ChangeLog:

* config/i386/mmx.md (negv2qi2): Enable for optimize_size instead
of optimize_function_for_size_p.  Explicitly enable for TARGET_SSE2.
(negv2qi SSE reg splitter): Enable for TARGET_SSE2 only.
(v2qi3): Enable for optimize_size instead
of optimize_function_for_size_p.  Explicitly enable for TARGET_SSE2.
(v2qi SSE reg splitter): Enable for TARGET_SSE2 only.
(v2qi3): Enable for optimize_size instead
of optimize_function_for_size_p.
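A brute-force check in C, written for illustration only. The booleans are hypothetical stand-ins for TARGET_PARTIAL_REG_STALL, optimize_size, optimize_function_for_size_p (cfun) and TARGET_SSE2, and the model covers only the two alternatives of negv2qi2 in the patch (isa "*,sse2" plus the new "enabled" attribute). It sketches why the implication "optimize_size implies optimize_function_for_size_p (cfun)" keeps at least one alternative enabled whenever the insn condition holds.

```c
#include <assert.h>
#include <stdbool.h>

/* Insn condition after the patch:
   !TARGET_PARTIAL_REG_STALL || optimize_size || TARGET_SSE2.  */
static bool
insn_condition (bool stall, bool opt_size, bool sse2)
{
  return !stall || opt_size || sse2;
}

/* Alternative 0 (general registers): disabled when partial register
   stalls matter and the current function is not optimized for size.  */
static bool
alt0_enabled (bool stall, bool fn_for_size)
{
  return !(stall && !fn_for_size);
}

/* Alternative 1: the "sse2" isa alternative.  */
static bool
alt1_enabled (bool sse2)
{
  return sse2;
}

/* Enumerate all 16 combinations.  When ASSUME_IMPLICATION is set, skip
   the impossible ones where optimize_size holds but the per-function
   predicate does not.  Return true if the insn condition never holds
   with every alternative disabled.  */
static bool
some_alternative_always_enabled (bool assume_implication)
{
  for (int bits = 0; bits < 16; bits++)
    {
      bool stall = bits & 1, opt_size = bits & 2;
      bool fn_for_size = bits & 4, sse2 = bits & 8;
      if (assume_implication && opt_size && !fn_for_size)
        continue;
      if (insn_condition (stall, opt_size, sse2)
          && !alt0_enabled (stall, fn_for_size)
          && !alt1_enabled (sse2))
        return false;
    }
  return true;
}
```

Without the implication, the combination stall && optimize_size && !optimize_function_for_size_p && !TARGET_SSE2 would match the insn with no usable alternative; with it, that case cannot occur.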

Bootstrapped and regression tested on x86_64-linux-gnu {,-m32}.

Uros.
diff --git a/gcc/config/i386/mmx.md b/gcc/config/i386/mmx.md
index 2856ae6ffef..9a8d6030d8b 100644
--- a/gcc/config/i386/mmx.md
+++ b/gcc/config/i386/mmx.md
@@ -2874,11 +2874,18 @@ (define_insn "negv2qi2"
 (neg:V2QI
  (match_operand:V2QI 1 "register_operand" "0,Yw")))
(clobber (reg:CC FLAGS_REG))]
-  "!TARGET_PARTIAL_REG_STALL || optimize_function_for_size_p (cfun)"
+  "!TARGET_PARTIAL_REG_STALL || optimize_size || TARGET_SSE2"
   "#"
   [(set_attr "isa" "*,sse2")
(set_attr "type" "multi")
-   (set_attr "mode" "QI,TI")])
+   (set_attr "mode" "QI,TI")
+   (set (attr "enabled")
+   (cond [(and (eq_attr "alternative" "0")
+   (and (match_test "TARGET_PARTIAL_REG_STALL")
+(not (match_test "optimize_function_for_size_p 
(cfun)"
+   (symbol_ref "false")
+ ]
+ (const_string "*")))])
 
 (define_split
   [(set (match_operand:V2QI 0 "general_reg_operand")
@@ -2912,8 +2919,7 @@ (define_split
 (neg:V2QI
  (match_operand:V2QI 1 "sse_reg_operand")))
(clobber (reg:CC FLAGS_REG))]
-  "(!TARGET_PARTIAL_REG_STALL || optimize_function_for_size_p (cfun))
-   && TARGET_SSE2 && reload_completed"
+  "TARGET_SSE2 && reload_completed"
   [(set (match_dup 0) (match_dup 2))
(set (match_dup 0)
(minus:V16QI (match_dup 0) (match_dup 1)))]
@@ -2975,11 +2981,18 @@ (define_insn "v2qi3"
  (match_operand:V2QI 1 "register_operand" "0,0,Yw")
  (match_operand:V2QI 2 "register_operand" "Q,x,Yw")))
(clobber (reg:CC FLAGS_REG))]
-  "!TARGET_PARTIAL_REG_STALL || optimize_function_for_size_p (cfun)"
+  "!TARGET_PARTIAL_REG_STALL || optimize_size || TARGET_SSE2"
   "#"
   [(set_attr "isa" "*,sse2_noavx,avx")
(set_attr "type" "multi,sseadd,sseadd")
-   (set_attr "mode" "QI,TI,TI")])
+   (set_attr "mode" "QI,TI,TI")
+   (set (attr "enabled")
+   (cond [(and (eq_attr "alternative" "0")
+   (and (match_test "TARGET_PARTIAL_REG_STALL")
+(not (match_test "optimize_function_for_size_p 
(cfun)"
+   (symbol_ref "false")
+ ]
+ (const_string "*")))])
 
 (define_split
   [(set (match_operand:V2QI 0 "general_reg_operand")
@@ -3021,8 +3034,7 @@ (define_split
  (match_operand:V2QI 1 "sse_reg_operand")
  (match_operand:V2QI 2 "sse_reg_operand")))
(clobber (reg:CC FLAGS_REG))]
-  "(!TARGET_PARTIAL_REG_STALL || optimize_function_for_size_p (cfun))
-   && TARGET_SSE2 && reload_completed"
+  "TARGET_SSE2 && reload_completed"
   [(set (match_dup 0)
 (plusminus:V16QI (match_dup 1) (match_dup 2)))]
 {
@@ -3684,9 +3696,10 @@ (define_insn_and_split "v2qi3"
  (match_operand:V2QI 1 "register_operand" "0")
  (match_operand:QI 2 "nonmemory_operand" "cI")))
(clobber (reg:CC FLAGS_REG))]
-  "!TARGET_PARTIAL_REG_STALL || optimize_function_for_size_p (cfun)"
+  "!TARGET_PARTIAL_REG_STALL || optimize_size"
   "#"
-  "&& reload_completed"
+  "(!TARGET_PARTIAL_REG_STALL || optimize_function_for_size_p (cfun))
+   && reload_completed"
   [(parallel
  [(set (zero_extract:HI (match_dup 3) (const_int 8) (const_int 8))
   (subreg:HI


[committed] i386: Eliminate common code from x86_32 TARGET_MACHO part in ix86_expand_move

2024-03-06 Thread Uros Bizjak
Eliminate common code from x86_32 TARGET_MACHO part in ix86_expand_move and
use generic code instead.

No functional changes.

gcc/ChangeLog:

* config/i386/i386-expand.cc (ix86_expand_move) [TARGET_MACHO]:
Eliminate common code and use generic code instead.

Bootstrapped and regression tested on x86_64-linux-gnu {,-m32} and by
Iain on i686-darwin9, i686 and x86_64-darwin17, x86_64-darwin19, 21,
23.

Uros.
diff --git a/gcc/config/i386/i386-expand.cc b/gcc/config/i386/i386-expand.cc
index 3b1685ae448..2210e6f7cc8 100644
--- a/gcc/config/i386/i386-expand.cc
+++ b/gcc/config/i386/i386-expand.cc
@@ -471,9 +471,9 @@ ix86_expand_move (machine_mode mode, rtx operands[])
   if ((flag_pic || MACHOPIC_INDIRECT)
   && symbolic_operand (op1, mode))
 {
+#if TARGET_MACHO
   if (TARGET_MACHO && !TARGET_64BIT)
{
-#if TARGET_MACHO
  /* dynamic-no-pic */
  if (MACHOPIC_INDIRECT)
{
@@ -490,33 +490,18 @@ ix86_expand_move (machine_mode mode, rtx operands[])
  emit_insn (insn);
  return;
}
- if (GET_CODE (op0) == MEM)
-   op1 = force_reg (Pmode, op1);
- else
-   {
- rtx temp = op0;
- if (GET_CODE (temp) != REG)
-   temp = gen_reg_rtx (Pmode);
- temp = legitimize_pic_address (op1, temp);
- if (temp == op0)
-   return;
- op1 = temp;
-   }
-  /* dynamic-no-pic */
-#endif
}
-  else
+#endif
+
+  if (MEM_P (op0))
+   op1 = force_reg (mode, op1);
+  else if (!(TARGET_64BIT && x86_64_movabs_operand (op1, DImode)))
{
- if (MEM_P (op0))
-   op1 = force_reg (mode, op1);
- else if (!(TARGET_64BIT && x86_64_movabs_operand (op1, DImode)))
-   {
- rtx reg = can_create_pseudo_p () ? NULL_RTX : op0;
- op1 = legitimize_pic_address (op1, reg);
- if (op0 == op1)
-   return;
- op1 = convert_to_mode (mode, op1, 1);
-   }
+ rtx reg = can_create_pseudo_p () ? NULL_RTX : op0;
+ op1 = legitimize_pic_address (op1, reg);
+ if (op0 == op1)
+   return;
+ op1 = convert_to_mode (mode, op1, 1);
}
 }
   else


Re: [PATCH] i386: Fix up the vzeroupper REG_DEAD/REG_UNUSED note workaround [PR114190]

2024-03-06 Thread Uros Bizjak
On Wed, Mar 6, 2024 at 9:10 AM Jakub Jelinek  wrote:
>
> Hi!
>
> When writing the rest_of_handle_insert_vzeroupper workaround to manually
> remove all the REG_DEAD/REG_UNUSED notes from the IL, I've missed that
> there is a df_analyze () call right after it and that the problems added
> earlier in the pass, like df_note_add_problem () done during mode switching,
> doesn't affect just the next df_analyze () call right after it, but all
> other df_analyze () calls until the end of the current pass where
> df_finish_pass removes the optional problems.
>
> So, as can be seen on the following patch, the workaround doesn't actually
> work there, because while rest_of_handle_insert_vzeroupper carefully removes
> all REG_DEAD/REG_UNUSED notes, the df_analyze () call at the end of the
> function immediately adds them in again (so, I must say I have no idea
> why the workaround worked on the earlier testcases).
>
> Now, I could move the df_analyze () call just before the REG_DEAD/REG_UNUSED
> note removal loop, but I think the following patch is better, because
> the df_analyze () call doesn't have to recompute the problem when we don't
> care about it and will actively strip all traces of it away.
>
> Bootstrapped/regtested on x86_64-linux and i686-linux, ok for trunk?
>
> 2024-03-06  Jakub Jelinek  
>
> PR rtl-optimization/114190
> * config/i386/i386-features.cc (rest_of_handle_insert_vzeroupper):
> Call df_remove_problem for df_note before calling df_analyze.
>
> * gcc.target/i386/avx-pr114190.c: New test.

OK.

Thanks,
Uros.

>
> --- gcc/config/i386/i386-features.cc.jj 2024-02-22 10:10:18.658032517 +0100
> +++ gcc/config/i386/i386-features.cc2024-03-05 09:23:54.496112264 +0100
> @@ -2690,6 +2690,7 @@ rest_of_handle_insert_vzeroupper (void)
> }
> }
>
> +  df_remove_problem (df_note);
>df_analyze ();
>return 0;
>  }
> --- gcc/testsuite/gcc.target/i386/avx-pr114190.c.jj 2024-03-05 
> 10:07:24.869454305 +0100
> +++ gcc/testsuite/gcc.target/i386/avx-pr114190.c2024-03-05 
> 10:06:52.870889687 +0100
> @@ -0,0 +1,27 @@
> +/* PR rtl-optimization/114190 */
> +/* { dg-do run { target avx } } */
> +/* { dg-options "-O2 -fno-dce -fharden-compares -mavx --param=max-rtl-if-conversion-unpredictable-cost=136 -mno-avx512f -Wno-psabi" } */
> +
> +#include "avx-check.h"
> +
> +typedef unsigned char U __attribute__((vector_size (64)));
> +typedef unsigned int V __attribute__((vector_size (64)));
> +U u;
> +
> +V
> +foo (V a, V b)
> +{
> +  u[0] = __builtin_sub_overflow (0, (int) a[0], [b[7] & 5]) ? -u[1] : -b[3];
> +  b ^= 0 != b;
> +  return (V) u + (V) a + (V) b;
> +}
> +
> +static void
> +avx_test (void)
> +{
> +  V x = foo ((V) { 1 }, (V) { 0, 0, 0, 1 });
> +  if (x[0] != -1U)
> +__builtin_abort ();
> +  if (x[3] != -2U)
> +__builtin_abort ();
> +}
>
> Jakub
>


Re: [PATCH] i386: Fix ICEs with SUBREGs from vector etc. constants to XFmode [PR114184]

2024-03-04 Thread Uros Bizjak
On Mon, Mar 4, 2024 at 9:41 AM Jakub Jelinek  wrote:
>
> On Mon, Mar 04, 2024 at 09:34:30AM +0100, Uros Bizjak wrote:
> > > --- gcc/config/i386/i386-expand.cc.jj   2024-03-01 14:56:34.120925989 +0100
> > > +++ gcc/config/i386/i386-expand.cc  2024-03-03 18:41:08.278793046 +0100
> > > @@ -451,6 +451,20 @@ ix86_expand_move (machine_mode mode, rtx
> > >   && GET_MODE (SUBREG_REG (op1)) == DImode
> > >   && SUBREG_BYTE (op1) == 0)
> > > op1 = gen_rtx_ZERO_EXTEND (TImode, SUBREG_REG (op1));
> > > +  /* As not all values in XFmode are representable in real_value,
> > > +we might be called with unfoldable SUBREGs of constants.  */
> > > +  if (mode == XFmode
> > > + && CONSTANT_P (SUBREG_REG (op1))
> > > + && can_create_pseudo_p ())
> >
> > We have quite some unguarded force_regs in ix86_expand_move. While it
> > doesn't hurt to have an extra safety net, is there a particular reason
> > for can_create_pseudo_p check in the added code?
>
> Various other places in ix86_expand_move do check can_create_pseudo_p, the
> case I've mostly copied this from in ix86_expand_vector_move also does that,
> and then there is the
>  Therefore, when given such a pair of operands, the pattern must
>  generate RTL which needs no reloading and needs no temporary
>  registers--no registers other than the operands.  For example, if
>  you support the pattern with a 'define_expand', then in such a case
>  the 'define_expand' mustn't call 'force_reg' or any other such
>  function which might generate new pseudo registers.
> in mov description, which initially scared me off from using it at all.
> Guess we'll ICE either way if something like that appears during RA.

Thanks for the insight: it was the PIC handling in ix86_expand_move that
caught my eye, especially the TARGET_MACHO part, which looks like it
was somehow left behind. OTOH, the whole ix86_expand_move would need
some TLC anyway.

FAOD - the patch is OK as is.

Thanks,
Uros.


Re: [PATCH] i386: Fix ICEs with SUBREGs from vector etc. constants to XFmode [PR114184]

2024-03-04 Thread Uros Bizjak
On Mon, Mar 4, 2024 at 9:25 AM Jakub Jelinek  wrote:
>
> Hi!
>
> The Intel extended format has the various weird number categories,
> pseudo denormals, pseudo infinities, pseudo NaNs and unnormals.
> Those are not representable in the GCC real_value and so neither
> GIMPLE nor RTX VIEW_CONVERT_EXPR/SUBREG folding folds those into
> constants.
>
> As can be seen on the following testcase, because it isn't folded
> (since GCC 12, before that we were folding it) we can end up with
> a SUBREG of a CONST_VECTOR or similar constant, which isn't valid
> general_operand, so we ICE during vregs pass trying to recognize
> the move instruction.
> Initially I thought it is a middle-end bug, the movxf instruction
> has general_operand predicate, but the middle-end certainly never
> tests that predicate, seems moves are special optabs.
> And looking at other mov optabs, e.g. for vector modes the i386
> patterns use nonimmediate_operand predicate on the input, yet
> ix86_expand_vector_move deals with CONSTANT_P and SUBREG of CONSTANT_P
> arguments which if the predicate was checked couldn't ever make it through.
>
> The following patch handles this case similarly to the
> ix86_expand_vector_move's SUBREG of CONSTANT_P case, does it just for XFmode
> because I believe that is the only mode that needs it from the scalar ones,
> others should just be folded.
>
> Bootstrapped/regtested on x86_64-linux and i686-linux, ok for trunk?
>
> 2024-03-04  Jakub Jelinek  
>
> PR target/114184
> * config/i386/i386-expand.cc (ix86_expand_move): If XFmode op1
> is SUBREG of CONSTANT_P, force the SUBREG_REG into memory or
> register.
>
> * gcc.target/i386/pr114184.c: New test.

OK, with a question inline.

Thanks,
Uros.

>
> --- gcc/config/i386/i386-expand.cc.jj   2024-03-01 14:56:34.120925989 +0100
> +++ gcc/config/i386/i386-expand.cc  2024-03-03 18:41:08.278793046 +0100
> @@ -451,6 +451,20 @@ ix86_expand_move (machine_mode mode, rtx
>   && GET_MODE (SUBREG_REG (op1)) == DImode
>   && SUBREG_BYTE (op1) == 0)
> op1 = gen_rtx_ZERO_EXTEND (TImode, SUBREG_REG (op1));
> +  /* As not all values in XFmode are representable in real_value,
> +we might be called with unfoldable SUBREGs of constants.  */
> +  if (mode == XFmode
> + && CONSTANT_P (SUBREG_REG (op1))
> + && can_create_pseudo_p ())

We have quite some unguarded force_regs in ix86_expand_move. While it
doesn't hurt to have an extra safety net, is there a particular reason
for can_create_pseudo_p check in the added code?

> +   {
> + machine_mode imode = GET_MODE (SUBREG_REG (op1));
> + rtx r = force_const_mem (imode, SUBREG_REG (op1));
> + if (r)
> +   r = validize_mem (r);
> + else
> +   r = force_reg (imode, SUBREG_REG (op1));
> + op1 = simplify_gen_subreg (mode, r, imode, SUBREG_BYTE (op1));
> +   }
>break;
>  }
>
> --- gcc/testsuite/gcc.target/i386/pr114184.c.jj 2024-03-03 18:45:45.912964030 
> +0100
> +++ gcc/testsuite/gcc.target/i386/pr114184.c2024-03-03 18:45:37.639078138 
> +0100
> @@ -0,0 +1,22 @@
> +/* PR target/114184 */
> +/* { dg-do compile } */
> +/* { dg-options "-Og -mavx2" } */
> +
> +typedef unsigned char V __attribute__((vector_size (32)));
> +typedef unsigned char W __attribute__((vector_size (16)));
> +
> +_Complex long double
> +foo (void)
> +{
> +  _Complex long double d;
> +  *(V *) = (V) { 149, 136, 89, 42, 38, 240, 196, 194 };
> +  return d;
> +}
> +
> +long double
> +bar (void)
> +{
> +  long double d;
> +  *(W *) = (W) { 149, 136, 89, 42, 38, 240, 196, 194 };
> +  return d;
> +}
>
> Jakub
>
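A small C classifier for the x87 80-bit extended format, sketched to illustrate the pseudo-denormals, pseudo-infinities, pseudo-NaNs and unnormals mentioned above. It assumes the little-endian 10-byte layout used on x86 (bytes 0-7 significand with an explicit integer bit 63, bytes 8-9 sign and 15-bit exponent); the enum and function names are hypothetical. By this reading, the first ten bytes of the vector stored into d in bar () from pr114184.c encode a pseudo-denormal, which is one of the values real_value cannot represent.

```c
#include <assert.h>
#include <stdint.h>

/* Encoding categories of the x87 extended format.  The PSEUDO_* and
   UNNORMAL categories are the ones with no real_value equivalent.  */
enum xf_class
{
  XF_ZERO, XF_DENORMAL, XF_NORMAL, XF_INFINITY, XF_NAN,
  XF_PSEUDO_DENORMAL, XF_UNNORMAL, XF_PSEUDO_INFINITY, XF_PSEUDO_NAN
};

/* Classify a 10-byte little-endian XFmode image.  */
static enum xf_class
xf_classify (const uint8_t bytes[10])
{
  uint64_t sig = 0;
  for (int i = 7; i >= 0; i--)
    sig = (sig << 8) | bytes[i];
  unsigned exp = ((unsigned) (bytes[9] & 0x7f) << 8) | bytes[8];
  int int_bit = (int) (sig >> 63);              /* explicit integer bit */
  uint64_t frac = sig & ~(UINT64_C (1) << 63);  /* bits 62..0 */

  if (exp == 0)
    {
      if (!int_bit)
        return frac == 0 ? XF_ZERO : XF_DENORMAL;
      return XF_PSEUDO_DENORMAL;                /* zero exponent, J = 1 */
    }
  if (exp == 0x7fff)
    {
      if (!int_bit)
        return frac == 0 ? XF_PSEUDO_INFINITY : XF_PSEUDO_NAN;
      return frac == 0 ? XF_INFINITY : XF_NAN;
    }
  return int_bit ? XF_NORMAL : XF_UNNORMAL;     /* unnormal: J = 0 */
}
```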


[committed] alpha: Introduce UMUL_HIGHPART rtx_code [PR113720]

2024-03-03 Thread Uros Bizjak
The umuldi3_highpart expander does:

   if (REG_P (operands[2]))
 operands[2] = gen_rtx_ZERO_EXTEND (TImode, operands[2]);

but the register_operand predicate also accepts SUBREG RTXes, for which
REG_P is false. So, subregs were emitted without the ZERO_EXTEND RTX.

But nowadays we have UMUL_HIGHPART, which allows us to fix this
issue while also simplifying the instruction RTX.
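For reference, a C sketch of what (umul_highpart:DI a b) denotes: the high 64 bits of the full 128-bit unsigned product, which is also what the old (truncate:DI (lshiftrt:TI (mult:TI (zero_extend:TI a) (zero_extend:TI b)) 64)) form spelled out and what Alpha's umulh computes. The function name is hypothetical; it uses 32-bit halves so it needs no 128-bit integer type.

```c
#include <assert.h>
#include <stdint.h>

/* High 64 bits of the 128-bit product A * B (unsigned).  */
static uint64_t
umul_highpart64 (uint64_t a, uint64_t b)
{
  uint64_t a_lo = (uint32_t) a, a_hi = a >> 32;
  uint64_t b_lo = (uint32_t) b, b_hi = b >> 32;

  uint64_t lo_lo = a_lo * b_lo;
  uint64_t hi_lo = a_hi * b_lo;
  uint64_t lo_hi = a_lo * b_hi;
  uint64_t hi_hi = a_hi * b_hi;

  /* Middle 32-bit column: carry out of lo_lo plus the low halves of the
     cross products.  The sum cannot overflow 64 bits.  */
  uint64_t cross = (lo_lo >> 32) + (uint32_t) hi_lo + lo_hi;

  /* High column: high halves of the cross terms plus the carries.  */
  return (hi_lo >> 32) + (cross >> 32) + hi_hi;
}
```

Note that the operation is defined on two DImode operands directly, so there is no separate ZERO_EXTEND for a register or SUBREG operand to lose.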

PR target/113720

gcc/ChangeLog:

* config/alpha/alpha.md (umuldi3_highpart): Remove expander.
(*umuldi3_highpart_reg): Rename to umuldi3_highpart and
simplify insn RTX using UMUL_HIGHPART rtx_code.
(*umuldi3_highpart_const): Remove.

Tested by building a cross-compiler to alpha-linux-gnu.

Uros.
diff --git a/gcc/config/alpha/alpha.md b/gcc/config/alpha/alpha.md
index 94d5d339c3d..79f12c53c16 100644
--- a/gcc/config/alpha/alpha.md
+++ b/gcc/config/alpha/alpha.md
@@ -683,41 +683,10 @@ (define_insn "mulv3"
   [(set_attr "type" "imul")
(set_attr "opsize" "")])
 
-(define_expand "umuldi3_highpart"
-  [(set (match_operand:DI 0 "register_operand")
-   (truncate:DI
-(lshiftrt:TI
- (mult:TI (zero_extend:TI
-(match_operand:DI 1 "register_operand"))
-  (match_operand:DI 2 "reg_or_8bit_operand"))
- (const_int 64]
-  ""
-{
-  if (REG_P (operands[2]))
-operands[2] = gen_rtx_ZERO_EXTEND (TImode, operands[2]);
-})
-
-(define_insn "*umuldi3_highpart_reg"
-  [(set (match_operand:DI 0 "register_operand" "=r")
-   (truncate:DI
-(lshiftrt:TI
- (mult:TI (zero_extend:TI
-(match_operand:DI 1 "register_operand" "r"))
-  (zero_extend:TI
-(match_operand:DI 2 "register_operand" "r")))
- (const_int 64]
-  ""
-  "umulh %1,%2,%0"
-  [(set_attr "type" "imul")
-   (set_attr "opsize" "udi")])
-
-(define_insn "*umuldi3_highpart_const"
+(define_insn "umuldi3_highpart"
   [(set (match_operand:DI 0 "register_operand" "=r")
-   (truncate:DI
-(lshiftrt:TI
- (mult:TI (zero_extend:TI (match_operand:DI 1 "register_operand" "r"))
-  (match_operand:TI 2 "cint8_operand" "I"))
- (const_int 64]
+   (umul_highpart:DI (match_operand:DI 1 "reg_or_0_operand" "%rJ")
+ (match_operand:DI 2 "reg_or_8bit_operand" "rI")))]
   ""
   "umulh %1,%2,%0"
   [(set_attr "type" "imul")


[committed] i386: psrlq is not used for PERM [PR113871]

2024-02-27 Thread Uros Bizjak
Also handle the V4BF and V2BF modes.

PR target/113871

gcc/ChangeLog:

* config/i386/mmx.md (V248FI): Add V4BF mode.
(V24FI_32): Add V2BF mode.

gcc/testsuite/ChangeLog:

* gcc.target/i386/pr113871-5a.c: New test.
* gcc.target/i386/pr113871-5b.c: New test.

Bootstrapped and regression tested on x86_64-linux-gnu {,-m32}.

Uros.
diff --git a/gcc/config/i386/mmx.md b/gcc/config/i386/mmx.md
index 075309cca9f..2856ae6ffef 100644
--- a/gcc/config/i386/mmx.md
+++ b/gcc/config/i386/mmx.md
@@ -85,9 +85,9 @@ (define_mode_iterator V2FI [V2SF V2SI])
 
 (define_mode_iterator V24FI [V2SF V2SI V4HF V4HI])
 
-(define_mode_iterator V248FI [V2SF V2SI V4HF V4HI V8QI])
+(define_mode_iterator V248FI [V2SF V2SI V4HF V4BF V4HI V8QI])
 
-(define_mode_iterator V24FI_32 [V2HF V2HI V4QI])
+(define_mode_iterator V24FI_32 [V2HF V2BF V2HI V4QI])
 
 ;; Mapping from integer vector mode to mnemonic suffix
 (define_mode_attr mmxvecsize
diff --git a/gcc/testsuite/gcc.target/i386/pr113871-5a.c b/gcc/testsuite/gcc.target/i386/pr113871-5a.c
new file mode 100644
index 000..25ab82a6eab
--- /dev/null
+++ b/gcc/testsuite/gcc.target/i386/pr113871-5a.c
@@ -0,0 +1,19 @@
+/* PR target/113871 */
+/* { dg-do compile { target { ! ia32 } } } */
+/* { dg-options "-O2" } */
+
+typedef __bf16 vect64 __attribute__((vector_size(8)));
+
+void f (vect64 *a)
+{
+  *a = __builtin_shufflevector(*a, (vect64){0}, 1, 2, 3, 4);
+}
+
+/* { dg-final { scan-assembler "psrlq" } } */
+
+void g(vect64 *a)
+{
+  *a = __builtin_shufflevector((vect64){0}, *a, 3, 4, 5, 6);
+}
+
+/* { dg-final { scan-assembler "psllq" } } */
diff --git a/gcc/testsuite/gcc.target/i386/pr113871-5b.c b/gcc/testsuite/gcc.target/i386/pr113871-5b.c
new file mode 100644
index 000..363a0f516cd
--- /dev/null
+++ b/gcc/testsuite/gcc.target/i386/pr113871-5b.c
@@ -0,0 +1,19 @@
+/* PR target/113871 */
+/* { dg-do compile } */
+/* { dg-options "-O2 -msse2" } */
+
+typedef __bf16 vect32 __attribute__((vector_size(4)));
+
+void f (vect32 *a)
+{
+  *a = __builtin_shufflevector(*a, (vect32){0}, 1, 2);
+}
+
+/* { dg-final { scan-assembler "psrld" } } */
+
+void g(vect32 *a)
+{
+  *a = __builtin_shufflevector((vect32){0}, *a, 1, 2);
+}
+
+/* { dg-final { scan-assembler "pslld" } } */
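A C sketch of why these permutations can be done with a whole-register shift: shuffling a 4 x 16-bit vector against zeros with indices 1, 2, 3, 4 (as in f) equals a 64-bit logical right shift by 16, and indices 3, 4, 5, 6 against a leading zero vector (as in g) equals a left shift by 16, given x86's little-endian lane order. Lanes are treated as raw 16-bit patterns (e.g. __bf16 bits), since a permutation only moves them; the helper names are hypothetical. The 32-bit V2BF tests follow the same reasoning with psrld/pslld.

```c
#include <assert.h>
#include <stdint.h>

/* Emulate __builtin_shufflevector (x, y, i0, i1, i2, i3) on 4 x 16-bit
   vectors: an index of 0-3 selects from X, 4-7 selects from Y.  */
static void
shuffle4 (const uint16_t x[4], const uint16_t y[4], const int idx[4],
          uint16_t out[4])
{
  for (int i = 0; i < 4; i++)
    out[i] = idx[i] < 4 ? x[idx[i]] : y[idx[i] - 4];
}

/* Pack lanes into a 64-bit register image with lane 0 in the low bits,
   matching the little-endian lane order of an MMX/SSE register.  */
static uint64_t
pack4 (const uint16_t v[4])
{
  uint64_t r = 0;
  for (int i = 3; i >= 0; i--)
    r = (r << 16) | v[i];
  return r;
}
```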


Re: Patch ping^2

2024-02-26 Thread Uros Bizjak
On Mon, Feb 26, 2024 at 10:33 AM Jakub Jelinek  wrote:
>
> Hi!
>
> I'd like to ping 2 patches:

> https://gcc.gnu.org/pipermail/gcc-patches/2024-February/645326.html
> i386: Enable _BitInt support on ia32
>
> all the FAILs mentioned in that mail have been fixed by now.

LGTM, based on HJ's advice.

Uros.


Re: [PATCH v2] x86: Check interrupt instead of noreturn attribute

2024-02-26 Thread Uros Bizjak
On Sun, Feb 25, 2024 at 10:14 PM H.J. Lu  wrote:
>
> ix86_set_func_type checks noreturn attribute to avoid incompatible
> attribute error in LTO1 on interrupt functions.  Since TREE_THIS_VOLATILE
> is set also for _Noreturn without noreturn attribute, check interrupt
> attribute for interrupt functions instead.
>
> gcc/
>
> PR target/114097
> * config/i386/i386-options.cc (ix86_set_func_type): Check
> interrupt instead of noreturn attribute.
>
> gcc/testsuite/
>
> PR target/114097
> * gcc.target/i386/pr114097-1.c: New test.

LGTM.

Thanks,
Uros.

> ---
>  gcc/config/i386/i386-options.cc|  8 ---
>  gcc/testsuite/gcc.target/i386/pr114097-1.c | 26 ++
>  2 files changed, 31 insertions(+), 3 deletions(-)
>  create mode 100644 gcc/testsuite/gcc.target/i386/pr114097-1.c
>
> diff --git a/gcc/config/i386/i386-options.cc b/gcc/config/i386/i386-options.cc
> index 93a01146db7..1301f6b913e 100644
> --- a/gcc/config/i386/i386-options.cc
> +++ b/gcc/config/i386/i386-options.cc
> @@ -3391,11 +3391,13 @@ ix86_set_func_type (tree fndecl)
>   into a noreturn function by setting TREE_THIS_VOLATILE.  Normally
>   the local-pure-const pass is run after ix86_set_func_type is called.
>   When the local-pure-const pass is enabled for LTO, the interrupt
> - function is marked as noreturn in the IR output, which leads the
> - incompatible attribute error in LTO1.  */
> + function is marked with TREE_THIS_VOLATILE in the IR output, which
> + leads to the incompatible attribute error in LTO1.  Ignore the
> + interrupt function in this case.  */
>bool has_no_callee_saved_registers
>  = ((TREE_THIS_VOLATILE (fndecl)
> -   && lookup_attribute ("noreturn", DECL_ATTRIBUTES (fndecl))
> +   && !lookup_attribute ("interrupt",
> + TYPE_ATTRIBUTES (TREE_TYPE (fndecl)))
> && optimize
> && !optimize_debug
> && (TREE_NOTHROW (fndecl) || !flag_exceptions))
> diff --git a/gcc/testsuite/gcc.target/i386/pr114097-1.c b/gcc/testsuite/gcc.target/i386/pr114097-1.c
> new file mode 100644
> index 000..b14c7b6214d
> --- /dev/null
> +++ b/gcc/testsuite/gcc.target/i386/pr114097-1.c
> @@ -0,0 +1,26 @@
> +/* { dg-do compile } */
> +/* { dg-options "-O2 -mtune-ctrl=^prologue_using_move,^epilogue_using_move -fomit-frame-pointer" } */
> +
> +#define ARRAY_SIZE 256
> +
> +extern int array[ARRAY_SIZE][ARRAY_SIZE][ARRAY_SIZE];
> +extern int value (int, int, int)
> +#ifndef __x86_64__
> +__attribute__ ((regparm(3)))
> +#endif
> +;
> +
> +void
> +_Noreturn
> +no_return_to_caller (void)
> +{
> +  unsigned i, j, k;
> +  for (i = ARRAY_SIZE; i > 0; --i)
> +for (j = ARRAY_SIZE; j > 0; --j)
> +  for (k = ARRAY_SIZE; k > 0; --k)
> +   array[i - 1][j - 1][k - 1] = value (i, j, k);
> +  while (1);
> +}
> +
> +/* { dg-final { scan-assembler-not "push" } } */
> +/* { dg-final { scan-assembler-not "pop" } } */
> --
> 2.43.2
>
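A simplified C model of the condition change in ix86_set_func_type. The struct and function names are hypothetical, and the optimize, !optimize_debug and nothrow/exceptions terms of the real condition are deliberately left out; the sketch only shows how the old and new tests treat a C11 _Noreturn function (TREE_THIS_VOLATILE set, no noreturn attribute) and an interrupt handler that the local-pure-const pass marked volatile.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical summary of the declaration properties queried.  */
struct fn_info
{
  bool this_volatile;       /* TREE_THIS_VOLATILE (fndecl) */
  bool has_noreturn_attr;   /* "noreturn" in DECL_ATTRIBUTES */
  bool has_interrupt_attr;  /* "interrupt" in TYPE_ATTRIBUTES */
};

/* Old test: required the noreturn attribute, so _Noreturn was missed.  */
static bool
old_no_callee_saved (struct fn_info f)
{
  return f.this_volatile && f.has_noreturn_attr;
}

/* New test: any volatile function qualifies, except interrupt handlers.  */
static bool
new_no_callee_saved (struct fn_info f)
{
  return f.this_volatile && !f.has_interrupt_attr;
}
```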


Re: [PATCH] x86: Check interrupt instead of noreturn attribute

2024-02-25 Thread Uros Bizjak
On Sun, Feb 25, 2024 at 5:01 PM H.J. Lu  wrote:
>
> ix86_set_func_type checks noreturn attribute to avoid incompatible
> attribute error in LTO1 on interrupt functions.  Since TREE_THIS_VOLATILE
> is set also for _Noreturn without noreturn attribute, check interrupt
> attribute for interrupt functions instead.

Please also adjust the comment above the change. The current comment
even explains why the "noreturn" attribute is checked instead of the
"interrupt" attribute.

Uros.

>
> gcc/
>
> PR target/114097
> * config/i386/i386-options.cc (ix86_set_func_type): Check
> interrupt instead of noreturn attribute.
>
> gcc/testsuite/
>
> PR target/114097
> * gcc.target/i386/pr114097-1.c: New test.
> ---
>  gcc/config/i386/i386-options.cc|  3 ++-
>  gcc/testsuite/gcc.target/i386/pr114097-1.c | 26 ++
>  2 files changed, 28 insertions(+), 1 deletion(-)
>  create mode 100644 gcc/testsuite/gcc.target/i386/pr114097-1.c
>
> diff --git a/gcc/config/i386/i386-options.cc b/gcc/config/i386/i386-options.cc
> index 93a01146db7..82fe0d228cd 100644
> --- a/gcc/config/i386/i386-options.cc
> +++ b/gcc/config/i386/i386-options.cc
> @@ -3395,7 +3395,8 @@ ix86_set_func_type (tree fndecl)
>   incompatible attribute error in LTO1.  */
>bool has_no_callee_saved_registers
>  = ((TREE_THIS_VOLATILE (fndecl)
> -   && lookup_attribute ("noreturn", DECL_ATTRIBUTES (fndecl))
> +   && !lookup_attribute ("interrupt",
> + TYPE_ATTRIBUTES (TREE_TYPE (fndecl)))
> && optimize
> && !optimize_debug
> && (TREE_NOTHROW (fndecl) || !flag_exceptions))
> diff --git a/gcc/testsuite/gcc.target/i386/pr114097-1.c b/gcc/testsuite/gcc.target/i386/pr114097-1.c
> new file mode 100644
> index 000..b14c7b6214d
> --- /dev/null
> +++ b/gcc/testsuite/gcc.target/i386/pr114097-1.c
> @@ -0,0 +1,26 @@
> +/* { dg-do compile } */
> +/* { dg-options "-O2 -mtune-ctrl=^prologue_using_move,^epilogue_using_move -fomit-frame-pointer" } */
> +
> +#define ARRAY_SIZE 256
> +
> +extern int array[ARRAY_SIZE][ARRAY_SIZE][ARRAY_SIZE];
> +extern int value (int, int, int)
> +#ifndef __x86_64__
> +__attribute__ ((regparm(3)))
> +#endif
> +;
> +
> +void
> +_Noreturn
> +no_return_to_caller (void)
> +{
> +  unsigned i, j, k;
> +  for (i = ARRAY_SIZE; i > 0; --i)
> +for (j = ARRAY_SIZE; j > 0; --j)
> +  for (k = ARRAY_SIZE; k > 0; --k)
> +   array[i - 1][j - 1][k - 1] = value (i, j, k);
> +  while (1);
> +}
> +
> +/* { dg-final { scan-assembler-not "push" } } */
> +/* { dg-final { scan-assembler-not "pop" } } */
> --
> 2.43.2
>


Re: PING: [PATCH] x86-64: Check R_X86_64_CODE_6_GOTTPOFF support

2024-02-23 Thread Uros Bizjak
On Fri, Feb 23, 2024 at 3:45 AM H.J. Lu  wrote:
>
> On Thu, Feb 22, 2024 at 6:39 PM Hongtao Liu  wrote:
> >
> > On Thu, Feb 22, 2024 at 10:33 PM H.J. Lu  wrote:
> > >
> > > On Sun, Feb 18, 2024 at 8:02 AM H.J. Lu  wrote:
> > > >
> > > > If the assembler and linker support
> > > >
> > > > add %reg1, name@gottpoff(%rip), %reg2
> > > >
> > > > with R_X86_64_CODE_6_GOTTPOFF, we can generate it instead of
> > > >
> > > > mov name@gottpoff(%rip), %reg2
> > > > add %reg1, %reg2
> > x86 part LGTM, but I'm not familiar with the changes in config-related files.
>
> Jakub, Uros, Alexandre, can you review the configure.ac change in this patch?
>
> https://patchwork.sourceware.org/project/gcc/list/?series=31075
>
> Thanks.
>
> > > >
> > > > gcc/
> > > >
> > > > * configure.ac (HAVE_AS_R_X86_64_CODE_6_GOTTPOFF): Defined as 1
> > > > if R_X86_64_CODE_6_GOTTPOFF is supported.
> > > > * config.in: Regenerated.
> > > > * configure: Likewise.
> > > > * config/i386/predicates.md (apx_ndd_add_memory_operand): Allow
> > > > UNSPEC_GOTNTPOFF if R_X86_64_CODE_6_GOTTPOFF is supported.
> > > >
> > > > gcc/testsuite/
> > > >
> > > > * gcc.target/i386/apx-ndd-tls-1b.c: New test.
> > > > * lib/target-supports.exp
> > > > (check_effective_target_code_6_gottpoff_reloc): New.
> > > > ---
> > > >  gcc/config.in |  7 +++
> > > >  gcc/config/i386/predicates.md |  6 +-
> > > >  gcc/configure | 62 +++
> > > >  gcc/configure.ac  | 37 +++
> > > >  .../gcc.target/i386/apx-ndd-tls-1b.c  |  9 +++
> > > >  gcc/testsuite/lib/target-supports.exp | 48 ++
> > > >  6 files changed, 168 insertions(+), 1 deletion(-)
> > > >  create mode 100644 gcc/testsuite/gcc.target/i386/apx-ndd-tls-1b.c
> > > >
> > > > diff --git a/gcc/config.in b/gcc/config.in
> > > > index ce1d073833f..f3de4ba6776 100644
> > > > --- a/gcc/config.in
> > > > +++ b/gcc/config.in
> > > > @@ -737,6 +737,13 @@
> > > >  #endif
> > > >
> > > >
> > > > +/* Define 0/1 if your assembler and linker support R_X86_64_CODE_6_GOTTPOFF.
> > > > +   */
> > > > +#ifndef USED_FOR_TARGET
> > > > +#undef HAVE_AS_R_X86_64_CODE_6_GOTTPOFF
> > > > +#endif
> > > > +
> > > > +
> > > >  /* Define if your assembler supports relocs needed by -fpic. */
> > > >  #ifndef USED_FOR_TARGET
> > > >  #undef HAVE_AS_SMALL_PIC_RELOCS
> > > > diff --git a/gcc/config/i386/predicates.md b/gcc/config/i386/predicates.md
> > > > index 4c1aedd7e70..391f108c360 100644
> > > > --- a/gcc/config/i386/predicates.md
> > > > +++ b/gcc/config/i386/predicates.md
> > > > @@ -2299,10 +2299,14 @@ (define_predicate "apx_ndd_memory_operand"
> > > >
> > > >  ;; Return true if OP is a memory operand which can be used in APX NDD
> > > >  ;; ADD with register source operand.  UNSPEC_GOTNTPOFF memory operand
> > > > -;; isn't allowed with APX NDD ADD.
> > > > +;; is allowed with APX NDD ADD only if R_X86_64_CODE_6_GOTTPOFF works.
> > > >  (define_predicate "apx_ndd_add_memory_operand"
> > > >(match_operand 0 "memory_operand")
> > > >  {
> > > > +  /* OK if "add %reg1, name@gottpoff(%rip), %reg2" is supported.  */
> > > > +  if (HAVE_AS_R_X86_64_CODE_6_GOTTPOFF)
> > > > +return true;
> > > > +
> > > >op = XEXP (op, 0);
> > > >
> > > >/* Disallow APX NDD ADD with UNSPEC_GOTNTPOFF.  */
> > > > diff --git a/gcc/configure b/gcc/configure
> > > > index 41b978b0380..c59c971862c 100755
> > > > --- a/gcc/configure
> > > > +++ b/gcc/configure
> > > > @@ -29834,6 +29834,68 @@ cat >>confdefs.h <<_ACEOF
> > > >  _ACEOF
> > > >
> > > >
> > > > +if echo "$ld_ver" | grep GNU > /dev/null; then
> > > > +  if $gcc_cv_ld -V 2>/dev/null | grep elf_x86_64_sol2 > /dev/null; then
> > > > +ld_ix86_gld_64_opt="-melf_x86_64_sol2"
> > > > +  else
> > > > +ld_ix86_gld_64_opt="-melf_x86_64"
> > > > +  fi
> > > > +fi
> > > > +conftest_s='
> > > > +   .text
> > > > +   .globl  _start
> > > > +   .type _start, @function
> > > > +_start:
> > > > +   addq    %r23,foo@GOTTPOFF(%rip), %r15
> > > > +   .section .tdata,"awT",@progbits
> > > > +   .type foo, @object
> > > > +foo:
> > > > +   .quad 0'
> > > > +{ $as_echo "$as_me:${as_lineno-$LINENO}: checking assembler for R_X86_64_CODE_6_GOTTPOFF reloc" >&5
> > > > +$as_echo_n "checking assembler for R_X86_64_CODE_6_GOTTPOFF reloc... " >&6; }
> > > > +if ${gcc_cv_as_x86_64_code_6_gottpoff+:} false; then :
> > > > +  $as_echo_n "(cached) " >&6
> > > > +else
> > > > +  gcc_cv_as_x86_64_code_6_gottpoff=no
> > > > +  if test x$gcc_cv_as != x; then
> > > > +$as_echo "$conftest_s" > conftest.s
> > > > +if { ac_try='$gcc_cv_as $gcc_cv_as_flags  -o conftest.o conftest.s >&5'
> > > > +  { { eval echo "\"\$as_me\":${as_lineno-$LINENO}: \"$ac_try\""; } >&5
> > > > +  

[committed] testsuite: Fix a couple of x86 issues in gcc.dg/vect testsuite

2024-02-14 Thread Uros Bizjak
A compile-time test can use -march=skylake-avx512 for all x86 targets,
but a runtime test needs to check the avx512f effective target to make
sure the instructions can be assembled.

The runtime test also needs to check whether the target machine supports
the instruction set we have been compiled for.  The testsuite uses the
check_vect infrastructure, but handling of AVX512F+ ISAs was missing there.

Add detection of __AVX512F__ and __AVX512VL__, which is enough to handle
all currently mentioned target processors in the gcc.dg/vect testsuite.
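For illustration, the kind of run-time check that check_vect performs for
AVX-512 can be sketched with GCC's <cpuid.h>.  This is a minimal sketch, not
the tree-vect.h code: the function names cpu_has_leaf7_ebx and
cpu_supports_compiled_avx512 are hypothetical, and the sketch only inspects
the CPUID leaf 7 EBX bits (bit_AVX512F, bit_AVX512VL).  It does not check
whether the OS has enabled AVX-512 register state.

```c
#include <stdbool.h>

#if defined(__x86_64__) || defined(__i386__)
#include <cpuid.h>

/* Hypothetical helper: return true if CPUID leaf 7, subleaf 0, reports
   every bit of WANT_EBX in EBX (cf. "want_level = 7, want_b = ...").  */
bool
cpu_has_leaf7_ebx (unsigned int want_ebx)
{
  unsigned int eax, ebx, ecx, edx;

  /* __get_cpuid_count returns 0 when leaf 7 is not supported.  */
  if (!__get_cpuid_count (7, 0, &eax, &ebx, &ecx, &edx))
    return false;
  return (ebx & want_ebx) == want_ebx;
}

/* Hypothetical helper: test the bit for the most specific AVX-512 ISA
   this file was compiled for, in the same order as the patched
   check_vect (AVX512VL before AVX512F).  */
bool
cpu_supports_compiled_avx512 (void)
{
# if defined(__AVX512VL__)
  return cpu_has_leaf7_ebx (bit_AVX512VL);
# elif defined(__AVX512F__)
  return cpu_has_leaf7_ebx (bit_AVX512F);
# else
  return true;   /* Not compiled for AVX-512: nothing to check.  */
# endif
}
#else
bool
cpu_supports_compiled_avx512 (void)
{
  return true;   /* Not an x86 host: nothing to check.  */
}
#endif
```

Testing __AVX512VL__ first matters because code built with -mavx512vl also
defines __AVX512F__, so the more specific macro has to win.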

gcc/testsuite/ChangeLog:

* gcc.dg/vect/pr113576.c (dg-additional-options):
Use -march=skylake-avx512 for avx512f effective target.
* gcc.dg/vect/pr98308.c (dg-additional-options):
Use -march=skylake-avx512 for all x86 targets.
* gcc.dg/vect/tree-vect.h (check_vect): Handle __AVX512F__
and __AVX512VL__.

Tested on x86_64-linux-gnu on AVX2 target where the patch prevents
pr113576 runtime failure due to unsupported avx512f instruction.

Uros.
diff --git a/gcc/testsuite/gcc.dg/vect/pr113576.c b/gcc/testsuite/gcc.dg/vect/pr113576.c
index decb7abe2f7..b6edde6f8e2 100644
--- a/gcc/testsuite/gcc.dg/vect/pr113576.c
+++ b/gcc/testsuite/gcc.dg/vect/pr113576.c
@@ -1,6 +1,6 @@
 /* { dg-do run } */
 /* { dg-options "-O3" } */
-/* { dg-additional-options "-march=skylake-avx512" { target { x86_64-*-* i?86-*-* } } } */
+/* { dg-additional-options "-march=skylake-avx512" { target avx512f } } */
 
 #include "tree-vect.h"
 
diff --git a/gcc/testsuite/gcc.dg/vect/pr98308.c b/gcc/testsuite/gcc.dg/vect/pr98308.c
index aeec9771c55..d74431200c7 100644
--- a/gcc/testsuite/gcc.dg/vect/pr98308.c
+++ b/gcc/testsuite/gcc.dg/vect/pr98308.c
@@ -1,6 +1,6 @@
 /* { dg-do compile } */
 /* { dg-additional-options "-O3" } */
-/* { dg-additional-options "-march=skylake-avx512" { target avx512f } } */
+/* { dg-additional-options "-march=skylake-avx512" { target x86_64-*-* i?86-*-* } } */
 /* { dg-additional-options "-fdump-tree-optimized-details-blocks" } */
 
 extern unsigned long long int arr_86[];
diff --git a/gcc/testsuite/gcc.dg/vect/tree-vect.h b/gcc/testsuite/gcc.dg/vect/tree-vect.h
index c4b81441216..1e4b56ee0e1 100644
--- a/gcc/testsuite/gcc.dg/vect/tree-vect.h
+++ b/gcc/testsuite/gcc.dg/vect/tree-vect.h
@@ -38,7 +38,11 @@ check_vect (void)
 /* Determine what instruction set we've been compiled for, and detect
that we're running with it.  This allows us to at least do a compile
check for, e.g. SSE4.1 when the machine only supports SSE2.  */
-# if defined(__AVX2__)
+# if defined(__AVX512VL__)
+want_level = 7, want_b = bit_AVX512VL;
+# elif defined(__AVX512F__)
+want_level = 7, want_b = bit_AVX512F;
+# elif defined(__AVX2__)
 want_level = 7, want_b = bit_AVX2;
 # elif defined(__AVX__)
 want_level = 1, want_c = bit_AVX;


[committed] i386: psrlq is not used for PERM [PR113871]

2024-02-14 Thread Uros Bizjak
Introduce vec_shl_ and vec_shr_ expanders to improve

'*a = __builtin_shufflevector(*a, (vect64){0}, 1, 2, 3, 4);'

and
'*a = __builtin_shufflevector((vect64){0}, *a, 3, 4, 5, 6);'

shuffles.  The generated code improves from:

        movzwl  6(%rdi), %eax
        movzwl  4(%rdi), %edx
        salq    $16, %rax
        orq     %rdx, %rax
        movzwl  2(%rdi), %edx
        salq    $16, %rax
        orq     %rdx, %rax
        movq    %rax, (%rdi)

to:
        movq    (%rdi), %xmm0
        psrlq   $16, %xmm0
        movq    %xmm0, (%rdi)

and to:
        movq    (%rdi), %xmm0
        psllq   $16, %xmm0
        movq    %xmm0, (%rdi)

in the second case.
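Why a single shift is enough, assuming vect64 holds four 16-bit elements
(which is what the $16 shift amount suggests): on a little-endian target,
dropping lane 0 and appending a zero lane is a 64-bit logical right shift
by one lane, and the mirrored shuffle is a left shift.  The scalar model
below checks this; shuffle4, as_u64 and shuffles_match_shifts are
hypothetical helpers that emulate __builtin_shufflevector on uint16_t[4]
arrays, and the check assumes a little-endian host such as x86.

```c
#include <stdint.h>
#include <string.h>

/* Scalar model of __builtin_shufflevector on two 4 x 16-bit vectors:
   index i < 4 selects A[i], index i >= 4 selects B[i - 4].  */
static void
shuffle4 (const uint16_t a[4], const uint16_t b[4],
          const int idx[4], uint16_t out[4])
{
  for (int i = 0; i < 4; i++)
    out[i] = idx[i] < 4 ? a[idx[i]] : b[idx[i] - 4];
}

/* Reinterpret four 16-bit lanes as one 64-bit integer (host byte order).  */
static uint64_t
as_u64 (const uint16_t v[4])
{
  uint64_t x;
  memcpy (&x, v, sizeof x);
  return x;
}

/* Return 1 if both shuffles from the commit message equal a 64-bit
   logical shift by 16 bits, as psrlq/psllq compute it.  */
int
shuffles_match_shifts (const uint16_t a[4])
{
  static const uint16_t zero[4] = { 0, 0, 0, 0 };
  static const int idx_shr[4] = { 1, 2, 3, 4 };  /* (*a, {0}, 1, 2, 3, 4) */
  static const int idx_shl[4] = { 3, 4, 5, 6 };  /* ({0}, *a, 3, 4, 5, 6) */
  uint16_t r[4];

  shuffle4 (a, zero, idx_shr, r);   /* {a1, a2, a3, 0} */
  if (as_u64 (r) != as_u64 (a) >> 16)
    return 0;

  shuffle4 (zero, a, idx_shl, r);   /* {0, a0, a1, a2} */
  return as_u64 (r) == as_u64 (a) << 16;
}
```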

The patch handles 32-bit vectors as well and improves generated code from:

        movd    (%rdi), %xmm0
        pxor    %xmm1, %xmm1
        punpcklwd %xmm1, %xmm0
        pshuflw $230, %xmm0, %xmm0
        movd    %xmm0, (%rdi)

to:
        movd    (%rdi), %xmm0
        psrld   $16, %xmm0
        movd    %xmm0, (%rdi)

and to:
        movd    (%rdi), %xmm0
        pslld   $16, %xmm0
        movd    %xmm0, (%rdi)

PR target/113871

gcc/ChangeLog:

* config/i386/mmx.md (V248FI): New mode iterator.
(V24FI_32): Ditto.
(vec_shl_): New expander.
(vec_shl_): Ditto.
(vec_shr_): Ditto.
(vec_shr_): Ditto.
* config/i386/sse.md (vec_shl_): Simplify expander.
(vec_shr_): Ditto.

gcc/testsuite/ChangeLog:

* gcc.target/i386/pr113871-1a.c: New test.
* gcc.target/i386/pr113871-1b.c: New test.
* gcc.target/i386/pr113871-2a.c: New test.
* gcc.target/i386/pr113871-2b.c: New test.
* gcc.target/i386/pr113871-3a.c: New test.
* gcc.target/i386/pr113871-3b.c: New test.
* gcc.target/i386/pr113871-4a.c: New test.

Bootstrapped and regression tested on x86_64-linux-gnu {,-m32}.

Uros.
diff --git a/gcc/config/i386/mmx.md b/gcc/config/i386/mmx.md
index 6215b12f05f..075309cca9f 100644
--- a/gcc/config/i386/mmx.md
+++ b/gcc/config/i386/mmx.md
@@ -84,6 +84,11 @@ (define_mode_iterator V_16_32_64
 (define_mode_iterator V2FI [V2SF V2SI])
 
 (define_mode_iterator V24FI [V2SF V2SI V4HF V4HI])
+
+(define_mode_iterator V248FI [V2SF V2SI V4HF V4HI V8QI])
+
+(define_mode_iterator V24FI_32 [V2HF V2HI V4QI])
+
 ;; Mapping from integer vector mode to mnemonic suffix
 (define_mode_attr mmxvecsize
   [(V8QI "b") (V4QI "b") (V2QI "b")
@@ -3729,6 +3734,70 @@ (define_expand "vv4qi3"
   DONE;
 })
 
+(define_expand "vec_shl_"
+  [(set (match_operand:V248FI 0 "register_operand")
+   (ashift:V1DI
+ (match_operand:V248FI 1 "nonimmediate_operand")
+ (match_operand:DI 2 "nonmemory_operand")))]
+  "TARGET_MMX_WITH_SSE"
+{
+  rtx op0 = gen_reg_rtx (V1DImode);
+  rtx op1 = force_reg (mode, operands[1]);
+
+  emit_insn (gen_mmx_ashlv1di3
+ (op0, gen_lowpart (V1DImode, op1), operands[2]));
+  emit_move_insn (operands[0], gen_lowpart (mode, op0));
+  DONE;
+})
+
+(define_expand "vec_shl_"
+  [(set (match_operand:V24FI_32 0 "register_operand")
+   (ashift:V1SI
+ (match_operand:V24FI_32 1 "nonimmediate_operand")
+ (match_operand:DI 2 "nonmemory_operand")))]
+  "TARGET_SSE2"
+{
+  rtx op0 = gen_reg_rtx (V1SImode);
+  rtx op1 = force_reg (mode, operands[1]);
+
+  emit_insn (gen_mmx_ashlv1si3
+ (op0, gen_lowpart (V1SImode, op1), operands[2]));
+  emit_move_insn (operands[0], gen_lowpart (mode, op0));
+  DONE;
+})
+
+(define_expand "vec_shr_"
+  [(set (match_operand:V248FI 0 "register_operand")
+   (lshiftrt:V1DI
+ (match_operand:V248FI 1 "nonimmediate_operand")
+ (match_operand:DI 2 "nonmemory_operand")))]
+  "TARGET_MMX_WITH_SSE"
+{
+  rtx op0 = gen_reg_rtx (V1DImode);
+  rtx op1 = force_reg (mode, operands[1]);
+
+  emit_insn (gen_mmx_lshrv1di3
+ (op0, gen_lowpart (V1DImode, op1), operands[2]));
+  emit_move_insn (operands[0], gen_lowpart (mode, op0));
+  DONE;
+})
+
+(define_expand "vec_shr_"
+  [(set (match_operand:V24FI_32 0 "register_operand")
+   (lshiftrt:V1SI
+ (match_operand:V24FI_32 1 "nonimmediate_operand")
+ (match_operand:DI 2 "nonmemory_operand")))]
+  "TARGET_SSE2"
+{
+  rtx op0 = gen_reg_rtx (V1SImode);
+  rtx op1 = force_reg (mode, operands[1]);
+
+  emit_insn (gen_mmx_lshrv1si3
+ (op0, gen_lowpart (V1SImode, op1), operands[2]));
+  emit_move_insn (operands[0], gen_lowpart (mode, op0));
+  DONE;
+})
+
 ;
 ;;
 ;; Parallel integral comparisons
diff --git a/gcc/config/i386/sse.md b/gcc/config/i386/sse.md
index acd10908d76..1bc614ab702 100644
--- a/gcc/config/i386/sse.md
+++ b/gcc/config/i386/sse.md
@@ -16498,29 +16498,35 @@ (define_split
   "operands[3] = XVECEXP (operands[2], 0, 0);")
 
 (define_expand "vec_shl_"
-  [(set (match_dup 3)
+  [(set (match_operand:V_128 0 "register_operand")
(ashift:V1TI
-(match_operand:V_128 1 "register_operand")
-(match_operand:SI 2 "const_0_to_255_mul_8_operand")))
-   (set (match_operand:V_128 0 "register_operand") (match_dup 4))]
+(match_operand:V_128 1 

Re: [PATCH v6] x86-64: Find a scratch register for large model profiling

2024-02-05 Thread Uros Bizjak
On Mon, Feb 5, 2024 at 5:43 PM H.J. Lu  wrote:
>
> Changes in v6:
>
> 1. Use ix86_save_reg and accessible_reg_set in
> x86_64_select_profile_regnum.
> 2. Construct a complete reg name in x86_function_profiler.
>
> Changes in v5:
>
> 1. Add pr113689-3.c.
> 2. Use %r10 if ix86_profile_before_prologue () returns true.
> 3. Try a callee-saved register which has been saved on stack in the
> prologue.
>
> Changes in v4:
>
> 1. Remove pr113689-3.c.
> 2. Use df_get_live_out.
>
> Changes in v3:
>
> 1. Remove r10_ok.
>
> Changes in v2:
>
> 1. Add int_parameter_registers to machine_function to track integer
> registers used for parameter passing.
> 2. Update x86_64_select_profile_regnum to try %r10 first and use a
> caller-saved register which isn't used for parameter passing.
>
> ---
> Two scratch registers, %r10 and %r11, are available at function entry for
> large model profiling.  But %r10 may be used by stack realignment and we
> can't use %r10 in this case.  Add x86_64_select_profile_regnum to find
> a caller-saved register which isn't live or a callee-saved register
> which has been saved on stack in the prologue at entry for large model
> profiling, and call sorry () if we can't find one.
>
> gcc/
>
> PR target/113689
> * config/i386/i386.cc (x86_64_select_profile_regnum): New.
> (x86_function_profiler): Call x86_64_select_profile_regnum to
> get a scratch register for large model profiling.
>
> gcc/testsuite/
>
> PR target/113689
> * gcc.target/i386/pr113689-1.c: New file.
> * gcc.target/i386/pr113689-2.c: Likewise.
> * gcc.target/i386/pr113689-3.c: Likewise.
> ---
>  gcc/config/i386/i386.cc| 91 ++
>  gcc/testsuite/gcc.target/i386/pr113689-1.c | 49 
>  gcc/testsuite/gcc.target/i386/pr113689-2.c | 41 ++
>  gcc/testsuite/gcc.target/i386/pr113689-3.c | 48 
>  4 files changed, 214 insertions(+), 15 deletions(-)
>  create mode 100644 gcc/testsuite/gcc.target/i386/pr113689-1.c
>  create mode 100644 gcc/testsuite/gcc.target/i386/pr113689-2.c
>  create mode 100644 gcc/testsuite/gcc.target/i386/pr113689-3.c
>
> diff --git a/gcc/config/i386/i386.cc b/gcc/config/i386/i386.cc
> index b3e7c74846e..08aad32af85 100644
> --- a/gcc/config/i386/i386.cc
> +++ b/gcc/config/i386/i386.cc
> @@ -22749,6 +22749,48 @@ current_fentry_section (const char **name)
>return true;
>  }
>
> +/* Return a caller-saved register which isn't live or a callee-saved
> +   register which has been saved on stack in the prologue at entry for
> +   profile.  */
> +
> +static int
> +x86_64_select_profile_regnum (bool r11_ok ATTRIBUTE_UNUSED)
> +{
> +  /* Use %r10 if the profiler is emitted before the prologue or it isn't
> + used by DRAP.  */
> +  if (ix86_profile_before_prologue ()
> +  || !crtl->drap_reg
> +  || REGNO (crtl->drap_reg) != R10_REG)
> +return R10_REG;
> +
> +  /* The profiler is emitted after the prologue.  If there is a
> + caller-saved register which isn't live or a callee-saved
> + register saved on stack in the prologue, use it.  */
> +
> +  bitmap reg_live = df_get_live_out (ENTRY_BLOCK_PTR_FOR_FN (cfun));
> +
> +  int i;
> +  for (i = 0; i < FIRST_PSEUDO_REGISTER; i++)
> +if (GENERAL_REGNO_P (i)
> +   && i != R10_REG
> +#ifdef NO_PROFILE_COUNTERS
> +   && (r11_ok || i != R11_REG)
> +#else
> +   && i != R11_REG
> +#endif
> +   && TEST_HARD_REG_BIT (accessible_reg_set, i)
> +   && !fixed_regs[i]
> +   && (ix86_save_reg (i, true, true)
> +   || (call_used_regs[i]
> +   && !REGNO_REG_SET_P (reg_live, i))))
> +  return i;

ix86_save_reg will never save fixed regs, so the above can be optimized a bit:

        && TEST_HARD_REG_BIT (accessible_reg_set, i)
        && (ix86_save_reg (i, true, true)
            || (call_used_regs[i] && !fixed_regs[i]
                && !REGNO_REG_SET_P (reg_live, i))))
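The rewrite relies on the stated invariant.  A small truth-table check,
assuming ix86_save_reg never returns true for a fixed register, shows the
two forms agree on every remaining input.  The booleans are hypothetical
stand-ins for fixed_regs[i], ix86_save_reg (i, true, true),
call_used_regs[i] and REGNO_REG_SET_P (reg_live, i); the GENERAL_REGNO_P,
R10/R11 and accessible_reg_set tests are left out because they are
unchanged.

```c
#include <stdbool.h>

/* Condition as posted: fixed check applied to both branches.  */
static bool
orig_cond (bool fixed, bool saved, bool call_used, bool live)
{
  return !fixed && (saved || (call_used && !live));
}

/* Suggested condition: fixed check only on the call-used branch.  */
static bool
new_cond (bool fixed, bool saved, bool call_used, bool live)
{
  return saved || (call_used && !fixed && !live);
}

/* Count input combinations where the two conditions disagree, skipping
   the combinations excluded by the invariant.  */
int
count_mismatches (void)
{
  int mismatches = 0;
  for (int m = 0; m < 16; m++)
    {
      bool fixed = m & 1, saved = m & 2, call_used = m & 4, live = m & 8;
      if (fixed && saved)
        continue;   /* Assumed impossible: fixed regs are never saved.  */
      if (orig_cond (fixed, saved, call_used, live)
          != new_cond (fixed, saved, call_used, live))
        mismatches++;
    }
  return mismatches;
}
```

Without the `continue`, the four combinations with both fixed and saved
set would disagree, which is exactly the case the invariant rules out.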

OK with the above change.

Thanks,
Uros.

> +  sorry ("no register available for profiling %<-mcmodel=large%s%>",
> +ix86_cmodel == CM_LARGE_PIC ? " -fPIC" : "");
> +
> +  return INVALID_REGNUM;
> +}
> +
>  /* Output assembler code to FILE to increment profiler label # LABELNO
> for profiling a function entry.  */
>  void
> @@ -22783,42 +22825,61 @@ x86_function_profiler (FILE *file, int labelno ATTRIBUTE_UNUSED)
> fprintf (file, "\tleaq\t%sP%d(%%rip), %%r11\n", LPREFIX, labelno);
>  #endif
>
> +  int scratch;
> +  const char *reg;
> +  char legacy_reg[4] = { 0 };
> +
>if (!TARGET_PECOFF)
> {
>   switch (ix86_cmodel)
> {
> case CM_LARGE:
> - /* NB: R10 is caller-saved.  Although it can be used as a
> -static chain register, it is preserved when calling
> -mcount for nested functions.  */
> + scratch = x86_64_select_profile_regnum (true);
> + reg = hi_reg_name[scratch];
> + if 

Re: [PATCH v5] x86-64: Find a scratch register for large model profiling

2024-02-05 Thread Uros Bizjak
On Fri, Feb 2, 2024 at 11:47 PM H.J. Lu  wrote:
>
> Changes in v5:
>
> 1. Add pr113689-3.c.
> 2. Use %r10 if ix86_profile_before_prologue () returns true.
> 3. Try a callee-saved register which has been saved on stack in the
> prologue.
>
> Changes in v4:
>
> 1. Remove pr113689-3.c.
> 2. Use df_get_live_out.
>
> Changes in v3:
>
> 1. Remove r10_ok.
>
> Changes in v2:
>
> 1. Add int_parameter_registers to machine_function to track integer
> registers used for parameter passing.
> 2. Update x86_64_select_profile_regnum to try %r10 first and use a
> caller-saved register which isn't used for parameter passing.
>
> ---
> Two scratch registers, %r10 and %r11, are available at function entry for
> large model profiling.  But %r10 may be used by stack realignment and we
> can't use %r10 in this case.  Add x86_64_select_profile_regnum to find
> a caller-saved register which isn't live or a callee-saved register
> which has been saved on stack in the prologue at entry for large model
> profiling, and call sorry () if we can't find one.
>
> gcc/
>
> PR target/113689
> * config/i386/i386.cc (set_saved_int_registers_bit): New.
> (test_saved_int_registers_bit): Likewise.
> (ix86_emit_save_regs): Call set_saved_int_registers_bit on
> saved register.
> (ix86_emit_save_regs_using_mov): Likewise.
> (x86_64_select_profile_regnum): New.
> (x86_function_profiler): Call x86_64_select_profile_regnum to
> get a scratch register for large model profiling.
> * config/i386/i386.h (machine_function): Add
> saved_int_registers.
>
> gcc/testsuite/
>
> PR target/113689
> * gcc.target/i386/pr113689-1.c: New file.
> * gcc.target/i386/pr113689-2.c: Likewise.
> * gcc.target/i386/pr113689-3.c: Likewise.
> ---
>  gcc/config/i386/i386.cc| 119 ++---
>  gcc/config/i386/i386.h |   5 +
>  gcc/testsuite/gcc.target/i386/pr113689-1.c |  49 +
>  gcc/testsuite/gcc.target/i386/pr113689-2.c |  41 +++
>  gcc/testsuite/gcc.target/i386/pr113689-3.c |  48 +
>  5 files changed, 247 insertions(+), 15 deletions(-)
>  create mode 100644 gcc/testsuite/gcc.target/i386/pr113689-1.c
>  create mode 100644 gcc/testsuite/gcc.target/i386/pr113689-2.c
>  create mode 100644 gcc/testsuite/gcc.target/i386/pr113689-3.c
>
> diff --git a/gcc/config/i386/i386.cc b/gcc/config/i386/i386.cc
> index b3e7c74846e..1c7aaa4535e 100644
> --- a/gcc/config/i386/i386.cc
> +++ b/gcc/config/i386/i386.cc
> @@ -7387,6 +7387,32 @@ choose_baseaddr (HOST_WIDE_INT cfa_offset, unsigned int *align,
>return plus_constant (Pmode, base_reg, base_offset);
>  }
>
> +/* Set the integer register REGNO bit in saved_int_registers.  */
> +
> +static void
> +set_saved_int_registers_bit (int regno)
> +{
> +  if (LEGACY_INT_REGNO_P (regno))
> +cfun->machine->saved_int_registers |= 1 << regno;
> +  else
> +cfun->machine->saved_int_registers
> +  |= 1 << (regno - FIRST_REX_INT_REG + 8);
> +}
> +
> +/* Return true if the integer register REGNO bit in saved_int_registers
> +   is set.  */
> +
> +static bool
> +test_saved_int_registers_bit (int regno)
> +{
> +  if (LEGACY_INT_REGNO_P (regno))
> +return (cfun->machine->saved_int_registers
> +   & (1 << regno)) != 0;
> +  else
> +return (cfun->machine->saved_int_registers
> +   & (1 << (regno - FIRST_REX_INT_REG + 8))) != 0;
> +}
> +
>  /* Emit code to save registers in the prologue.  */
>
>  static void
> @@ -7403,6 +7429,7 @@ ix86_emit_save_regs (void)
> insn = emit_insn (gen_push (gen_rtx_REG (word_mode, regno),
> TARGET_APX_PPX));
> RTX_FRAME_RELATED_P (insn) = 1;
> +   set_saved_int_registers_bit (regno);
>   }
>  }
>else
> @@ -7415,6 +7442,7 @@ ix86_emit_save_regs (void)
>for (regno = FIRST_PSEUDO_REGISTER - 1; regno >= 0; regno--)
> if (GENERAL_REGNO_P (regno) && ix86_save_reg (regno, true, true))
>   {
> +   set_saved_int_registers_bit (regno);
> if (aligned)
>   {
> regno_list[loaded_regnum++] = regno;
> @@ -7567,6 +7595,7 @@ ix86_emit_save_regs_using_mov (HOST_WIDE_INT cfa_offset)
>{
>  ix86_emit_save_reg_using_mov (word_mode, regno, cfa_offset);
> cfa_offset -= UNITS_PER_WORD;
> +   set_saved_int_registers_bit (regno);
>}
>  }

Do we really need the above handling? I think that we can use
ix86_save_reg directly in x86_64_select_profile_regnum below.

> @@ -22749,6 +22778,48 @@ current_fentry_section (const char **name)
>return true;
>  }
>
> +/* Return a caller-saved register which isn't live or a callee-saved
> +   register which has been saved on stack in the prologue at entry for
> +   profile.  */
> +
> +static int
> +x86_64_select_profile_regnum (bool r11_ok ATTRIBUTE_UNUSED)
> +{
> +  /* Use %r10 if the profiler is emitted before 

Re: [x86_64 PATCH] PR target/113690: Fix-up MULT REG_EQUAL notes in STV.

2024-02-05 Thread Uros Bizjak
On Mon, Feb 5, 2024 at 9:06 AM Uros Bizjak  wrote:
>
> On Mon, Feb 5, 2024 at 1:24 AM Roger Sayle  wrote:
> >
> >
> > This patch fixes PR target/113690, an ICE-on-valid regression on x86_64
> > that manifests with a specific combination of command line options.  The
> > cause is that x86's scalar-to-vector pass converts a chain of instructions
> > from TImode to V1TImode, but fails to appropriately update the attached
> > REG_EQUAL note.  Given that multiplication isn't supported in V1TImode,
> > the REG_NOTE handling code wasn't expecting to see a MULT.  Easily solved
> > with additional handling for other binary operators that may potentially
> > (in future) have an immediate constant as the second operand that needs
> > handling.  For convenience, this code (re)factors the logic to convert
> > a TImode constant into a V1TImode constant vector into a subroutine and
> > reuses it.
> >
> > For the record, STV is actually doing something useful in this strange
> > testcase: GCC with -O2 -fno-dce -fno-forward-propagate
> > -fno-split-wide-types -funroll-loops generates:
> >
> > foo:    movl    $v, %eax
> >         pxor    %xmm0, %xmm0
> >         movaps  %xmm0, 48(%rax)
> >         movaps  %xmm0, (%rax)
> >         movaps  %xmm0, 16(%rax)
> >         movaps  %xmm0, 32(%rax)
> >         ret
> >
> > With the addition of -mno-stv (to disable the patched code) it gives:
> >
> > foo:    movl    $v, %eax
> >         movq    $0, 48(%rax)
> >         movq    $0, 56(%rax)
> >         movq    $0, (%rax)
> >         movq    $0, 8(%rax)
> >         movq    $0, 16(%rax)
> >         movq    $0, 24(%rax)
> >         movq    $0, 32(%rax)
> >         movq    $0, 40(%rax)
> >         ret
> >
> >
> > This patch has been tested on x86_64-pc-linux-gnu with make bootstrap
> > and make -k check, both with and without --target_board=unix{-m32}
> > with no new failures.  Ok for mainline?
> >
> >
> > 2024-02-05  Roger Sayle  
> >
> > gcc/ChangeLog
> > PR target/113690
> > * config/i386/i386-features.cc (timode_convert_cst): New helper
> > function to convert a TImode CONST_SCALAR_INT_P to a V1TImode
> > CONST_VECTOR.
> > (timode_scalar_chain::convert_op): Use timode_convert_cst.
> > (timode_scalar_chain::convert_insn): If a REG_EQUAL note contains
> > a binary operator where the second operand is an immediate integer
> > constant, convert it to V1TImode using timode_convert_cst.
> > Use timode_convert_cst.
> >
> > gcc/testsuite/ChangeLog
> > PR target/113690
> > * gcc.target/i386/pr113690.c: New test case.
>
> OK.

OTOH, how about we follow the approach from
general_scalar_chain::convert_insn and just kill the note?

Uros.


Re: [PATCH] i386: Clear REG_UNUSED and REG_DEAD notes from the IL at the end of vzeroupper pass [PR113059]

2024-02-05 Thread Uros Bizjak
On Wed, Jan 31, 2024 at 9:23 AM Jakub Jelinek  wrote:
>
> Hi!
>
> The move of the vzeroupper pass from after reload pass to after
> postreload_cse helped only partially, CSE-like passes can still invalidate
> those notes (especially REG_UNUSED) if they use some earlier register
> holding some value later on in the IL.
>
> So, either we could try to move it one pass further after gcse2 and hope
> no later pass invalidates the notes, or the following patch attempts to
> restore the REG_DEAD/REG_UNUSED state from GCC 13 and earlier, where
> the LRA or reload passes remove all REG_DEAD/REG_UNUSED notes and the notes
> reappear only at the start of dse2 pass when it calls
>   df_note_add_problem ();
>   df_analyze ();
> So, effectively
>   NEXT_PASS (pass_postreload_cse);
>   NEXT_PASS (pass_gcse2);
>   NEXT_PASS (pass_split_after_reload);
>   NEXT_PASS (pass_ree);
>   NEXT_PASS (pass_compare_elim_after_reload);
>   NEXT_PASS (pass_thread_prologue_and_epilogue);
> passes operate without those notes in the IL.
> While in GCC 14 mode switching computes the notes problem at the start of
> vzeroupper, the patch below removes them at the end of the pass again, so
> that the above passes continue to operate without them.
>
> Bootstrapped/regtested on x86_64-linux and i686-linux, ok for trunk?
>
> 2024-01-31  Jakub Jelinek  
>
> PR target/113059
> * config/i386/i386-features.cc (rest_of_handle_insert_vzeroupper):
> Remove REG_DEAD/REG_UNUSED notes at the end of the pass before
> df_analyze call.

Not really a review, but let's rubber stamp this workaround OK.

Thanks,
Uros.

>
> --- gcc/config/i386/i386-features.cc.jj 2024-01-08 12:15:13.611477047 +0100
> +++ gcc/config/i386/i386-features.cc 2024-01-30 12:36:27.834515803 +0100
> @@ -2664,6 +2664,32 @@ rest_of_handle_insert_vzeroupper (void)
>/* Call optimize_mode_switching.  */
>g->get_passes ()->execute_pass_mode_switching ();
>
> +  /* LRA removes all REG_DEAD/REG_UNUSED notes and normally they
> + reappear in the IL only at the start of pass_rtl_dse2, which does
> + df_note_add_problem (); df_analyze ();
> + The vzeroupper is scheduled after postreload_cse pass and mode
> + switching computes the notes as well, the problem is that e.g.
> + pass_gcse2 doesn't maintain the notes, see PR113059 and
> + PR112760.  Remove the notes now to restore status quo ante
> + until we figure out how to maintain the notes or what else
> + to do.  */
> +  basic_block bb;
> +  rtx_insn *insn;
> +  FOR_EACH_BB_FN (bb, cfun)
> +FOR_BB_INSNS (bb, insn)
> +  if (NONDEBUG_INSN_P (insn))
> +   {
> + rtx *pnote = _NOTES (insn);
> + while (*pnote != 0)
> +   {
> + if (REG_NOTE_KIND (*pnote) == REG_DEAD
> + || REG_NOTE_KIND (*pnote) == REG_UNUSED)
> +   *pnote = XEXP (*pnote, 1);
> + else
> +   pnote =  (*pnote, 1);
> +   }
> +   }
> +
>df_analyze ();
>return 0;
>  }
>
> Jakub
>
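The note-removal loop in the patch uses the pointer-to-pointer idiom:
pnote always addresses the link that points at the current note, so
unlinking is a single store and the head of the list needs no special
case.  A self-contained sketch of that idiom, with a hypothetical
struct note standing in for the RTL note chain (GCC's REG_NOTES and XEXP
accessors are not modelled):

```c
#include <stddef.h>

/* Hypothetical stand-in for an RTL note chain; NEXT plays the role of
   XEXP (note, 1).  */
enum note_kind { NOTE_DEAD, NOTE_UNUSED, NOTE_EQUAL };

struct note
{
  enum note_kind kind;
  struct note *next;
};

/* Unlink every NOTE_DEAD/NOTE_UNUSED entry from the list at *HEAD and
   return how many were removed.  PNOTE always points at the link field
   that references the current note.  */
int
remove_dead_unused (struct note **head)
{
  int removed = 0;
  struct note **pnote = head;

  while (*pnote != NULL)
    {
      if ((*pnote)->kind == NOTE_DEAD || (*pnote)->kind == NOTE_UNUSED)
        {
          *pnote = (*pnote)->next;   /* Splice the note out.  */
          removed++;
        }
      else
        pnote = &(*pnote)->next;     /* Advance to the next link.  */
    }
  return removed;
}
```

As in the patch, removed notes are only unlinked, not freed; the sketch
leaves ownership of the nodes to the caller.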


Re: [x86_64 PATCH] PR target/113690: Fix-up MULT REG_EQUAL notes in STV.

2024-02-05 Thread Uros Bizjak
On Mon, Feb 5, 2024 at 1:24 AM Roger Sayle  wrote:
>
>
> This patch fixes PR target/113690, an ICE-on-valid regression on x86_64
> that manifests with a specific combination of command line options.  The
> cause is that x86's scalar-to-vector pass converts a chain of instructions
> from TImode to V1TImode, but fails to appropriately update the attached
> REG_EQUAL note.  Given that multiplication isn't supported in V1TImode,
> the REG_NOTE handling code wasn't expecting to see a MULT.  Easily solved
> with additional handling for other binary operators that may potentially
> (in future) have an immediate constant as the second operand that needs
> handling.  For convenience, this code (re)factors the logic to convert
> a TImode constant into a V1TImode constant vector into a subroutine and
> reuses it.
>
> For the record, STV is actually doing something useful in this strange
> testcase: GCC with -O2 -fno-dce -fno-forward-propagate
> -fno-split-wide-types -funroll-loops generates:
>
> foo:    movl    $v, %eax
>         pxor    %xmm0, %xmm0
>         movaps  %xmm0, 48(%rax)
>         movaps  %xmm0, (%rax)
>         movaps  %xmm0, 16(%rax)
>         movaps  %xmm0, 32(%rax)
>         ret
>
> With the addition of -mno-stv (to disable the patched code) it gives:
>
> foo:    movl    $v, %eax
>         movq    $0, 48(%rax)
>         movq    $0, 56(%rax)
>         movq    $0, (%rax)
>         movq    $0, 8(%rax)
>         movq    $0, 16(%rax)
>         movq    $0, 24(%rax)
>         movq    $0, 32(%rax)
>         movq    $0, 40(%rax)
>         ret
>
>
> This patch has been tested on x86_64-pc-linux-gnu with make bootstrap
> and make -k check, both with and without --target_board=unix{-m32}
> with no new failures.  Ok for mainline?
>
>
> 2024-02-05  Roger Sayle  
>
> gcc/ChangeLog
> PR target/113690
> * config/i386/i386-features.cc (timode_convert_cst): New helper
> function to convert a TImode CONST_SCALAR_INT_P to a V1TImode
> CONST_VECTOR.
> (timode_scalar_chain::convert_op): Use timode_convert_cst.
> (timode_scalar_chain::convert_insn): If a REG_EQUAL note contains
> a binary operator where the second operand is an immediate integer
> constant, convert it to V1TImode using timode_convert_cst.
> Use timode_convert_cst.
>
> gcc/testsuite/ChangeLog
> PR target/113690
> * gcc.target/i386/pr113690.c: New test case.

OK.

Thanks,
Uros.

>
>
> Thanks in advance,
> Roger
> --
>


Re: [PATCH] testsuite: i386: Fix gcc.target/i386/pr71321.c on Solaris/x86

2024-02-02 Thread Uros Bizjak
On Fri, Feb 2, 2024 at 9:59 AM Rainer Orth  
wrote:
>
> gcc.target/i386/pr71321.c FAILs on 64-bit Solaris/x86 with the native
> assembler:
>
> FAIL: gcc.target/i386/pr71321.c scan-assembler-not lea.*0
>
> The problem is that /bin/as doesn't fully support cfi directives, so the
> .eh_frame section is specified explicitly, which includes ".long 0".
> The regular expression above includes ".*", which can match across
> newlines.  AFAICS such multiline matches aren't needed here.
>
> This patch changes the RE not to use multiline matches.
>
> Tested on i386-pc-solaris2.11 (as and gas) and i686-pc-linux-gnu.
>
> Ok for trunk?

OK.

Thanks,
Uros.

>
> Rainer
>
> --
> -
> Rainer Orth, Center for Biotechnology, Bielefeld University
>
>
> 2024-02-01  Rainer Orth  
>
> gcc/testsuite:
> * gcc.target/i386/pr71321.c (scan-assembler-not): Avoid multiline
> matches.
>


[committed] i386: Improve *cmp_doubleword splitter [PR113701]

2024-02-01 Thread Uros Bizjak
The fix for PR70321 introduced a splitter that split a doubleword
comparison into a pair of XORs followed by an IOR to set the (zero)
flags register.  To help reload, the splitter forced SUBREG pieces of
double-word input values into pseudos, but this regressed
gcc.target/i386/pr82580.c:

int f0 (U x, U y) { return x == y; }

from:
        xorq    %rdx, %rdi
        xorq    %rcx, %rsi
        xorl    %eax, %eax
        orq     %rsi, %rdi
        sete    %al
        ret

to:
        xchgq   %rdi, %rsi
        movq    %rdx, %r8
        movq    %rcx, %rax
        movq    %rsi, %rdx
        movq    %rdi, %rcx
        xorq    %rax, %rcx
        xorq    %r8, %rdx
        xorl    %eax, %eax
        orq     %rcx, %rdx
        sete    %al
        ret

To mitigate the regression, remove this legacy heuristic (workaround?).
There have been many incremental changes and improvements to x86 TImode
and register allocation, so this legacy workaround is not only no longer
useful, but it actually hurts register allocation.  The patched compiler
now produces:

        xchgq   %rdi, %rsi
        xorl    %eax, %eax
        xorq    %rsi, %rdx
        xorq    %rdi, %rcx
        orq     %rcx, %rdx
        sete    %al
        ret

PR target/113701

gcc/ChangeLog:

* config/i386/i386.md (*cmp_doubleword):
Do not force SUBREG pieces to pseudos.

Bootstrapped and regression tested on x86_64-linux-gnu {,-m32}.

Uros.
diff --git a/gcc/config/i386/i386.md b/gcc/config/i386/i386.md
index bac0a6ade67..a82f2e456fe 100644
--- a/gcc/config/i386/i386.md
+++ b/gcc/config/i386/i386.md
@@ -1632,10 +1632,6 @@ (define_insn_and_split "*cmp_doubleword"
  (set (match_dup 4) (ior:DWIH (match_dup 4) (match_dup 5)))])]
 {
   split_double_mode (<DWI>mode, &operands[0], 2, &operands[0], &operands[2]);
-  /* Placing the SUBREG pieces in pseudos helps reload.  */
-  for (int i = 0; i < 4; i++)
-    if (SUBREG_P (operands[i]))
-      operands[i] = force_reg (<MODE>mode, operands[i]);
 
   operands[4] = gen_reg_rtx (<MODE>mode);
 


Re: [PATCH 1/2] target/113255 - avoid REG_POINTER on a pointer difference

2024-02-01 Thread Uros Bizjak
On Thu, Feb 1, 2024 at 3:18 PM Richard Biener  wrote:
>
> The following avoids re-using a register holding a pointer (and
> thus might be REG_POINTER) for the result of a pointer difference
> computation.  That might confuse heuristics in (broken) RTL alias
> analysis which relies on REG_POINTER indicating that we're
> dealing with one.
>
> This alone doesn't fix anything.
>
> Bootstrapped and tested on x86_64-unknown-linux-gnu, OK for trunk
> and branches (as necessary)?

LGTM, also for branches.

Thanks,
Uros.

>
> Thanks,
> Richard.
>
> PR target/113255
> * config/i386/i386-expand.cc
> (expand_set_or_cpymem_prologue_epilogue_by_misaligned_moves):
> Use a new pseudo for the skipped number of bytes.
> ---
>  gcc/config/i386/i386-expand.cc | 2 +-
>  1 file changed, 1 insertion(+), 1 deletion(-)
>
> diff --git a/gcc/config/i386/i386-expand.cc b/gcc/config/i386/i386-expand.cc
> index 0d817fc3f3b..26c48e8b0c8 100644
> --- a/gcc/config/i386/i386-expand.cc
> +++ b/gcc/config/i386/i386-expand.cc
> @@ -8090,7 +8090,7 @@ expand_set_or_cpymem_prologue_epilogue_by_misaligned_moves (rtx destmem, rtx src
>        /* See how many bytes we skipped.  */
>        saveddest = expand_simple_binop (GET_MODE (*destptr), MINUS, saveddest,
>                                         *destptr,
> -                                       saveddest, 1, OPTAB_DIRECT);
> +                                       NULL_RTX, 1, OPTAB_DIRECT);
>        /* Adjust srcptr and count.  */
>        if (!issetmem)
>          *srcptr = expand_simple_binop (GET_MODE (*srcptr), MINUS, *srcptr,
> --
> 2.35.3
>


Re: [PATCH] testsuite: i386: Fix gcc.target/i386/no-callee-saved-1.c etc. on Solaris/x86

2024-01-31 Thread Uros Bizjak
On Wed, Jan 31, 2024 at 1:57 PM Rainer Orth wrote:
>
> The gcc.target/i386/no-callee-saved-[12].c tests FAIL on Solaris/x86:
>
> FAIL: gcc.target/i386/no-callee-saved-1.c scan-assembler-not push
> FAIL: gcc.target/i386/no-callee-saved-2.c scan-assembler-not push
>
> In both cases, the tests expect the Linux/x86 default of
> -fomit-frame-pointer, while Solaris/x86 defaults to
> -fno-omit-frame-pointer.
>
> So this patch explicitly specifies -fomit-frame-pointer.
>
> Tested on i386-pc-solaris2.11 (as and gas) and i686-pc-linux-gnu.
>
> Ok for trunk?

OK.

Thanks,
Uros.

>
> Rainer
>
> --
> -
> Rainer Orth, Center for Biotechnology, Bielefeld University
>
>
> 2024-01-30  Rainer Orth  
>
> gcc/testsuite:
> * gcc.target/i386/no-callee-saved-1.c: Add -fomit-frame-pointer to
> dg-options.
> * gcc.target/i386/no-callee-saved-2.c: Likewise.
>


Re: [PATCH] testsuite: i386: Fix gcc.target/i386/pr38534-1.c etc. on Solaris/x86

2024-01-31 Thread Uros Bizjak
On Wed, Jan 31, 2024 at 2:02 PM Rainer Orth wrote:
>
> The gcc.target/i386/pr38534-1.c etc. tests FAIL on 32 and 64-bit
> Solaris/x86:
>
> FAIL: gcc.target/i386/pr38534-1.c scan-assembler-not push
> FAIL: gcc.target/i386/pr38534-2.c scan-assembler-not push
> FAIL: gcc.target/i386/pr38534-3.c scan-assembler-not push
> FAIL: gcc.target/i386/pr38534-4.c scan-assembler-not push
>
> The tests assume the Linux/x86 default of -fomit-frame-pointer, while
> Solaris/x86 defaults to -fno-omit-frame-pointer.
>
> Fixed by specifying -fomit-frame-pointer explicitly.
>
> Tested on i386-pc-solaris2.11 and i686-pc-linux-gnu.
>
> Ok for trunk?

OK.

Thanks,
Uros.

>
> Rainer
>
> --
> -
> Rainer Orth, Center for Biotechnology, Bielefeld University
>
>
> 2024-01-30  Rainer Orth  
>
> gcc/testsuite:
> * gcc.target/i386/pr38534-1.c: Add -fomit-frame-pointer to
> dg-options.
> * gcc.target/i386/pr38534-2.c: Likewise.
> * gcc.target/i386/pr38534-3.c: Likewise.
> * gcc.target/i386/pr38534-4.c: Likewise.
>


Re: Unreviewed patches

2024-01-31 Thread Uros Bizjak
On Wed, Jan 31, 2024 at 3:04 PM Rainer Orth wrote:
>
> Three patches have remained unreviewed for a week or more:
>
> c++: Fix g++.dg/ext/attr-section2.C etc. with Solaris/SPARC as
> https://gcc.gnu.org/pipermail/gcc-patches/2024-January/643434.html
>
> This one may even be obvious.
>
> testsuite: i386: Fix gcc.target/i386/pr70321.c on 32-bit Solaris/x86
> https://gcc.gnu.org/pipermail/gcc-patches/2024-January/643771.html
>
> testsuite: i386: Fix gcc.target/i386/avx512vl-stv-rotatedi-1.c on 32-bit Solaris/x86
> https://gcc.gnu.org/pipermail/gcc-patches/2024-January/643774.html
>
> Those two require an x86 maintainer.

OK for x86 patches, I'd say that these two fall under Solaris
maintainership (if not obvious, after all).

Thanks,
Uros.


Re: [PATCH] testsuite: i386: Fix gcc.target/i386/pr80833-1.c on 32-bit Solaris/x86

2024-01-24 Thread Uros Bizjak
On Wed, Jan 24, 2024 at 10:07 AM Rainer Orth wrote:
>
> gcc.target/i386/pr80833-1.c FAILs on 32-bit Solaris/x86 since 20220609:
>
> FAIL: gcc.target/i386/pr80833-1.c scan-assembler pextrd
>
> Unlike e.g. Linux/i686, 32-bit Solaris/x86 defaults to -mstackrealign,
> so this patch overrides that to match.
>
> Tested on i386-pc-solaris2.11 and i686-pc-linux-gnu.
>
> Ok for trunk?
>
> Rainer
>
> --
> -
> Rainer Orth, Center for Biotechnology, Bielefeld University
>
>
> 2024-01-23  Rainer Orth  
>
> gcc/testsuite:
> * gcc.target/i386/pr80833-1.c: Add -mno-stackrealign to dg-options.

OK.

Thanks,
Uros.


Re: [middle-end PATCH] Prefer PLUS over IOR in RTL expansion of multi-word shifts/rotates.

2024-01-20 Thread Uros Bizjak
On Fri, Jan 19, 2024 at 5:50 PM Jeff Law  wrote:
>
>
>
> On 1/19/24 09:05, Georg-Johann Lay wrote:
> >
> >
> > Am 18.01.24 um 20:54 schrieb Roger Sayle:
> >>
> >> This patch tweaks RTL expansion of multi-word shifts and rotates to use
> >> PLUS rather than IOR for disjunctive operations.  During expansion of
> >> these operations, the middle-end creates RTL like (X<<C1) | (Y>>C2)
> >> where the constants C1 and C2 guarantee that bits don't overlap.
> >> Hence the IOR can be performed by any any_or_plus operation, such as
> >> IOR, XOR or PLUS; for word-size operations where carry chains aren't
> >> an issue these should all be equally fast (single-cycle) instructions.
> >> The benefit of this change is that targets with shift-and-add insns,
> >> like x86's lea, can benefit from the LSHIFT-ADD form.
> >>
> >> An example of a backend that benefits is ARC, which is demonstrated
> >> by these two simple functions:
> >
> > But there are also back-ends where this is bad.
> >
> > The reason is that with ORI, the back-end only needs to operate on
> > those sub-words where the sub-mask is non-zero.  But for PLUS this
> > is not the case, because the back-end does not know that the intermediate
> > carry will be zero.  Hence, with PLUS, more instructions are needed.
> > An example is AVR, but possibly many more targets with multi-word operations
> > are affected in a bad way.
> >
> > Take for example the case with 2 words and a value of 1.
> >
> > LO |= 1
> > HI |= 0
> >
> > can be optimized to
> >
> > LO |= 1
> >
> > but for addition this is not the case:
> >
> > LO += 1
> > HI +=c 0 ;; Does not know that always carry = 0.
> I think it's clear that the decision is target and possibly uarch
> specific within a target.
>
> Which means that expmed is probably the right place and that we're going
> to need to look for a good way for the target to control.  I suspect
> rtx_cost  isn't likely a good fit.

Perhaps related is PR108477 [1] and patch at [2], where x86 would
prefer PLUS instead of {X,I}OR, where we have disjoint bits in the
operands of {X,I}OR.

[1] https://gcc.gnu.org/bugzilla/show_bug.cgi?id=108477
[2] https://gcc.gnu.org/pipermail/gcc-patches/2024-January/642164.html

Uros.


Re: [PATCH] i386: Add -masm=intel profiling support [PR113122]

2024-01-18 Thread Uros Bizjak
On Thu, Jan 18, 2024 at 8:31 AM Jakub Jelinek  wrote:
>
> Hi!
>
> x86_function_profiler emits assembly directly into file and only emits
> AT&T syntax.  The following patch adjusts it to emit MASM syntax
> if -masm=intel.
> As it doesn't use asm_fprintf, I can't use {|} syntax for the dialects.
>
> I've tested using
> for i in -mcmodel=large "-mcmodel=large -fpic" "" -fpic "-m32 -fpic" "-m32"; do
> ./xgcc -B ./ -c -O2 -fprofile $i -masm=att pr113122.c -o pr113122.o1;
> ./xgcc -B ./ -c -O2 -fprofile $i -masm=intel pr113122.c -o pr113122.o2;
> objdump -dr pr113122.o1 > /tmp/1; objdump -dr pr113122.o2 > /tmp/2;
> diff -up /tmp/1 /tmp/2; done
> that the emitted sequences are identical after assembly.
>
> Bootstrapped/regtested on x86_64-linux and i686-linux, ok for trunk?
>
> 2024-01-18  Jakub Jelinek  
>
> PR target/113122
> * config/i386/i386.cc (x86_function_profiler): Add -masm=intel
> support.  Add missing space after , in emitted assembly in some
> cases.  Formatting fixes.
>
> * gcc.target/i386/pr113122-1.c: New test.
> * gcc.target/i386/pr113122-2.c: New test.
> * gcc.target/i386/pr113122-3.c: New test.
> * gcc.target/i386/pr113122-4.c: New test.

LGTM.

Thanks,
Uros.

>
> --- gcc/config/i386/i386.cc.jj  2024-01-05 15:22:21.810685516 +0100
> +++ gcc/config/i386/i386.cc 2024-01-17 16:52:48.026177278 +0100
> @@ -22746,7 +22746,10 @@ x86_function_profiler (FILE *file, int l
>if (TARGET_64BIT)
>  {
>  #ifndef NO_PROFILE_COUNTERS
> -  fprintf (file, "\tleaq\t%sP%d(%%rip),%%r11\n", LPREFIX, labelno);
> +  if (ASSEMBLER_DIALECT == ASM_INTEL)
> +   fprintf (file, "\tlea\tr11, %sP%d[rip]\n", LPREFIX, labelno);
> +  else
> +   fprintf (file, "\tleaq\t%sP%d(%%rip), %%r11\n", LPREFIX, labelno);
>  #endif
>
>if (!TARGET_PECOFF)
> @@ -22757,12 +22760,29 @@ x86_function_profiler (FILE *file, int l
>   /* NB: R10 is caller-saved.  Although it can be used as a
>  static chain register, it is preserved when calling
>  mcount for nested functions.  */
> - fprintf (file, "1:\tmovabsq\t$%s, %%r10\n\tcall\t*%%r10\n",
> -  mcount_name);
> + if (ASSEMBLER_DIALECT == ASM_INTEL)
> +   fprintf (file, "1:\tmovabs\tr10, OFFSET FLAT:%s\n"
> +  "\tcall\tr10\n", mcount_name);
> + else
> +   fprintf (file, "1:\tmovabsq\t$%s, %%r10\n\tcall\t*%%r10\n",
> +mcount_name);
>   break;
> case CM_LARGE_PIC:
>  #ifdef NO_PROFILE_COUNTERS
> - fprintf (file, "1:\tmovabsq\t$_GLOBAL_OFFSET_TABLE_-1b, %%r11\n");
> + if (ASSEMBLER_DIALECT == ASM_INTEL)
> +   {
> + fprintf (file, "1:movabs\tr11, "
> +"OFFSET FLAT:_GLOBAL_OFFSET_TABLE_-1b\n");
> + fprintf (file, "\tlea\tr10, 1b[rip]\n");
> + fprintf (file, "\tadd\tr10, r11\n");
> + fprintf (file, "\tmovabs\tr11, OFFSET FLAT:%s@PLTOFF\n",
> +  mcount_name);
> + fprintf (file, "\tadd\tr10, r11\n");
> + fprintf (file, "\tcall\tr10\n");
> + break;
> +   }
> + fprintf (file,
> +  "1:\tmovabsq\t$_GLOBAL_OFFSET_TABLE_-1b, %%r11\n");
>   fprintf (file, "\tleaq\t1b(%%rip), %%r10\n");
>   fprintf (file, "\taddq\t%%r11, %%r10\n");
>   fprintf (file, "\tmovabsq\t$%s@PLTOFF, %%r11\n", mcount_name);
> @@ -22776,7 +22796,12 @@ x86_function_profiler (FILE *file, int l
> case CM_MEDIUM_PIC:
>   if (!ix86_direct_extern_access)
> {
> - fprintf (file, "1:\tcall\t*%s@GOTPCREL(%%rip)\n", mcount_name);
> + if (ASSEMBLER_DIALECT == ASM_INTEL)
> +   fprintf (file, "1:\tcall\t[QWORD PTR %s@GOTPCREL[rip]]",
> +mcount_name);
> + else
> +   fprintf (file, "1:\tcall\t*%s@GOTPCREL(%%rip)\n",
> +mcount_name);
>   break;
> }
>   /* fall through */
> @@ -22791,23 +22816,37 @@ x86_function_profiler (FILE *file, int l
>else if (flag_pic)
>  {
>  #ifndef NO_PROFILE_COUNTERS
> -  fprintf (file, "\tleal\t%sP%d@GOTOFF(%%ebx),%%" PROFILE_COUNT_REGISTER "\n",
> -  LPREFIX, labelno);
> +  if (ASSEMBLER_DIALECT == ASM_INTEL)
> +   fprintf (file,
> +"\tlea\t" PROFILE_COUNT_REGISTER ", %sP%d@GOTOFF[ebx]\n",
> +LPREFIX, labelno);
> +  else
> +   fprintf (file,
> +"\tleal\t%sP%d@GOTOFF(%%ebx), %%" PROFILE_COUNT_REGISTER "\n",
> +LPREFIX, labelno);
>  #endif
> -  fprintf (file, "1:\tcall\t*%s@GOT(%%ebx)\n", mcount_name);
> +  if 

Re: [PATCH] i386: Add "Ws" constraint for symbolic address/label reference [PR105576]

2024-01-16 Thread Uros Bizjak
On Thu, Jan 11, 2024 at 7:24 PM Fangrui Song  wrote:
>
> Printing the raw symbol is useful in inline asm (e.g. in C++ to get the
> mangled name).  Similar constraints are available in other targets (e.g.
> "S" for aarch64/riscv, "Cs" for m68k).
>
> There isn't a good way for x86 yet, e.g. "i" doesn't work for
> PIC/-mcmodel=large.  This patch adds "Ws".  Here are possible use cases:
>
> ```
> namespace ns { extern int var; }
> asm (".pushsection .xxx,\"aw\"; .dc.a %0; .popsection" :: "Ws"());
> asm (".reloc ., BFD_RELOC_NONE, %0" :: "Ws"());
> ```
>
> gcc/ChangeLog:
>
> PR target/105576
> * config/i386/constraints.md: Define constraint "Ws".
> * doc/md.texi: Document it.
>
> gcc/testsuite/ChangeLog:
>
> * gcc.target/i386/asm-raw-symbol.c: New testcase.

OK.

Thanks,
Uros.

>
> ---
>
> This obsoletes 
> https://gcc.gnu.org/pipermail/gcc-patches/2024-January/642580.html
> I initially tried 'z', but Uros requested that a W prefix is used.
> ---
>  gcc/config/i386/constraints.md |  4 
>  gcc/doc/md.texi|  4 
>  gcc/testsuite/gcc.target/i386/asm-raw-symbol.c | 13 +
>  3 files changed, 21 insertions(+)
>  create mode 100644 gcc/testsuite/gcc.target/i386/asm-raw-symbol.c
>
> diff --git a/gcc/config/i386/constraints.md b/gcc/config/i386/constraints.md
> index 0c6e662df25..280e4c8e36c 100644
> --- a/gcc/config/i386/constraints.md
> +++ b/gcc/config/i386/constraints.md
> @@ -348,6 +348,10 @@ (define_constraint "Wf"
> to double word size."
>(match_operand 0 "x86_64_dwzext_immediate_operand"))
>
> +(define_constraint "Ws"
> +  "A symbolic reference or label reference."
> +  (match_code "const,symbol_ref,label_ref"))
> +
>  (define_constraint "Z"
>"32-bit unsigned integer constant, or a symbolic reference known
> to fit that range (for immediate operands in zero-extending x86-64
> diff --git a/gcc/doc/md.texi b/gcc/doc/md.texi
> index 47a87d6ceec..b0c61925120 100644
> --- a/gcc/doc/md.texi
> +++ b/gcc/doc/md.texi
> @@ -4275,6 +4275,10 @@ require non-@code{VOIDmode} immediate operands).
>  128-bit integer constant where both the high and low 64-bit word
>  satisfy the @code{e} constraint.
>
> +@item Ws
> +A symbolic reference or label reference.
> +You can use the @code{%p} modifier to print the raw symbol.
> +
>  @item Z
>  32-bit unsigned integer constant, or a symbolic reference known
>  to fit that range (for immediate operands in zero-extending x86-64
> diff --git a/gcc/testsuite/gcc.target/i386/asm-raw-symbol.c b/gcc/testsuite/gcc.target/i386/asm-raw-symbol.c
> new file mode 100644
> index 000..b7854567dd9
> --- /dev/null
> +++ b/gcc/testsuite/gcc.target/i386/asm-raw-symbol.c
> @@ -0,0 +1,13 @@
> +/* { dg-do compile } */
> +
> +extern int var;
> +
> +void
> +func (void)
> +{
> +  __asm__ ("@ %p0" : : "Ws" (func));
> +  __asm__ ("@ %p0" : : "Ws" (&var + 1));
> +}
> +
> +/* { dg-final { scan-assembler "@ func" } } */
> +/* { dg-final { scan-assembler "@ var\\+4" } } */
> --
> 2.43.0.275.g3460e3d667-goog
>


Re: [PATCH] i386: Add "z" constraint for symbolic address/label reference [PR105576]

2024-01-11 Thread Uros Bizjak
On Thu, Jan 11, 2024 at 9:33 AM Fangrui Song  wrote:
>
> On 2024-01-11, Uros Bizjak wrote:
> >On Thu, Jan 11, 2024 at 4:44 AM Fangrui Song  wrote:
> >>
> >> Printing the raw symbol is useful in inline asm (e.g. in C++ to get the
> >> mangled name).  Similar constraints are available in other targets (e.g.
> >> "S" for aarch64/riscv, "Cs" for m68k).
> >>
> >> There isn't a good way for x86 yet, e.g. "i" doesn't work for
> >> PIC/-mcmodel=large.  This patch adds "z".
> >
> >Please use W-prefixed multi-letter constraint name.
> >
> >Uros.
>
> Sounds good.  How about "Ws"?

Yes, LGTM.

Thanks,
Uros.

>
> (I've asked overse...@gcc.gnu.org whether I can get gcc access (I
> already have binutils-gdb/glibc access), so that I can land approved
> patches myself in the future)
>
>
>  From ad7bf3dce026bf226e22ab709c9326c611a4b745 Mon Sep 17 00:00:00 2001
> From: Fangrui Song 
> Date: Wed, 10 Jan 2024 18:49:45 -0800
> Subject: [PATCH] i386: Add "Ws" constraint for symbolic address/label
>   reference [PR105576]
>
> Printing the raw symbol is useful in inline asm (e.g. in C++ to get the
> mangled name).  Similar constraints are available in other targets (e.g.
> "S" for aarch64/riscv, "Cs" for m68k).
>
> There isn't a good way for x86 yet, e.g. "i" doesn't work for
> PIC/-mcmodel=large.  This patch adds "Ws".  Here are possible use cases:
>
> ```
> namespace ns { extern int var; }
> asm (".pushsection .xxx,\"aw\"; .dc.a %0; .popsection" :: "S"());
> asm (".reloc ., BFD_RELOC_NONE, %0" :: "S"());
> ```
>
> gcc/ChangeLog:
>
>  PR target/105576
>  * config/i386/constraints.md: Define constraint "Ws".
>  * doc/md.texi: Document it.
>
> gcc/testsuite/ChangeLog:
>
>  * gcc.target/i386/asm-raw-symbol.c: New testcase.
> ---
>   gcc/config/i386/constraints.md |  4 
>   gcc/doc/md.texi|  4 
>   gcc/testsuite/gcc.target/i386/asm-raw-symbol.c | 13 +
>   3 files changed, 21 insertions(+)
>   create mode 100644 gcc/testsuite/gcc.target/i386/asm-raw-symbol.c
>
> diff --git a/gcc/config/i386/constraints.md b/gcc/config/i386/constraints.md
> index 0c6e662df25..280e4c8e36c 100644
> --- a/gcc/config/i386/constraints.md
> +++ b/gcc/config/i386/constraints.md
> @@ -348,6 +348,10 @@ (define_constraint "Wf"
>  to double word size."
> (match_operand 0 "x86_64_dwzext_immediate_operand"))
>
> +(define_constraint "Ws"
> +  "A symbolic reference or label reference."
> +  (match_code "const,symbol_ref,label_ref"))
> +
>   (define_constraint "Z"
> "32-bit unsigned integer constant, or a symbolic reference known
>  to fit that range (for immediate operands in zero-extending x86-64
> diff --git a/gcc/doc/md.texi b/gcc/doc/md.texi
> index 47a87d6ceec..b0c61925120 100644
> --- a/gcc/doc/md.texi
> +++ b/gcc/doc/md.texi
> @@ -4275,6 +4275,10 @@ require non-@code{VOIDmode} immediate operands).
>   128-bit integer constant where both the high and low 64-bit word
>   satisfy the @code{e} constraint.
>
> +@item Ws
> +A symbolic reference or label reference.
> +You can use the @code{%p} modifier to print the raw symbol.
> +
>   @item Z
>   32-bit unsigned integer constant, or a symbolic reference known
>   to fit that range (for immediate operands in zero-extending x86-64
> diff --git a/gcc/testsuite/gcc.target/i386/asm-raw-symbol.c b/gcc/testsuite/gcc.target/i386/asm-raw-symbol.c
> new file mode 100644
> index 000..b7854567dd9
> --- /dev/null
> +++ b/gcc/testsuite/gcc.target/i386/asm-raw-symbol.c
> @@ -0,0 +1,13 @@
> +/* { dg-do compile } */
> +
> +extern int var;
> +
> +void
> +func (void)
> +{
> +  __asm__ ("@ %p0" : : "Ws" (func));
> +  __asm__ ("@ %p0" : : "Ws" (&var + 1));
> +}
> +
> +/* { dg-final { scan-assembler "@ func" } } */
> +/* { dg-final { scan-assembler "@ var\\+4" } } */
> --
> 2.43.0.275.g3460e3d667-goog
>


Re: [PATCH] i386: Add "z" constraint for symbolic address/label reference [PR105576]

2024-01-10 Thread Uros Bizjak
On Thu, Jan 11, 2024 at 4:44 AM Fangrui Song  wrote:
>
> Printing the raw symbol is useful in inline asm (e.g. in C++ to get the
> mangled name).  Similar constraints are available in other targets (e.g.
> "S" for aarch64/riscv, "Cs" for m68k).
>
> There isn't a good way for x86 yet, e.g. "i" doesn't work for
> PIC/-mcmodel=large.  This patch adds "z".

Please use W-prefixed multi-letter constraint name.

Uros.

>
> gcc/ChangeLog:
>
> PR target/105576
> * config/i386/constraints.md: Define constraint 'z'.
> * doc/md.texi: Document it.
>
> gcc/testsuite/ChangeLog:
>
> * gcc.target/i386/asm-raw-symbol.c: New testcase.
> ---
>  gcc/config/i386/constraints.md |  5 -
>  gcc/doc/md.texi|  4 
>  gcc/testsuite/gcc.target/i386/asm-raw-symbol.c | 13 +
>  3 files changed, 21 insertions(+), 1 deletion(-)
>  create mode 100644 gcc/testsuite/gcc.target/i386/asm-raw-symbol.c
>
> diff --git a/gcc/config/i386/constraints.md b/gcc/config/i386/constraints.md
> index 0c6e662df25..64330dfdf01 100644
> --- a/gcc/config/i386/constraints.md
> +++ b/gcc/config/i386/constraints.md
> @@ -19,7 +19,6 @@
>
>  ;;; Unused letters:
>  ;;;   H
> -;;; z
>
>  ;; Integer register constraints.
>  ;; It is not necessary to define 'r' here.
> @@ -438,3 +437,7 @@ (define_constraint  "je"
>"@internal constant that do not allow any unspec global offsets"
>(and (match_operand 0 "x86_64_immediate_operand")
> (match_test "!x86_poff_operand_p (op)")))
> +
> +(define_constraint "z"
> +  "A symbolic reference or label reference."
> +  (match_code "const,symbol_ref,label_ref"))
> diff --git a/gcc/doc/md.texi b/gcc/doc/md.texi
> index 47a87d6ceec..bbfec024311 100644
> --- a/gcc/doc/md.texi
> +++ b/gcc/doc/md.texi
> @@ -4286,6 +4286,10 @@ VSIB address operand.
>  @item Ts
>  Address operand without segment register.
>
> +@item z
> +A symbolic reference or label reference.
> +You can use the @code{%p} modifier to print the raw symbol.
> +
>  @end table
>
>  @item Xstormy16---@file{config/stormy16/stormy16.h}
> diff --git a/gcc/testsuite/gcc.target/i386/asm-raw-symbol.c b/gcc/testsuite/gcc.target/i386/asm-raw-symbol.c
> new file mode 100644
> index 000..ce88f3baee6
> --- /dev/null
> +++ b/gcc/testsuite/gcc.target/i386/asm-raw-symbol.c
> @@ -0,0 +1,13 @@
> +/* { dg-do compile } */
> +
> +extern int var;
> +
> +void
> +func (void)
> +{
> +  __asm__ ("@ %p0" : : "z" (func));
> +  __asm__ ("@ %p0" : : "z" (&var + 1));
> +}
> +
> +/* { dg-final { scan-assembler "@ func" } } */
> +/* { dg-final { scan-assembler "@ var\\+4" } } */
> --
> 2.43.0.275.g3460e3d667-goog
>

