from:"Carl Love via Gcc-patches"

Re: [PATCH ver 4] rs6000, add overloaded DFP quantize support

2023-08-29 Thread Carl Love via Gcc-patches

Kewen:

On Tue, 2023-08-29 at 16:54 +0800, Kewen.Lin wrote:
> >   The following functions require @option{-mhard-float},
> > diff --git a/gcc/testsuite/gcc.target/powerpc/pr93448.c
> > b/gcc/testsuite/gcc.target/powerpc/pr93448.c
> > new file mode 100644
> > index 000..f9c388585d7
> > --- /dev/null
> > +++ b/gcc/testsuite/gcc.target/powerpc/pr93448.c
> > @@ -0,0 +1,200 @@
> > +/* { dg-do run } */
> > +/* { dg-require-effective-target  dfp_hw} */
> > +/* { dg-require-effective-target  has_arch_pwr6} */
> 
> Sorry, I didn't catch this in the previous reviews.
> "dfp_hw" and "has_arch_pwr6" don't have the expected
> space after, without the space, the checkings would
> be useless and this case can fail.  So they should be:
> 
> /* { dg-require-effective-target dfp_hw } */
> /* { dg-require-effective-target has_arch_pwr6 } */
> 
> Okay for trunk with this fixed, thanks!

OK, I take it the parsing of the lines by the test scripts will fail
without the space since it can't parse it correctly.  Thanks for
letting me know.  Here is the fixed up code.  Note, I added the space
before the "}" and removed the extra space before dfp_hw and
has_arch_pwr6. 

get/powerpc/pr93448.c   
new file mode 100644
index 000..6b800f8d63d  
--- /dev/null   
+++ b/gcc/testsuite/gcc.target/powerpc/pr93448.c
@@ -0,0 +1,200 @@   
+/* { dg-do run } */
+/* { dg-require-effective-target dfp_hw } */   
+/* { dg-require-effective-target has_arch_pwr6 } */
+/* { dg-options "-mhard-float -O2 -save-temps" } */
+   
+/* Test the decimal floating point quantize built-ins.  */ 
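
For readers unfamiliar with the directive syntax, the spacing rule discussed above can be sketched in C.  This is only a rough model, under the assumption that the test harness recognizes a directive only when whitespace follows the opening brace and also precedes the closing brace.  The helper name looks_like_dg_directive is hypothetical and is not part of DejaGnu or the GCC testsuite.

```c
#include <assert.h>
#include <ctype.h>
#include <stdbool.h>
#include <string.h>

/* Hypothetical helper: returns true when LINE contains something shaped
   like "{ dg-name args }" with whitespace after "{" and before "}".
   This is a simplified model of the matching, not the harness itself.  */
bool
looks_like_dg_directive (const char *line)
{
  const char *p = strchr (line, '{');
  if (p == NULL)
    return false;
  p++;

  /* At least one blank must follow the opening brace.  */
  if (*p != ' ' && *p != '\t')
    return false;
  while (*p == ' ' || *p == '\t')
    p++;

  /* The directive name: "dg-" followed by lowercase letters or dashes.  */
  if (strncmp (p, "dg-", 3) != 0)
    return false;
  p += 3;
  if (!islower ((unsigned char) *p) && *p != '-')
    return false;
  while (islower ((unsigned char) *p) || *p == '-')
    p++;

  /* A blank must separate the name from its arguments.  */
  if (*p != ' ' && *p != '\t')
    return false;

  /* The closing brace must be preceded by a blank.  */
  const char *close = strrchr (p, '}');
  return close != NULL && (close[-1] == ' ' || close[-1] == '\t');
}
```

Under that assumption, the original line with "dfp_hw}" is simply not seen as a directive, so the effective-target check never runs, while the corrected line with "dfp_hw }" is recognized.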

I will go ahead and commit the patch.  Thanks for all your help.

  Carl

[PATCH ver 4] rs6000, add overloaded DFP quantize support

2023-08-28 Thread Carl Love via Gcc-patches



GCC maintainers:

Version 4, additional define_insn name fix.  Change Log fix for the
UNSPEC_DQUAN.  Retested patch on Power 10 LE.

Version 3, fixed the built-in instance names.  Missed removing the "n"
from the name.  Added the tighter constraints on the predicates for the
define_insn.  Updated the wording for the built-ins in the
documentation file.  Changed the test file name again.  Updated the
ChangeLog file, added the PR target line.  Retested the patch on Power
10LE and Power 8 and Power 9.

Version 2, renamed the built-in instances.  Changed the name of the
overloaded built-in.  Added the missing documentation for the new
built-ins.  Fixed typos.  Changed name of the test.  Updated the
effective target for the test.  Retested the patch on Power 10LE and
Power 8 and Power 9.

The following patch adds four built-ins for the decimal floating point
(DFP) quantize instructions on rs6000.  The built-ins are for 64-bit
and 128-bit DFP operands.

The patch also adds a test case for the new built-ins.

The patch has been tested on Power 10LE and Power 9 LE/BE.

Please let me know if the patch is acceptable for mainline.  Thanks.

 Carl Love



rs6000, add overloaded DFP quantize support

Add decimal floating point (DFP) quantize built-ins for both 64-bit DFP
and 128-bit DFP operands.  In each case, there is an immediate version and a
variable version of the built-in.  The RM value is a 2-bit constant int
which specifies the rounding mode to use.  For the immediate versions of
the built-in, the TE field is a 5-bit constant that specifies the value of
the ideal exponent for the result.  The built-in specifications are:

  _Decimal64 __builtin_dfp_quantize (_Decimal64, _Decimal64,
                                     const int RM)
  _Decimal64 __builtin_dfp_quantize (const int TE, _Decimal64,
                                     const int RM)
  _Decimal128 __builtin_dfp_quantize (_Decimal128, _Decimal128,
                                      const int RM)
  _Decimal128 __builtin_dfp_quantize (const int TE, _Decimal128,
                                      const int RM)
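
To make the role of TE and RM more concrete, here is a small portable C model of a quantize operation.  It is a sketch only and does not call the new built-ins: the type dec_model, the enum model_rm and the function model_quantize are hypothetical names introduced for illustration.  The RM numbering (0 = nearest, ties to even; 1 = toward zero; 2 = nearest, ties away from zero) is an assumption based on my reading of the ISA and should be checked there; RM value 3 and coefficient overflow are not modeled.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical model type: the value is coeff * 10^exp.  */
struct dec_model
{
  int64_t coeff;
  int exp;
};

/* Assumed meaning of the 2-bit RM argument (value 3 is not modeled).  */
enum model_rm
{
  MODEL_RM_NEAREST_EVEN = 0,
  MODEL_RM_TOWARD_ZERO = 1,
  MODEL_RM_NEAREST_AWAY = 2
};

/* Re-express V with exponent TARGET_EXP, rounding the coefficient
   according to RM when digits have to be dropped.  */
struct dec_model
model_quantize (int target_exp, struct dec_model v, enum model_rm rm)
{
  struct dec_model r = { v.coeff, target_exp };
  int shift = target_exp - v.exp;

  /* Smaller target exponent: scale the coefficient up, which is exact.  */
  for (; shift < 0; shift++)
    r.coeff *= 10;
  if (shift == 0)
    return r;

  /* Larger target exponent: drop SHIFT digits and round once.  */
  int64_t div = 1;
  for (int i = 0; i < shift; i++)
    div *= 10;

  int64_t sign = r.coeff < 0 ? -1 : 1;
  int64_t mag = r.coeff < 0 ? -r.coeff : r.coeff;
  int64_t q = mag / div;
  int64_t rem = mag % div;
  int64_t half = div / 2;

  bool round_up = false;
  if (rm == MODEL_RM_NEAREST_EVEN)
    round_up = rem > half || (rem == half && (q & 1) != 0);
  else if (rm == MODEL_RM_NEAREST_AWAY)
    round_up = rem >= half;

  r.coeff = sign * (q + (round_up ? 1 : 0));
  return r;
}
```

In this model, quantizing 1.25 (coefficient 125, exponent -2) to exponent -1 yields coefficient 12 with ties-to-even and 13 with ties-away-from-zero.  For the immediate form of the built-in, the target exponent corresponds to the TE argument.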

A testcase is added for the new built-in definitions.

gcc/ChangeLog:
* config/rs6000/dfp.md (UNSPEC_DQUAN): New unspec.
(dfp_dqua_, dfp_dquai_): New define_insn.
* config/rs6000/rs6000-builtins.def (__builtin_dfp_dqua,
__builtin_dfp_dquai, __builtin_dfp_dquaq, __builtin_dfp_dquaqi):
New built-in definitions.
* config/rs6000/rs6000-overload.def (__builtin_dfp_quantize): New
overloaded definition.
* doc/extend.texi: Add documentation for __builtin_dfp_quantize.

gcc/testsuite/
* gcc.target/powerpc/pr93448.c: New test case.

PR target/93448
---
 gcc/config/rs6000/dfp.md                   |  25 ++-
 gcc/config/rs6000/rs6000-builtins.def      |  15 ++
 gcc/config/rs6000/rs6000-overload.def      |  10 ++
 gcc/doc/extend.texi                        |  17 ++
 gcc/testsuite/gcc.target/powerpc/pr93448.c | 200 +
 5 files changed, 266 insertions(+), 1 deletion(-)
 create mode 100644 gcc/testsuite/gcc.target/powerpc/pr93448.c

diff --git a/gcc/config/rs6000/dfp.md b/gcc/config/rs6000/dfp.md
index 5ed8a73ac51..bf4a227b0eb 100644
--- a/gcc/config/rs6000/dfp.md
+++ b/gcc/config/rs6000/dfp.md
@@ -271,7 +271,8 @@ (define_c_enum "unspec"
UNSPEC_DIEX
UNSPEC_DSCLI
UNSPEC_DTSTSFI
-   UNSPEC_DSCRI])
+   UNSPEC_DSCRI
+   UNSPEC_DQUAN])
 
 (define_code_iterator DFP_TEST [eq lt gt unordered])
 
@@ -395,3 +396,25 @@ (define_insn "dfp_dscri_"
   "dscri %0,%1,%2"
   [(set_attr "type" "dfp")
(set_attr "size" "")])
+
+(define_insn "dfp_dqua_"
+  [(set (match_operand:DDTD 0 "gpc_reg_operand" "=d")
+(unspec:DDTD [(match_operand:DDTD 1 "gpc_reg_operand" "d")
+ (match_operand:DDTD 2 "gpc_reg_operand" "d")
+ (match_operand:SI 3 "const_0_to_3_operand" "n")]
+ UNSPEC_DQUAN))]
+  "TARGET_DFP"
+  "dqua %0,%1,%2,%3"
+  [(set_attr "type" "dfp")
+   (set_attr "size" "")])
+
+(define_insn "dfp_dquai_"
+  [(set (match_operand:DDTD 0 "gpc_reg_operand" "=d")
+(unspec:DDTD [(match_operand:SI 1 "s5bit_cint_operand" "n")
+ (match_operand:DDTD 2 "gpc_reg_operand" "d")
+ (match_operand:SI 3 "const_0_to_3_operand" "n")]
+ UNSPEC_DQUAN))]
+  "TARGET_DFP"
+  "dquai %1,%0,%2,%3"
+  [(set_attr "type" "dfp")
+   (set_attr "size" "")])
diff --git a/gcc/config/rs6000/rs6000-builtins.def b/gcc/config/rs6000/rs6000-builtins.def
index 8a294d6c934..ce40600e803 100644
--- a/gcc/config/rs6000/rs6000-builtins.def
+++ b/gcc/config/rs6000/rs6000-builtins.def
@@ -2983,6 +2983,21 @@
   const unsigned long long __builtin_unpack_dec128 (_Decimal128, const int<1>);
 UNPACK_TD unpacktd {}
 
+  const _Decimal64 __builtin_dfp_dqua (_Decimal64, _Decimal64, \
+  const int<2>);
+

Re: [PATCH ver 3] rs6000, add overloaded DFP quantize support

2023-08-28 Thread Carl Love via Gcc-patches

On Mon, 2023-08-28 at 10:21 +0800, Kewen.Lin wrote:
> Hi Carl,



> > 
> > A testcase is added for the new built-in definitions.
> > 
> > gcc/ChangeLog:
> > * config/rs6000/dfp.md: New UNSPEC_DQUAN.
> 
> Nit: (UNSPEC_DQUAN): New unspec.

Fixed.

> 



> > +(define_insn "dfp_dqua_"
> > +  [(set (match_operand:DDTD 0 "gpc_reg_operand" "=d")
> > +(unspec:DDTD [(match_operand:DDTD 1 "gpc_reg_operand" "d")
> > + (match_operand:DDTD 2 "gpc_reg_operand" "d")
> > + (match_operand:SI 3 "const_0_to_3_operand" "n")]
> > + UNSPEC_DQUAN))]
> > +  "TARGET_DFP"
> > +  "dqua %0,%1,%2,%3"
> > +  [(set_attr "type" "dfp")
> > +   (set_attr "size" "")])
> > +
> > +(define_insn "dfp_dqua_i"
> 
> Sorry for nitpicking, but what I suggested previously was
> "dfp_dquai_" instead of "dfp_dqua_i", "dquai" matches the according
> mnemonic so it's read better, i expands to "idd" and "itd" that look
> odd to me.  Do you agree "dquai" is better?  If yes, the changelog and
> the related expanders need to be updated as well.
> 
> The others look good to me, thanks!

We need to get it right, so don't be sorry for nitpicking.  My bad for
not getting it right the first time.

Fixed.


Carl

[PATCH ver 3] rs6000, add overloaded DFP quantize support

2023-08-24 Thread Carl Love via Gcc-patches

GCC maintainers:

Version 3, fixed the built-in instance names.  Missed removing the "n"
from the name.  Added the tighter constraints on the predicates for the
define_insn.  Updated the wording for the built-ins in the
documentation file.  Changed the test file name again.  Updated the
ChangeLog file, added the PR target line.  Retested the patch on Power
10LE and Power 8 and Power 9.

Version 2, renamed the built-in instances.  Changed the name of the
overloaded built-in.  Added the missing documentation for the new
built-ins.  Fixed typos.  Changed name of the test.  Updated the
effective target for the test.  Retested the patch on Power 10LE and
Power 8 and Power 9.

The following patch adds four built-ins for the decimal floating point
(DFP) quantize instructions on rs6000.  The built-ins are for 64-bit
and 128-bit DFP operands.

The patch also adds a test case for the new built-ins.

The patch has been tested on Power 10LE and Power 9 LE/BE.

Please let me know if the patch is acceptable for mainline.  Thanks.

 Carl Love


---
rs6000, add overloaded DFP quantize support

Add decimal floating point (DFP) quantize built-ins for both 64-bit DFP
and 128-bit DFP operands.  In each case, there is an immediate version and a
variable version of the built-in.  The RM value is a 2-bit constant int
which specifies the rounding mode to use.  For the immediate versions of
the built-in, the TE field is a 5-bit constant that specifies the value of
the ideal exponent for the result.  The built-in specifications are:

  _Decimal64 __builtin_dfp_quantize (_Decimal64, _Decimal64,
                                     const int RM)
  _Decimal64 __builtin_dfp_quantize (const int TE, _Decimal64,
                                     const int RM)
  _Decimal128 __builtin_dfp_quantize (_Decimal128, _Decimal128,
                                      const int RM)
  _Decimal128 __builtin_dfp_quantize (const int TE, _Decimal128,
                                      const int RM)

A testcase is added for the new built-in definitions.

gcc/ChangeLog:
* config/rs6000/dfp.md: New UNSPEC_DQUAN.
(dfp_dqua_, dfp_dqua_i): New define_insn.
* config/rs6000/rs6000-builtins.def (__builtin_dfp_dqua,
__builtin_dfp_dquai, __builtin_dfp_dquaq, __builtin_dfp_dquaqi):
New built-in definitions.
* config/rs6000/rs6000-overload.def (__builtin_dfp_quantize): New
overloaded definition.
* doc/extend.texi: Add documentation for __builtin_dfp_quantize.

gcc/testsuite/
* gcc.target/powerpc/pr93448.c: New test case.

PR target/93448
---
 gcc/config/rs6000/dfp.md                   |  25 ++-
 gcc/config/rs6000/rs6000-builtins.def      |  15 ++
 gcc/config/rs6000/rs6000-overload.def      |  10 ++
 gcc/doc/extend.texi                        |  17 ++
 gcc/testsuite/gcc.target/powerpc/pr93448.c | 200 +
 5 files changed, 266 insertions(+), 1 deletion(-)
 create mode 100644 gcc/testsuite/gcc.target/powerpc/pr93448.c

diff --git a/gcc/config/rs6000/dfp.md b/gcc/config/rs6000/dfp.md
index 5ed8a73ac51..052dc0946d3 100644
--- a/gcc/config/rs6000/dfp.md
+++ b/gcc/config/rs6000/dfp.md
@@ -271,7 +271,8 @@ (define_c_enum "unspec"
UNSPEC_DIEX
UNSPEC_DSCLI
UNSPEC_DTSTSFI
-   UNSPEC_DSCRI])
+   UNSPEC_DSCRI
+   UNSPEC_DQUAN])
 
 (define_code_iterator DFP_TEST [eq lt gt unordered])
 
@@ -395,3 +396,25 @@ (define_insn "dfp_dscri_"
   "dscri %0,%1,%2"
   [(set_attr "type" "dfp")
(set_attr "size" "")])
+
+(define_insn "dfp_dqua_"
+  [(set (match_operand:DDTD 0 "gpc_reg_operand" "=d")
+(unspec:DDTD [(match_operand:DDTD 1 "gpc_reg_operand" "d")
+ (match_operand:DDTD 2 "gpc_reg_operand" "d")
+ (match_operand:SI 3 "const_0_to_3_operand" "n")]
+ UNSPEC_DQUAN))]
+  "TARGET_DFP"
+  "dqua %0,%1,%2,%3"
+  [(set_attr "type" "dfp")
+   (set_attr "size" "")])
+
+(define_insn "dfp_dqua_i"
+  [(set (match_operand:DDTD 0 "gpc_reg_operand" "=d")
+(unspec:DDTD [(match_operand:SI 1 "s5bit_cint_operand" "n")
+ (match_operand:DDTD 2 "gpc_reg_operand" "d")
+ (match_operand:SI 3 "const_0_to_3_operand" "n")]
+ UNSPEC_DQUAN))]
+  "TARGET_DFP"
+  "dquai %1,%0,%2,%3"
+  [(set_attr "type" "dfp")
+   (set_attr "size" "")])
diff --git a/gcc/config/rs6000/rs6000-builtins.def b/gcc/config/rs6000/rs6000-builtins.def
index 8a294d6c934..81a0de88b9c 100644
--- a/gcc/config/rs6000/rs6000-builtins.def
+++ b/gcc/config/rs6000/rs6000-builtins.def
@@ -2983,6 +2983,21 @@
   const unsigned long long __builtin_unpack_dec128 (_Decimal128, const int<1>);
 UNPACK_TD unpacktd {}
 
+  const _Decimal64 __builtin_dfp_dqua (_Decimal64, _Decimal64, \
+  const int<2>);
+DFPQUAN_64 dfp_dqua_dd {}
+
+  const _Decimal64 __builtin_dfp_dquai (const int<5>, _Decimal64, \
+

Re: [PATCH ver 2] rs6000, add overloaded DFP quantize support

2023-08-24 Thread Carl Love via Gcc-patches

Kewen, Peter:

> on 2023/8/17 08:19, Carl Love wrote:
> > GCC maintainers:
> > 
> > Version 2, renamed the built-in instances.  Changed the name of the
> > overloaded built-in.  Added the missing documentation for the new
> > built-ins.  Fixed typos.  Changed name of the test.  Updated the
> > effective target for the test.  Retested the patch on Power 10LE and
> > Power 8 and Power 9.
> > 
> > The following patch adds four built-ins for the decimal floating point
> > (DFP) quantize instructions on rs6000.  The built-ins are for 64-bit
> > and 128-bit DFP operands.
> > 
> > The patch also adds a test case for the new builtins.
> > 
> > The Patch has been tested on Power 10LE and Power 9 LE/BE.
> > 
> > Please let me know if the patch is acceptable for mainline.  Thanks.
> > 
> >  Carl Love
> > 
> > 
> > 
> > --
> > [PATCH] rs6000, add overloaded DFP quantize support
> > 
> > Add decimal floating point (DFP) quantize built-ins for both 64-bit DFP
> > and 128-DFP operands.  In each case, there is an immediate version and a
> > variable version of the built-in.  The RM value is a 2-bit constant int
> > which specifies the rounding mode to use.  For the immediate versions of
> > the built-in, the TE field is a 5-bit constant that specifies the value of
> > the ideal exponent for the result.  The built-in specifications are:
> > 
> >   __Decimal64 builtin_dfp_quantize (_Decimal64, _Decimal64,
> > const int RM)
> >   __Decimal64 builtin_dfp_quantize (const int TE, _Decimal64,
> > const int)
> >   __Decimal128 builtin_dfp_quantize (_Decimal128, _Decimal128,
> >  const int RM)
> >   __Decimal128 builtin_dfp_quantize (const int TE, _Decimal128,
> >  const int)
> 
> Nit: Add the parameter name "RM" for all instances, otherwise the
> readers
> might feel confused what do the other two without RM mean. :)

Yes, they all should have the parameter name RM.  Fixed.

> 
> > A testcase is added for the new built-in definitions.
> 
> Nit: A PR marker line like:
> 
>   PR target/93448
> 
> > gcc/ChangeLog:
> > * config/rs6000/dfp.md: New UNSPECDQUAN.
> > (dfp_quan_, dfp_quan_i): New define_insn.
> > * config/rs6000/rs6000-builtins.def (__builtin_dfp_quantize_64,
> > __builtin_dfp_quantize_64i, __builtin_dfp_quantize_128,
> > __builtin_dfp_quantize_128i): New buit-in definitions.
> > * config/rs6000/rs6000-overload.def (__builtin_dfp_quantize,
> > __builtin_dfpq_quantize): New overloaded definitions.
> 
> These entries need updates with this new revision, also miss one
> entry for documentation update.

Fixed with the new names, added the documentation entry.
> 
> > gcc/testsuite/
> >  * gcc.target/powerpc/builtin-dfp-quantize-runnable.c: New test
> > case.
> 
> Ditto, inconsistent name.

Fixed with the new name of the file, pr93448.c.

> 
> > ---
> >  gcc/config/rs6000/dfp.md  |  25 ++-
> >  gcc/config/rs6000/rs6000-builtins.def |  15 ++
> >  gcc/config/rs6000/rs6000-overload.def |  10 +
> >  gcc/doc/extend.texi   |  15 ++
> >  .../gcc.target/powerpc/pr93448-dfp-quantize.c | 199 ++
> >  5 files changed, 263 insertions(+), 1 deletion(-)
> >  create mode 100644 gcc/testsuite/gcc.target/powerpc/pr93448-dfp-quantize.c
> > diff --git a/gcc/config/rs6000/dfp.md b/gcc/config/rs6000/dfp.md
> > index 5ed8a73ac51..abd21c5db75 100644
> > --- a/gcc/config/rs6000/dfp.md
> > +++ b/gcc/config/rs6000/dfp.md
> > @@ -271,7 +271,8 @@
> > UNSPEC_DIEX
> > UNSPEC_DSCLI
> > UNSPEC_DTSTSFI
> > -   UNSPEC_DSCRI])
> > +   UNSPEC_DSCRI
> > +   UNSPEC_DQUAN])
> >  
> >  (define_code_iterator DFP_TEST [eq lt gt unordered])
> >  
> > @@ -395,3 +396,25 @@
> >"dscri %0,%1,%2"
> >[(set_attr "type" "dfp")
> > (set_attr "size" "")])
> > +
> > +(define_insn "dfp_dquan_"
> 
> I guess I mentioned this previously, I prefer "dfp_dqua_"
> which aligns with the most others ...

Yes, I missed that I had the extra "n" and didn't fix that part of the
name.  Sorry about that.  Updated both define_insn definitions.

> 
> > +  [(set (match_operand:DDTD 0 "gpc_reg_operand" "=d")
> > +(unspec:DDTD [(match_operand:DDTD 1 "gpc_reg_operand" "d")
> > + (match_operand:DDTD 2 "gpc_reg_operand" "d")
> > + (match_operand:QI 3 "immediate_operand" "i")]
> > + UNSPEC_DQUAN))]
> > +  "TARGET_DFP"
> > +  "dqua %0,%1,%2,%3"
> > +  [(set_attr "type" "dfp")
> > +   (set_attr "size" "")])
> > +
> > +(define_insn "dfp_dquan_i"
> 
> ..., also prefer "dfp_dquai_" here.

Ditto on the name change fix.

> 
> Please also incorporate Peter's insightful comments on predicates
> and constraints on this part.

OK, changed to the stricter predicate constraints.

> 
> > +  [(set

[PATCH ver 2] rs6000, add overloaded DFP quantize support

2023-08-16 Thread Carl Love via Gcc-patches



GCC maintainers:

Version 2, renamed the built-in instances.  Changed the name of the
overloaded built-in.  Added the missing documentation for the new
built-ins.  Fixed typos.  Changed name of the test.  Updated the
effective target for the test.  Retested the patch on Power 10LE and
Power 8 and Power 9.

The following patch adds four built-ins for the decimal floating point
(DFP) quantize instructions on rs6000.  The built-ins are for 64-bit
and 128-bit DFP operands.

The patch also adds a test case for the new built-ins.

The patch has been tested on Power 10LE and Power 9 LE/BE.

Please let me know if the patch is acceptable for mainline.  Thanks.

 Carl Love



--
[PATCH] rs6000, add overloaded DFP quantize support

Add decimal floating point (DFP) quantize built-ins for both 64-bit DFP
and 128-bit DFP operands.  In each case, there is an immediate version and a
variable version of the built-in.  The RM value is a 2-bit constant int
which specifies the rounding mode to use.  For the immediate versions of
the built-in, the TE field is a 5-bit constant that specifies the value of
the ideal exponent for the result.  The built-in specifications are:

  _Decimal64 __builtin_dfp_quantize (_Decimal64, _Decimal64,
                                     const int RM)
  _Decimal64 __builtin_dfp_quantize (const int TE, _Decimal64,
                                     const int)
  _Decimal128 __builtin_dfp_quantize (_Decimal128, _Decimal128,
                                      const int RM)
  _Decimal128 __builtin_dfp_quantize (const int TE, _Decimal128,
                                      const int)

A testcase is added for the new built-in definitions.

gcc/ChangeLog:
* config/rs6000/dfp.md: New UNSPECDQUAN.
(dfp_quan_, dfp_quan_i): New define_insn.
* config/rs6000/rs6000-builtins.def (__builtin_dfp_quantize_64,
__builtin_dfp_quantize_64i, __builtin_dfp_quantize_128,
__builtin_dfp_quantize_128i): New built-in definitions.
* config/rs6000/rs6000-overload.def (__builtin_dfp_quantize,
__builtin_dfpq_quantize): New overloaded definitions.

gcc/testsuite/
 * gcc.target/powerpc/builtin-dfp-quantize-runnable.c: New test
case.
---
 gcc/config/rs6000/dfp.md  |  25 ++-
 gcc/config/rs6000/rs6000-builtins.def |  15 ++
 gcc/config/rs6000/rs6000-overload.def |  10 +
 gcc/doc/extend.texi   |  15 ++
 .../gcc.target/powerpc/pr93448-dfp-quantize.c | 199 ++
 5 files changed, 263 insertions(+), 1 deletion(-)
 create mode 100644 gcc/testsuite/gcc.target/powerpc/pr93448-dfp-quantize.c

diff --git a/gcc/config/rs6000/dfp.md b/gcc/config/rs6000/dfp.md
index 5ed8a73ac51..abd21c5db75 100644
--- a/gcc/config/rs6000/dfp.md
+++ b/gcc/config/rs6000/dfp.md
@@ -271,7 +271,8 @@
UNSPEC_DIEX
UNSPEC_DSCLI
UNSPEC_DTSTSFI
-   UNSPEC_DSCRI])
+   UNSPEC_DSCRI
+   UNSPEC_DQUAN])
 
 (define_code_iterator DFP_TEST [eq lt gt unordered])
 
@@ -395,3 +396,25 @@
   "dscri %0,%1,%2"
   [(set_attr "type" "dfp")
(set_attr "size" "")])
+
+(define_insn "dfp_dquan_"
+  [(set (match_operand:DDTD 0 "gpc_reg_operand" "=d")
+(unspec:DDTD [(match_operand:DDTD 1 "gpc_reg_operand" "d")
+ (match_operand:DDTD 2 "gpc_reg_operand" "d")
+ (match_operand:QI 3 "immediate_operand" "i")]
+ UNSPEC_DQUAN))]
+  "TARGET_DFP"
+  "dqua %0,%1,%2,%3"
+  [(set_attr "type" "dfp")
+   (set_attr "size" "")])
+
+(define_insn "dfp_dquan_i"
+  [(set (match_operand:DDTD 0 "gpc_reg_operand" "=d")
+(unspec:DDTD [(match_operand:SI 1 "const_int_operand" "n")
+ (match_operand:DDTD 2 "gpc_reg_operand" "d")
+ (match_operand:SI 3 "immediate_operand" "i")]
+ UNSPEC_DQUAN))]
+  "TARGET_DFP"
+  "dquai %1,%0,%2,%3"
+  [(set_attr "type" "dfp")
+   (set_attr "size" "")])
diff --git a/gcc/config/rs6000/rs6000-builtins.def b/gcc/config/rs6000/rs6000-builtins.def
index 8a294d6c934..a7ab90771f9 100644
--- a/gcc/config/rs6000/rs6000-builtins.def
+++ b/gcc/config/rs6000/rs6000-builtins.def
@@ -2983,6 +2983,21 @@
   const unsigned long long __builtin_unpack_dec128 (_Decimal128, const int<1>);
 UNPACK_TD unpacktd {}
 
+  const _Decimal64 __builtin_dfp_dqua (_Decimal64, _Decimal64, \
+  const int<2>);
+DFPQUAN_64 dfp_dquan_dd {}
+
+  const _Decimal64 __builtin_dfp_dquai (const int<5>, _Decimal64, \
+   const int<2>);
+DFPQUAN_64i dfp_dquan_idd {}
+
+  const _Decimal128 __builtin_dfp_dquaq (_Decimal128, _Decimal128, \
+const int<2>);
+DFPQUAN_128 dfp_dquan_td {}
+
+  const _Decimal128 __builtin_dfp_dquaqi (const int<5>, _Decimal128, \
+ const int<2>);
+DFPQUAN_128i dfp_dquan_itd {}
 
 [crypto]

[PATCH] rs6000, add overloaded DFP quantize support

2023-08-09 Thread Carl Love via Gcc-patches



GCC maintainers:

The following patch adds four built-ins for the decimal floating point
(DFP) quantize instructions on rs6000.  The built-ins are for 64-bit
and 128-bit DFP operands.

The patch also adds a test case for the new built-ins.

The patch has been tested on Power 10LE and Power 9 LE/BE.

Please let me know if the patch is acceptable for mainline.  Thanks.

 Carl Love


--
rs6000, add overloaded DFP quantize support

Add decimal floating point (DFP) quantize built-ins for both 64-bit DFP
and 128-bit DFP operands.  In each case, there is an immediate version and a
variable version of the built-in.  The RM value is a 2-bit const int which
specifies the rounding mode to use.  For the immediate versions of the
built-in, the TE field is a 5-bit constant that specifies the value of the
ideal exponent for the result.  The built-in specifications are:

  _Decimal64 __builtin_dfp_quantize (_Decimal64, _Decimal64,
                                     const int RM)
  _Decimal64 __builtin_dfp_quantize (const int TE, _Decimal64,
                                     const int)
  _Decimal128 __builtin_dfpq_quantize (_Decimal128, _Decimal128,
                                       const int RM)
  _Decimal128 __builtin_dfpq_quantize (const int TE, _Decimal128,
                                       const int)

A testcase is added for the new built-in definitions.

gcc/ChangeLog:
* config/rs6000/dfp.md: New UNSPECDQUAN.
(dfp_quan_, dfp_quan_i): New define_insn.
* config/rs6000/rs6000-builtins.def (__builtin_dfp_quantize_64,
__builtin_dfp_quantize_64i, __builtin_dfp_quantize_128,
__builtin_dfp_quantize_128i): New built-in definitions.
* config/rs6000/rs6000-overload.def (__builtin_dfp_quantize,
__builtin_dfpq_quantize): New overloaded definitions.

gcc/testsuite/
 * gcc.target/powerpc/builtin-dfp-quantize-runnable.c: New test
case.
---
 gcc/config/rs6000/dfp.md  |  25 ++-
 gcc/config/rs6000/rs6000-builtins.def |  15 ++
 gcc/config/rs6000/rs6000-overload.def |  12 ++
 .../powerpc/builtin-dfp-quantize-runnable.c   | 198 ++
 4 files changed, 249 insertions(+), 1 deletion(-)
 create mode 100644 gcc/testsuite/gcc.target/powerpc/builtin-dfp-quantize-runnable.c

diff --git a/gcc/config/rs6000/dfp.md b/gcc/config/rs6000/dfp.md
index 5ed8a73ac51..254c22a5c20 100644
--- a/gcc/config/rs6000/dfp.md
+++ b/gcc/config/rs6000/dfp.md
@@ -271,7 +271,8 @@
UNSPEC_DIEX
UNSPEC_DSCLI
UNSPEC_DTSTSFI
-   UNSPEC_DSCRI])
+   UNSPEC_DSCRI
+   UNSPEC_DQUAN])
 
 (define_code_iterator DFP_TEST [eq lt gt unordered])
 
@@ -395,3 +396,25 @@
   "dscri %0,%1,%2"
   [(set_attr "type" "dfp")
(set_attr "size" "")])
+
+(define_insn "dfp_quan_"
+  [(set (match_operand:DDTD 0 "gpc_reg_operand" "=d")
+(unspec:DDTD [(match_operand:DDTD 1 "gpc_reg_operand" "d")
+ (match_operand:DDTD 2 "gpc_reg_operand" "d")
+  (match_operand:QI 3 "immediate_operand" "i")]
+ UNSPEC_DQUAN))]
+  "TARGET_DFP"
+  "dqua %0,%1,%2,%3"
+  [(set_attr "type" "dfp")
+   (set_attr "size" "")])
+
+(define_insn "dfp_quan_i"
+  [(set (match_operand:DDTD 0 "gpc_reg_operand" "=d")
+(unspec:DDTD [(match_operand:SI 1 "const_int_operand" "n")
+ (match_operand:DDTD 2 "gpc_reg_operand" "d")
+  (match_operand:SI 3 "immediate_operand" "i")]
+ UNSPEC_DQUAN))]
+  "TARGET_DFP"
+  "dquai %1,%0,%2,%3"
+  [(set_attr "type" "dfp")
+   (set_attr "size" "")])
diff --git a/gcc/config/rs6000/rs6000-builtins.def b/gcc/config/rs6000/rs6000-builtins.def
index 35c4cdf74c5..36a56311643 100644
--- a/gcc/config/rs6000/rs6000-builtins.def
+++ b/gcc/config/rs6000/rs6000-builtins.def
@@ -2983,6 +2983,21 @@
   const unsigned long long __builtin_unpack_dec128 (_Decimal128, const int<1>);
 UNPACK_TD unpacktd {}
 
+  const _Decimal64 __builtin_dfp_quantize_64 (_Decimal64, _Decimal64, \
+ const int<2>);
+DFPQUAN_64 dfp_quan_dd {}
+
+  const _Decimal64 __builtin_dfp_quantize_64i (const int<5>, _Decimal64, \
+const int<2>);
+DFPQUAN_64i dfp_quan_idd {}
+
+  const _Decimal128 __builtin_dfp_quantize_128 (_Decimal128, _Decimal128, \
+ const int<2>);
+DFPQUAN_128 dfp_quan_td {}
+
+  const _Decimal128 __builtin_dfp_quantize_128i (const int<5>, _Decimal128, \
+  const int<2>);
+DFPQUAN_128i dfp_quan_itd {}
 
 [crypto]
   const vull __builtin_crypto_vcipher (vull, vull);
diff --git a/gcc/config/rs6000/rs6000-overload.def b/gcc/config/rs6000/rs6000-overload.def
index b83946f5ad8..3bb1bedd69d 100644
--- a/gcc/config/rs6000/rs6000-overload.def
+++ b/gcc/config/rs6000/rs6000-overload.def
@@ -195,6 +195,18 @@
   unsigned

Re: [PATCH ver 3] rs6000: Fix __builtin_altivec_vcmpne{b,h,w} implementation

2023-08-09 Thread Carl Love via Gcc-patches

Kewen:

On Wed, 2023-08-09 at 16:47 +0800, Kewen.Lin wrote:


> > Patch has been tested on Power 8 LE/BE, Power 9 LE/BE and Power 10 LE
> > with no regressions.
> 
> Okay for trunk with two nits below fixed, thanks!

Thanks for all the help with the patch.  Fixed the nits below, compiled
and reran the test cases to make sure everything was OK.  Will go ahead
and commit the patch.
> 
> > gcc/ChangeLog:
> > 
> > * config/rs6000/rs6000-builtins.def (vcmpneb, vcmpneh, vcmpnew):
> > Move definitions to Altivec stanza.
> > * config/rs6000/altivec.md (vcmpneb, vcmpneh, vcmpnew): New
> > define_expand.
> > 
> > gcc/testsuite/ChangeLog:
> > 
> > * gcc.target/powerpc/vec-cmpne-runnable.c: New execution test.
> > * gcc.target/powerpc/vec-cmpne.c (define_test_functions,
> > execute_test_functions) moved to vec-cmpne.h.  Added
> > scan-assembler-times for vcmpequb, vcmpequh, vcmpequw.
> 
>   s/ moved/: Move/ => "... execute_test_functions): Move "
>   
> s/Added/Add/

Fixed both issues.

> 



> >  
> > +;; Expand for builtin vcmpne{b,h,w}
> > +(define_expand "altivec_vcmpne_"
> > +  [(set (match_operand:VSX_EXTRACT_I 3 "altivec_register_operand" "=v")
> > +   (eq:VSX_EXTRACT_I (match_operand:VSX_EXTRACT_I 1 "altivec_register_operand" "v")
> > + (match_operand:VSX_EXTRACT_I 2 "altivec_register_operand" "v")))
> > +   (set (match_operand:VSX_EXTRACT_I 0 "altivec_register_operand" "=v")
> > +(not:VSX_EXTRACT_I (match_dup 3)))]
> > +  "TARGET_ALTIVEC"
> > +  {
> > +operands[3] = gen_reg_rtx (GET_MODE (operands[0]));
> > +  });
> 
> Nit: Useless ";".

removed semicolon.

   Carl

[PATCH ver 3] rs6000: Fix __builtin_altivec_vcmpne{b,h,w} implementation

2023-08-07 Thread Carl Love via Gcc-patches



GCC maintainers:

Ver 3: Updated description to make it clear the patch fixes the
confusion on the availability of the built-ins.  Fixed the
dg-require-effective-target on the test cases and the dg-options.  Changed
the test case so the for loop for the test will not be unrolled.  Fixed a
spelling error in a vec-cmpne.c comment.  Retested on Power 10LE.

Ver 2:  Re-worked the test vec-cmpne.c to create a compile-only test to
verify the instruction generation and a runnable test to verify the
built-in functionality.  Retested the patch on Power 8 LE/BE, Power
9 LE/BE and Power 10 LE with no regressions.

The following patch cleans up the definition for the
__builtin_altivec_vcmpne{b,h,w}.  The current implementation implies
that the built-in is only supported on Power 9 since it is defined
under the Power 9 stanza.  However the built-in has no ISA restrictions
as stated in the Power Vector Intrinsic Programming Reference document.
The current built-in works because the built-in gets replaced during
GIMPLE folding by a simple not-equal operator so it doesn't get
expanded and checked for Power 9 code generation.

This patch moves the definition to the Altivec stanza in the built-in
definition file to make it clear the built-ins are valid for Power 8,
Power 9 and beyond.  

The patch has been tested on Power 8 LE/BE, Power 9 LE/BE and Power 10
LE with no regressions.

Please let me know if the patch is acceptable for mainline.  Thanks.

  Carl 



rs6000: Fix __builtin_altivec_vcmpne{b,h,w} implementation

The current built-in definitions for vcmpneb, vcmpneh, vcmpnew are defined
under the Power 9 section of rs6000-builtins.def.  This implies they are only
supported on Power 9 and above, when in fact they are defined and work with
Altivec as well, given the appropriate Altivec instruction generation.

The vec_cmpne builtin should generate the vcmpequ{b,h,w} instruction with
Altivec enabled and generate the vcmpne{b,h,w} on Power 9 and newer
processors.
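
As a rough illustration of the Altivec path (a compare-equal followed by a bitwise NOT, as in the define_expand further down), here is a portable C model on 16 byte lanes.  It is only a sketch of the semantics: model_vcmpequb and model_vcmpneb are hypothetical names, and no vector instructions or built-ins are used.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define MODEL_LANES 16

/* Model of a byte compare-equal: a lane becomes all ones when the
   corresponding input bytes match, and all zeros otherwise.  */
void
model_vcmpequb (const uint8_t *a, const uint8_t *b, uint8_t *out)
{
  for (size_t i = 0; i < MODEL_LANES; i++)
    out[i] = (a[i] == b[i]) ? 0xFF : 0x00;
}

/* Model of the not-equal expansion: compare equal into a temporary,
   then invert every lane.  */
void
model_vcmpneb (const uint8_t *a, const uint8_t *b, uint8_t *out)
{
  uint8_t tmp[MODEL_LANES];

  model_vcmpequb (a, b, tmp);
  for (size_t i = 0; i < MODEL_LANES; i++)
    out[i] = (uint8_t) ~tmp[i];
}
```

A lane where the inputs differ ends up as 0xFF and a lane where they match ends up as 0x00, which is the result a single not-equal compare would produce on processors that have one.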

This patch moves the definitions to the Altivec stanza to make it clear
the built-ins are supported for all Altivec processors.  The patch
removes the confusion as to which processors support the vcmpequ{b,h,w}
instructions.

There is existing test coverage for the vec_cmpne built-in for
vector bool char, vector bool short, vector bool int,
vector bool long long in builtins-3-p9.c and p8vector-builtin-2.c.
Coverage for vector signed int, vector unsigned int is in
p8vector-builtin-2.c.

Test vec-cmpne.c is updated to check the generation of the vcmpequ{b,h,w}
instructions for Altivec.  A new test vec-cmpne-runnable.c is added to
verify the built-ins work as expected.

Patch has been tested on Power 8 LE/BE, Power 9 LE/BE and Power 10 LE
with no regressions.

gcc/ChangeLog:

* config/rs6000/rs6000-builtins.def (vcmpneb, vcmpneh, vcmpnew):
Move definitions to Altivec stanza.
* config/rs6000/altivec.md (vcmpneb, vcmpneh, vcmpnew): New
define_expand.

gcc/testsuite/ChangeLog:

* gcc.target/powerpc/vec-cmpne-runnable.c: New execution test.
* gcc.target/powerpc/vec-cmpne.c (define_test_functions,
execute_test_functions) moved to vec-cmpne.h.  Added
scan-assembler-times for vcmpequb, vcmpequh, vcmpequw.
* gcc.target/powerpc/vec-cmpne.h: New include file for vec-cmpne.c
and vec-cmpne-runnable.c. Split define_test_functions definition
into define_test_functions and define_init_verify_functions.
---
 gcc/config/rs6000/altivec.md  |  12 ++
 gcc/config/rs6000/rs6000-builtins.def |  18 +--
 .../gcc.target/powerpc/vec-cmpne-runnable.c   |  36 ++
 gcc/testsuite/gcc.target/powerpc/vec-cmpne.c  | 112 ++
 gcc/testsuite/gcc.target/powerpc/vec-cmpne.h  |  90 ++
 5 files changed, 156 insertions(+), 112 deletions(-)
 create mode 100644 gcc/testsuite/gcc.target/powerpc/vec-cmpne-runnable.c
 create mode 100644 gcc/testsuite/gcc.target/powerpc/vec-cmpne.h

diff --git a/gcc/config/rs6000/altivec.md b/gcc/config/rs6000/altivec.md
index ad1224e0b57..31f65aa1b7a 100644
--- a/gcc/config/rs6000/altivec.md
+++ b/gcc/config/rs6000/altivec.md
@@ -2631,6 +2631,18 @@ (define_insn "altivec_vcmpequt_p"
   "vcmpequq. %0,%1,%2"
   [(set_attr "type" "veccmpfx")])
 
+;; Expand for builtin vcmpne{b,h,w}
+(define_expand "altivec_vcmpne_"
+  [(set (match_operand:VSX_EXTRACT_I 3 "altivec_register_operand" "=v")
+   (eq:VSX_EXTRACT_I (match_operand:VSX_EXTRACT_I 1 "altivec_register_operand" "v")
+ (match_operand:VSX_EXTRACT_I 2 "altivec_register_operand" "v")))
+   (set (match_operand:VSX_EXTRACT_I 0 "altivec_register_operand" "=v")
+(not:VSX_EXTRACT_I (match_dup 3)))]
+  "TARGET_ALTIVEC"
+  {
+operands[3] = gen_reg_rtx (GET_MODE (operands[0]));
+  });
+
 (define_insn "*altivec_vcmpgts_p"
   [(set (reg:CC CR6_REGNO)
(unspec:CC [(gt:CC (match_operand:VI2 1

Re: [PATCH v2] rs6000: Fix __builtin_altivec_vcmpne{b,h,w} implementation

2023-08-07 Thread Carl Love via Gcc-patches

On Mon, 2023-08-07 at 17:18 +0800, Kewen.Lin wrote:
> Hi Carl,
> 
> Sorry for the late review.
> 
> on 2023/8/2 02:29, Carl Love wrote:
> > GCC maintainers:
> > 
> > Ver 2:  Re-worked the test vec-cmpne.c to create a compile only
> > test
> > verify the instruction generation and a runnable test to verify the
> > built-in functionality.  Retested the patch on Power 8 LE/BE, Power
> > 9LE/BE and Power 10 LE with no regressions.
> > 
> > The following patch cleans up the definition for the
> > __builtin_altivec_vcmpne{b,h,w}.  The current implementation
> > implies
> > that the built-in is only supported on Power 9 since it is defined
> > under the Power 9 stanza.  However the built-in has no ISA
> > restrictions
> > as stated in the Power Vector Intrinsic Programming Reference
> > document.
> > The current built-in works because the built-in gets replaced
> > during
> > GIMPLE folding by a simple not-equal operator so it doesn't get
> > expanded and checked for Power 9 code generation.
> > 
> > This patch moves the definition to the Altivec stanza in the built-
> > in
> > definition file to make it clear the built-ins are valid for Power
> > 8,
> > Power 9 and beyond.  
> > 
> > The patch has been tested on Power 8 LE/BE, Power 9 LE/BE and Power
> > 10
> > LE with no regressions.
> > 
> > Please let me know if the patch is acceptable for
> > mainline.  Thanks.
> > 
> >   Carl 
> > 
> > 
> > rs6000: Fix __builtin_altivec_vcmpne{b,h,w} implementation
> > 
> > The current built-in definitions for vcmpneb, vcmpneh, vcmpnew are
> > defined
> > under the Power 9 section of r66000-builtins.  This implies they
> > are only
> > supported on Power 9 and above when in fact they are defined and
> > work with
> > Altivec as well with the appropriate Altivec instruction
> > generation.
> > 
> > The vec_cmpne builtin should generate the vcmpequ{b,h,w}
> > instruction with
> > Altivec enabled and generate the vcmpne{b,h,w} on Power 9 and newer
> > processors.
> > 
> > This patch moves the definitions to the Altivec stanza to make it
> > clear
> > the built-ins are supported for all Altivec processors.  The patch
> > enables the vcmpequ{b,h,w} instruction to be generated on Altivec
> > and
> > the vcmpne{b,h,w} instruction to be generated on Power 9 and
> > beyond.
> 
> But as you noted above, the current built-ins work as expected, that
> is
> to generate with vcmpequ{b,h,w} on altivec but not Power9 while
> generate
> with vcmpne{b,h,w} on Power9.  So I think we shouldn't say it's
> enabled
> by this patch.  Instead it's to remove the confusion.

OK, changed.
> 
> > There is existing test coverage for the vec_cmpne built-in for
> > vector bool char, vector bool short, vector bool int,
> > vector bool long long in builtins-3-p9.c and p8vector-builtin-2.c.
> > Coverage for vector signed int, vector unsigned int is in
> > p8vector-builtin-2.c.
> > 
> > Test vec-cmpne.c is updated to check the generation of the
> > vcmpequ{b,h,w}
> > instructions for Altivec.  A new test vec-cmpne-runnable.c is added
> > to
> > verify the built-ins work as expected.
> > 
> > Patch has been tested on Power 8 LE/BE, Power 9 LE/BE and Power 10
> > LE
> > with no regressions.
> > 
> > gcc/ChangeLog:
> > 
> > * config/rs6000/rs6000-builtins.def (vcmpneb, vcmpneh,
> > vcmpnew):
> > Move definitions to Altivec stanza.
> > * config/rs6000/altivec.md (vcmpneb, vcmpneh, vcmpnew): New
> > define_expand.
> > 
> > gcc/testsuite/ChangeLog:
> > 
> > * gcc.target/powerpc/vec-cmpne-runnable.c: New execution test.
> > * gcc.target/powerpc/vec-cmpne.c (define_test_functions,
> > execute_test_functions) moved to vec-cmpne.h.  Added
> > scan-assembler-times for vcmpequb, vcmpequh, vcmpequw.
> > * gcc.target/powerpc/vec-cmpne.h: New include file for vec-
> > cmpne.c
> > and vec-cmpne-runnable.c. Split define_test_functions
> > definition
> > into define_test_functions and define_init_verify_functions.
> > ---
> >  gcc/config/rs6000/altivec.md  |  12 ++
> >  gcc/config/rs6000/rs6000-builtins.def |  18 +--
> >  .../gcc.target/powerpc/vec-cmpne-runnable.c   |  36 ++
> >  gcc/testsuite/gcc.target/powerpc/vec-cmpne.c  | 110 ++--
> > --
> >  gcc/testsuite/gcc.target/powerpc/vec-cmpne.h  |  86 ++
> >  5 files changed, 151 insertions(+), 111 deletions(-)
> >  create mode 100644 gcc/testsuite/gcc.target/powerpc/vec-cmpne-
> > runnable.c
> >  create mode 100644 gcc/testsuite/gcc.target/powerpc/vec-cmpne.h
> > 
> > diff --git a/gcc/config/rs6000/altivec.md
> > b/gcc/config/rs6000/altivec.md
> > index ad1224e0b57..31f65aa1b7a 100644
> > --- a/gcc/config/rs6000/altivec.md
> > +++ b/gcc/config/rs6000/altivec.md
> > @@ -2631,6 +2631,18 @@ (define_insn "altivec_vcmpequt_p"
> >"vcmpequq. %0,%1,%2"
> >[(set_attr "type" "veccmpfx")])
> >  
> > +;; Expand for builtin vcmpne{b,h,w}
> > +(define_expand

[PATCH v2] rs6000: Fix __builtin_altivec_vcmpne{b,h,w} implementation

2023-08-01 Thread Carl Love via Gcc-patches



GCC maintainers:

Ver 2:  Re-worked the test vec-cmpne.c to create a compile-only test to
verify the instruction generation and a runnable test to verify the
built-in functionality.  Retested the patch on Power 8 LE/BE, Power 9
LE/BE and Power 10 LE with no regressions.

The following patch cleans up the definition for the
__builtin_altivec_vcmpne{b,h,w}.  The current implementation implies
that the built-in is only supported on Power 9 since it is defined
under the Power 9 stanza.  However the built-in has no ISA restrictions
as stated in the Power Vector Intrinsic Programming Reference document.
The current built-in works because the built-in gets replaced during
GIMPLE folding by a simple not-equal operator so it doesn't get
expanded and checked for Power 9 code generation.

This patch moves the definition to the Altivec stanza in the built-in
definition file to make it clear the built-ins are valid for Power 8,
Power 9 and beyond.  
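
As a rough illustration of the fold described above, here is a portable
scalar model of the word case.  It is only a sketch, not the GCC
implementation: the names cmpequw_model and cmpnew_model are made up
for this example, and plain arrays stand in for vector registers.  It
shows why an equal compare followed by a bitwise NOT is enough to get
a not-equal result on Altivec:

```c
#include <stddef.h>
#include <stdint.h>

#define LANES 4  /* Four 32-bit lanes in a 128-bit vector.  */

/* Model of vcmpequw: a result lane is all ones when the two input
   lanes are equal, and zero otherwise.  */
static void
cmpequw_model (const uint32_t *a, const uint32_t *b, uint32_t *res)
{
  for (size_t i = 0; i < LANES; i++)
    res[i] = (a[i] == b[i]) ? UINT32_MAX : 0;
}

/* Model of a word-wise not-equal: the equal compare followed by a
   bitwise NOT of every lane.  */
static void
cmpnew_model (const uint32_t *a, const uint32_t *b, uint32_t *res)
{
  cmpequw_model (a, b, res);
  for (size_t i = 0; i < LANES; i++)
    res[i] = ~res[i];
}
```

Lanes that differ come back as all ones and lanes that match come back
as zero, which is the same mask a direct not-equal compare would
produce.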

The patch has been tested on Power 8 LE/BE, Power 9 LE/BE and Power 10
LE with no regressions.

Please let me know if the patch is acceptable for mainline.  Thanks.

  Carl 


rs6000: Fix __builtin_altivec_vcmpne{b,h,w} implementation

The current built-in definitions for vcmpneb, vcmpneh, vcmpnew are defined
under the Power 9 section of rs6000-builtins.def.  This implies they are only
supported on Power 9 and above when in fact they are defined and work with
Altivec as well with the appropriate Altivec instruction generation.

The vec_cmpne builtin should generate the vcmpequ{b,h,w} instruction with
Altivec enabled and generate the vcmpne{b,h,w} on Power 9 and newer
processors.

This patch moves the definitions to the Altivec stanza to make it clear
the built-ins are supported for all Altivec processors.  The patch
enables the vcmpequ{b,h,w} instruction to be generated on Altivec and
the vcmpne{b,h,w} instruction to be generated on Power 9 and beyond.

There is existing test coverage for the vec_cmpne built-in for
vector bool char, vector bool short, vector bool int,
vector bool long long in builtins-3-p9.c and p8vector-builtin-2.c.
Coverage for vector signed int, vector unsigned int is in
p8vector-builtin-2.c.

Test vec-cmpne.c is updated to check the generation of the vcmpequ{b,h,w}
instructions for Altivec.  A new test vec-cmpne-runnable.c is added to
verify the built-ins work as expected.

Patch has been tested on Power 8 LE/BE, Power 9 LE/BE and Power 10 LE
with no regressions.

gcc/ChangeLog:

* config/rs6000/rs6000-builtins.def (vcmpneb, vcmpneh, vcmpnew):
Move definitions to Altivec stanza.
* config/rs6000/altivec.md (vcmpneb, vcmpneh, vcmpnew): New
define_expand.

gcc/testsuite/ChangeLog:

* gcc.target/powerpc/vec-cmpne-runnable.c: New execution test.
* gcc.target/powerpc/vec-cmpne.c (define_test_functions,
execute_test_functions) moved to vec-cmpne.h.  Added
scan-assembler-times for vcmpequb, vcmpequh, vcmpequw.
* gcc.target/powerpc/vec-cmpne.h: New include file for vec-cmpne.c
and vec-cmpne-runnable.c. Split define_test_functions definition
into define_test_functions and define_init_verify_functions.
---
 gcc/config/rs6000/altivec.md  |  12 ++
 gcc/config/rs6000/rs6000-builtins.def |  18 +--
 .../gcc.target/powerpc/vec-cmpne-runnable.c   |  36 ++
 gcc/testsuite/gcc.target/powerpc/vec-cmpne.c  | 110 ++
 gcc/testsuite/gcc.target/powerpc/vec-cmpne.h  |  86 ++
 5 files changed, 151 insertions(+), 111 deletions(-)
 create mode 100644 gcc/testsuite/gcc.target/powerpc/vec-cmpne-runnable.c
 create mode 100644 gcc/testsuite/gcc.target/powerpc/vec-cmpne.h

diff --git a/gcc/config/rs6000/altivec.md b/gcc/config/rs6000/altivec.md
index ad1224e0b57..31f65aa1b7a 100644
--- a/gcc/config/rs6000/altivec.md
+++ b/gcc/config/rs6000/altivec.md
@@ -2631,6 +2631,18 @@ (define_insn "altivec_vcmpequt_p"
   "vcmpequq. %0,%1,%2"
   [(set_attr "type" "veccmpfx")])
 
+;; Expand for builtin vcmpne{b,h,w}
+(define_expand "altivec_vcmpne_<mode>"
+  [(set (match_operand:VSX_EXTRACT_I 3 "altivec_register_operand" "=v")
+   (eq:VSX_EXTRACT_I (match_operand:VSX_EXTRACT_I 1 "altivec_register_operand" "v")
+ (match_operand:VSX_EXTRACT_I 2 "altivec_register_operand" "v")))
+   (set (match_operand:VSX_EXTRACT_I 0 "altivec_register_operand" "=v")
+(not:VSX_EXTRACT_I (match_dup 3)))]
+  "TARGET_ALTIVEC"
+  {
+operands[3] = gen_reg_rtx (GET_MODE (operands[0]));
+  });
+
 (define_insn "*altivec_vcmpgts_p"
   [(set (reg:CC CR6_REGNO)
(unspec:CC [(gt:CC (match_operand:VI2 1 "register_operand" "v")
diff --git a/gcc/config/rs6000/rs6000-builtins.def b/gcc/config/rs6000/rs6000-builtins.def
index 638d0bc72ca..6b06fa8b34d 100644
--- a/gcc/config/rs6000/rs6000-builtins.def
+++ b/gcc/config/rs6000/rs6000-builtins.def
@@ -641,6 +641,15 @@
   const int

Re: [PATCH] rs6000: Fix __builtin_altivec_vcmpne{b,h,w} implementation

2023-08-01 Thread Carl Love via Gcc-patches

Kewen:

On Mon, 2023-07-31 at 14:53 +0800, Kewen.Lin wrote:
> Hi Carl,
> 
> on 2023/7/28 23:00, Carl Love wrote:
> > GCC maintainers:
> > 
> > The following patch cleans up the definition for the
> > __builtin_altivec_vcmpnet.  The current implementation implies that
> > the
> 
> s/__builtin_altivec_vcmpnet/__builtin_altivec_vcmpne[bhw]/

OK, updated in email for version 2. 

> 
> > built-in is only supported on Power 9 since it is defined under the
> > Power 9 stanza.  However the built-in has no ISA restrictions as
> > stated
> > in the Power Vector Intrinsic Programming Reference document. The
> > current built-in works because the built-in gets replaced during
> > GIMPLE
> > folding by a simple not-equal operator so it doesn't get expanded
> > and
> > checked for Power 9 code generation.
> > 
> > This patch moves the definition to the Altivec stanza in the built-
> > in
> > definition file to make it clear the built-ins are valid for Power
> > 8,
> > Power 9 and beyond.  
> > 
> > The patch has been tested on Power 8 LE/BE, Power 9 LE/BE and Power
> > 10
> > LE with no regressions.
> > 
> > Please let me know if the patch is acceptable for
> > mainline.  Thanks.
> > 
> >   Carl 
> > 
> > --
> > rs6000: Fix __builtin_altivec_vcmpne{b,h,w} implementation
> > 
> > The current built-in definitions for vcmpneb, vcmpneh, vcmpnew are
> > defined
> > under the Power 9 section of r66000-builtins.  This implies they
> > are only
> > supported on Power 9 and above when in fact they are defined and
> > work on
> > Power 8 as well with the appropriate Power 8 instruction
> > generation.
> 
> Nit: It's confusing to say Power8 only, it's actually supported once
> altivec
> is enabled, so I think it's more clear to replace Power8 with altivec
> here.

OK, replaced Power 8 with Altivec here and for additional instances of
Power 8 below.

> 
> > The vec_cmpne builtin should generate the vcmpequ{b,h,w}
> > instruction on
> > Power 8 and generate the vcmpne{b,h,w} on Power 9 an newer
> > processors.
> 
> 
> Ditto for Power8 and "an" -> "and"?

Fixed, fixed.

> 
> > This patch moves the definitions to the Altivec stanza to make it
> > clear
> > the built-ins are supported for all Altivec processors.  The patch
> > enables the vcmpequ{b,h,w} instruction to be generated on Power 8
> > and
> > the vcmpne{b,h,w} instruction to be generated on Power 9 and
> > beyond.
> 
> Ditto for Power8.

fixed

> 
> > There is existing test coverage for the vec_cmpne built-in for
> > vector bool char, vector bool short, vector bool int,
> > vector bool long long in builtins-3-p9.c and p8vector-builtin-2.c.
> > Coverage for vector signed int, vector unsigned int is in
> > p8vector-builtin-2.c.
> 
> So there is no coverage with the basic altivec support.  I noticed
> we have one test case "gcc/testsuite/gcc.target/powerpc/vec-cmpne.c"
> which is a test case for running but with vsx_ok, I think we can
> rewrite it with altivec (vmx), either separating to compiling and
> running case, or adding -save-temp and check expected insns.

I looked at just adding -save-temps and scan-assembler-times for the
instructions.  I noticed that vcmpequw occurs 30 times in the functions
to initialize and test the results.  So, I opted to create a separate
compile/check instructions test and a runnable test to verify the
functionality.  This way any changes in the code to calculate and
verify the results will not break the instruction generation checks.

> 
> Coverage for unsigned long long int and long long int
> > for Power 10 in int_128bit-runnable.c.

Removed comment about Power 10, long long int testing.

> > 
> > Patch has been tested on Power 8 LE/BE, Power 9 LE/BE and Power 10
> > LE
> > with no regressions.
> > 
> > gcc/ChangeLog:
> > 
> > * config/rs6000/rs6000-builtins.def (vcmpneb, vcmpneh, vcmpnew.
> > vcmpnet): Move definitions to Altivec stanza.
> 
> vcmpnet which isn't handled in this patch should be removed.

Removed.
 
 Carl

[PATCH] rs6000: Fix __builtin_altivec_vcmpne{b,h,w} implementation

2023-07-28 Thread Carl Love via Gcc-patches

GCC maintainers:

The following patch cleans up the definition for the
__builtin_altivec_vcmpnet.  The current implementation implies that the
built-in is only supported on Power 9 since it is defined under the
Power 9 stanza.  However the built-in has no ISA restrictions as stated
in the Power Vector Intrinsic Programming Reference document. The
current built-in works because the built-in gets replaced during GIMPLE
folding by a simple not-equal operator so it doesn't get expanded and
checked for Power 9 code generation.

This patch moves the definition to the Altivec stanza in the built-in
definition file to make it clear the built-ins are valid for Power 8,
Power 9 and beyond.  

The patch has been tested on Power 8 LE/BE, Power 9 LE/BE and Power 10
LE with no regressions.

Please let me know if the patch is acceptable for mainline.  Thanks.

  Carl 

--
rs6000: Fix __builtin_altivec_vcmpne{b,h,w} implementation

The current built-in definitions for vcmpneb, vcmpneh, vcmpnew are defined
under the Power 9 section of rs6000-builtins.def.  This implies they are only
supported on Power 9 and above when in fact they are defined and work on
Power 8 as well with the appropriate Power 8 instruction generation.

The vec_cmpne builtin should generate the vcmpequ{b,h,w} instruction on
Power 8 and generate the vcmpne{b,h,w} on Power 9 an newer processors.

This patch moves the definitions to the Altivec stanza to make it clear
the built-ins are supported for all Altivec processors.  The patch
enables the vcmpequ{b,h,w} instruction to be generated on Power 8 and
the vcmpne{b,h,w} instruction to be generated on Power 9 and beyond.

There is existing test coverage for the vec_cmpne built-in for
vector bool char, vector bool short, vector bool int,
vector bool long long in builtins-3-p9.c and p8vector-builtin-2.c.
Coverage for vector signed int, vector unsigned int is in
p8vector-builtin-2.c.  Coverage for unsigned long long int and long long int
for Power 10 in int_128bit-runnable.c.

Patch has been tested on Power 8 LE/BE, Power 9 LE/BE and Power 10 LE
with no regressions.

gcc/ChangeLog:

* config/rs6000/rs6000-builtins.def (vcmpneb, vcmpneh, vcmpnew.
vcmpnet): Move definitions to Altivec stanza.
* config/rs6000/altivec.md (vcmpneb, vcmpneh, vcmpnew): New
define_expand.
---
 gcc/config/rs6000/altivec.md  | 12 
 gcc/config/rs6000/rs6000-builtins.def | 18 +-
 2 files changed, 21 insertions(+), 9 deletions(-)

diff --git a/gcc/config/rs6000/altivec.md b/gcc/config/rs6000/altivec.md
index ad1224e0b57..31f65aa1b7a 100644
--- a/gcc/config/rs6000/altivec.md
+++ b/gcc/config/rs6000/altivec.md
@@ -2631,6 +2631,18 @@ (define_insn "altivec_vcmpequt_p"
   "vcmpequq. %0,%1,%2"
   [(set_attr "type" "veccmpfx")])
 
+;; Expand for builtin vcmpne{b,h,w}
+(define_expand "altivec_vcmpne_<mode>"
+  [(set (match_operand:VSX_EXTRACT_I 3 "altivec_register_operand" "=v")
+   (eq:VSX_EXTRACT_I (match_operand:VSX_EXTRACT_I 1 "altivec_register_operand" "v")
+ (match_operand:VSX_EXTRACT_I 2 "altivec_register_operand" "v")))
+   (set (match_operand:VSX_EXTRACT_I 0 "altivec_register_operand" "=v")
+(not:VSX_EXTRACT_I (match_dup 3)))]
+  "TARGET_ALTIVEC"
+  {
+operands[3] = gen_reg_rtx (GET_MODE (operands[0]));
+  });
+
 (define_insn "*altivec_vcmpgts_p"
   [(set (reg:CC CR6_REGNO)
(unspec:CC [(gt:CC (match_operand:VI2 1 "register_operand" "v")
diff --git a/gcc/config/rs6000/rs6000-builtins.def b/gcc/config/rs6000/rs6000-builtins.def
index 638d0bc72ca..6b06fa8b34d 100644
--- a/gcc/config/rs6000/rs6000-builtins.def
+++ b/gcc/config/rs6000/rs6000-builtins.def
@@ -641,6 +641,15 @@
   const int __builtin_altivec_vcmpgtuw_p (int, vsi, vsi);
 VCMPGTUW_P vector_gtu_v4si_p {pred}
 
+  const vsc __builtin_altivec_vcmpneb (vsc, vsc);
+VCMPNEB altivec_vcmpne_v16qi {}
+
+  const vss __builtin_altivec_vcmpneh (vss, vss);
+VCMPNEH altivec_vcmpne_v8hi {}
+
+  const vsi __builtin_altivec_vcmpnew (vsi, vsi);
+VCMPNEW altivec_vcmpne_v4si {}
+
   const vsi __builtin_altivec_vctsxs (vf, const int<5>);
 VCTSXS altivec_vctsxs {}
 
@@ -2599,9 +2608,6 @@
   const signed int __builtin_altivec_vcmpaew_p (vsi, vsi);
 VCMPAEW_P vector_ae_v4si_p {}
 
-  const vsc __builtin_altivec_vcmpneb (vsc, vsc);
-VCMPNEB vcmpneb {}
-
   const signed int __builtin_altivec_vcmpneb_p (vsc, vsc);
 VCMPNEB_P vector_ne_v16qi_p {}
 
@@ -2614,15 +2620,9 @@
   const signed int __builtin_altivec_vcmpnefp_p (vf, vf);
 VCMPNEFP_P vector_ne_v4sf_p {}
 
-  const vss __builtin_altivec_vcmpneh (vss, vss);
-VCMPNEH vcmpneh {}
-
   const signed int __builtin_altivec_vcmpneh_p (vss, vss);
 VCMPNEH_P vector_ne_v8hi_p {}
 
-  const vsi __builtin_altivec_vcmpnew (vsi, vsi);
-VCMPNEW vcmpnew {}
-
   const signed int __builtin_altivec_vcmpnew_p (vsi, vsi);
 VCMPNEW_P vector_ne_v4si_p {}
 
--

Re: [PATCH 2/2 ver 5] rs6000, fix vec_replace_unaligned built-in arguments

2023-07-21 Thread Carl Love via Gcc-patches

GCC maintainers:

Version 5, Fixed patch description, the first argument should be of
type vector.  Fixed comment in vsx.md to say "Vector and scalar
extract_elt iterator/attr ".  Removed a few of the changes in
version 4.  Specifically, reverted the names of REPLACE_ELT_V_sh back
to REPLACE_ELT_sh and REPLACE_ELT_V_max back to REPLACE_ELT_max.
Combined the REPLACE_ELT_char and REPLACE_ELT_V_char mode attributes
into REPLACE_ELT_char.  Put the "dg-do link" directive back into the
vec-replace-word-runnable_1.c test file.  The patch was tested with the
updated patch 1 in the series on Power 8 LE/BE, Power 9 LE/BE and Power
10 with no regressions.

Version 4, changed the new RS6000_OVLD_VEC_REPLACE_UN case statement in
rs6000/rs6000-c.cc.  The existing REPLACE_ELT iterator name was changed
to REPLACE_ELT_V along with the associated define_mode_attr.  Renamed
VEC_RU to REPLACE_ELT for the iterator name and VEC_RU_char to
REPLACE_ELT_char.  Fixed the double test in
vec-replace-word-runnable_1.c to be consistent with the other tests.
Removed the "dg-do link" from both tests.  Put in an explicit cast in
test vec-replace-word-runnable_2.c to eliminate the need for the
-flax-vector-conversions dg-option.

Version 3, added code to altivec_resolve_overloaded_builtin so the
correct instruction is selected for the size of the second argument. 
This restores the instruction counts to the original values where the
correct instructions were originally being generated.  The naming of
the overloaded builtin instances and builtin definitions were changed
to reflect the type of the second argument since the type of the first
argument is now the same for all overloaded instances.  A new builtin
test file was added for the case where the first argument is cast to
the unsigned long long type.  This test requires the
-flax-vector-conversions gcc command line option.  Since the other
tests do not require this option, I felt that the new test needed to be
in a separate file.  Finally some formatting fixes were made in the original
test file.  Patch has been retested on Power 10 with no regressions.

Version 2, fixed various typos.  Updated the change log body to say the
instruction counts were updated.  The instruction counts changed as a
result of changing the first argument of the vec_replace_unaligned
builtin call from vector unsigned long long (vull) to vector unsigned
char (vuc).  When the first argument was vull the builtin call
generated the vinsd instruction for the two test cases.  The updated
call with vuc as the first argument generates two vinsw instructions
instead.  Patch was retested on Power 10 with no regressions.

The following patch fixes the first argument in the builtin definition
and the corresponding test cases.  Initially, the builtin specification
was wrong due to a cut and paste error.  The documentation was fixed in:

   commit ed3fea09b18f67e757b5768b42cb6e816626f1db
   Author: Bill Schmidt 
   Date:   Fri Feb 4 13:07:17 2022 -0600

   rs6000: Correct function prototypes for vec_replace_unaligned

   Due to a pasto error in the documentation, vec_replace_unaligned was
   implemented with the same function prototypes as vec_replace_elt.
   It was intended that vec_replace_unaligned always specify output
   vectors as having type vector unsigned char, to emphasize that
   elements are potentially misaligned by this built-in function.
   This patch corrects the misimplementation.


This patch fixes the arguments in the definitions and updates the
testcases accordingly.  Additionally, a few minor spacing issues are
fixed.
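
To illustrate why the commit message above insists on vector unsigned
char, here is a simplified byte-level model of an unaligned word
replace.  It is a sketch under stated assumptions, not the built-in
itself: the name replace_unaligned_word_model is made up, a plain
16-byte array stands in for the vector, and bytes are numbered in
memory order, whereas the byte numbering of the real instructions
depends on endianness.

```c
#include <stdint.h>
#include <string.h>

#define VEC_BYTES 16  /* Size of a 128-bit vector in bytes.  */

/* Copy the four bytes of VALUE into VEC starting at BYTE_INDEX, which
   need not be a multiple of four.  Return 0 on success, or -1 when the
   word would run past the end of the vector.  */
static int
replace_unaligned_word_model (unsigned char vec[VEC_BYTES], uint32_t value,
			      unsigned int byte_index)
{
  if (byte_index > VEC_BYTES - sizeof value)
    return -1;

  /* The destination may straddle two word elements, so the result is
     best viewed as sixteen independent bytes.  */
  memcpy (vec + byte_index, &value, sizeof value);
  return 0;
}
```

With a byte index of 3 the new word overlaps bytes 3 through 6, that
is, parts of two different word elements, so the result no longer has
a meaningful word or doubleword element layout.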

The patch has been tested on Power 10 with no regressions.  Please let
me know if the patch is acceptable for mainline.  Thanks.

 Carl 






rs6000, fix vec_replace_unaligned built-in arguments

The first argument of the vec_replace_unaligned built-in should always be
of type vector unsigned char, as specified in gcc/doc/extend.texi.

This patch fixes the builtin definitions and updates the test cases to use
the correct arguments.  The original test file is renamed and a second test
file is added for a new test case.

gcc/ChangeLog:
* config/rs6000/rs6000-builtins.def: Rename
__builtin_altivec_vreplace_un_uv2di as __builtin_altivec_vreplace_un_udi
__builtin_altivec_vreplace_un_uv4si as __builtin_altivec_vreplace_un_usi
__builtin_altivec_vreplace_un_v2df as __builtin_altivec_vreplace_un_df
__builtin_altivec_vreplace_un_v2di as __builtin_altivec_vreplace_un_di
__builtin_altivec_vreplace_un_v4sf as __builtin_altivec_vreplace_un_sf
__builtin_altivec_vreplace_un_v4si as __builtin_altivec_vreplace_un_si.
Rename VREPLACE_UN_UV2DI as VREPLACE_UN_UDI, VREPLACE_UN_UV4SI as
VREPLACE_UN_USI, VREPLACE_UN_V2DF as VREPLACE_UN_DF,
VREPLACE_UN_V2DI as VREPLACE_UN_DI, VREPLACE_UN_V4SF as
VREPLACE_UN_SF, VREPLACE_UN_V4SI

[PATCH 1/2 ver 2] rs6000, add argument to function find_instance

2023-07-21 Thread Carl Love via Gcc-patches

GCC maintainers:

Version 2:  Updated a number of formatting and spacing issues.   Added
the NARGS description to the header comment for function find_instance.
This patch was tested on Power 8 LE/BE, Power 9 LE/BE and Power 10 LE
with no regressions.

The rs6000 function find_instance assumes that it is called for
built-ins with only two arguments.  There is no checking for the actual
number of arguments used in the built-in.  This patch adds an
additional parameter to the function call containing the number of
arguments in the built-in.  The function will now do the needed checks
for all of the arguments.
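
The idea can be sketched outside of GCC as follows.  This is only a
stand-alone model: struct parm_node and args_compatible_p are
hypothetical names, integer type ids stand in for trees, and the NULL
check for a too-short parameter list is an addition of this sketch
rather than something taken from the patch.

```c
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-in for the chain of parameter types.  */
struct parm_node
{
  int type_id;			/* Plays the role of TREE_VALUE.  */
  struct parm_node *next;	/* Plays the role of TREE_CHAIN.  */
};

/* Return true when the first NARGS entries of TYPES match the first
   NARGS parameter types in PARMS.  */
static bool
args_compatible_p (const int *types, const struct parm_node *parms,
		   int nargs)
{
  for (int i = 0; i < nargs; i++)
    {
      /* A prototype with fewer than NARGS parameters cannot match.  */
      if (parms == NULL || types[i] != parms->type_id)
	return false;
      parms = parms->next;
    }
  return true;
}
```

Checking only the first two entries, as the old code did, would accept
a three-argument call whose third argument has the wrong type; walking
all NARGS entries rejects it.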

This fix is needed for the next patch in the series that fixes the
vec_replace_unaligned built-in.c test.

Please let me know if this patch is acceptable for mainline.  Thanks.

Carl 




-
rs6000, add argument to function find_instance

The function find_instance assumes it is called to check a built-in with
only two arguments.  This patch extends the function by adding a parameter
specifying the number of built-in arguments to check.

gcc/ChangeLog:
* config/rs6000/rs6000-c.cc (find_instance): Add new parameter that
specifies the number of built-in arguments to check.
(altivec_resolve_overloaded_builtin): Update calls to find_instance
to pass the number of built-in arguments to be checked.
---
 gcc/config/rs6000/rs6000-c.cc | 40 +++
 1 file changed, 26 insertions(+), 14 deletions(-)

diff --git a/gcc/config/rs6000/rs6000-c.cc b/gcc/config/rs6000/rs6000-c.cc
index a353bca19ef..de35490de42 100644
--- a/gcc/config/rs6000/rs6000-c.cc
+++ b/gcc/config/rs6000/rs6000-c.cc
@@ -1668,18 +1668,20 @@ resolve_vec_step (resolution *res, vec<tree, va_gc> *arglist, unsigned nargs)
 /* Look for a matching instance in a chain of instances.  INSTANCE points to
the chain of instances; INSTANCE_CODE is the code identifying the specific
built-in being searched for; FCODE is the overloaded function code; TYPES
-   contains an array of two types that must match the types of the instance's
-   parameters; and ARGS contains an array of two arguments to be passed to
-   the instance.  If found, resolve the built-in and return it, unless the
-   built-in is not supported in context.  In that case, set
-   UNSUPPORTED_BUILTIN to true.  If we don't match, return error_mark_node
-   and leave UNSUPPORTED_BUILTIN alone.  */
+   contains an array of NARGS types that must match the types of the
+   instance's parameters; ARGS contains an array of NARGS arguments to be
+   passed to the instance; and NARGS is the number of built-in arguments to
+   check.  If found, resolve the built-in and return it, unless the built-in
+   is not supported in context.  In that case, set UNSUPPORTED_BUILTIN to
+   true.  If we don't match, return error_mark_node and leave
+   UNSUPPORTED_BUILTIN alone.
+*/
 
 tree
 find_instance (bool *unsupported_builtin, ovlddata **instance,
   rs6000_gen_builtins instance_code,
   rs6000_gen_builtins fcode,
-  tree *types, tree *args)
+  tree *types, tree *args, int nargs)
 {
   while (*instance && (*instance)->bifid != instance_code)
 *instance = (*instance)->next;
@@ -1691,17 +1693,27 @@ find_instance (bool *unsupported_builtin, ovlddata **instance,
   if (!inst->fntype)
 return error_mark_node;
   tree fntype = rs6000_builtin_info[inst->bifid].fntype;
-  tree parmtype0 = TREE_VALUE (TYPE_ARG_TYPES (fntype));
-  tree parmtype1 = TREE_VALUE (TREE_CHAIN (TYPE_ARG_TYPES (fntype)));
+  tree argtype = TYPE_ARG_TYPES (fntype);
+  bool args_compatible = true;
 
-  if (rs6000_builtin_type_compatible (types[0], parmtype0)
-  && rs6000_builtin_type_compatible (types[1], parmtype1))
+  for (int i = 0; i < nargs; i++)
+{
+  tree parmtype = TREE_VALUE (argtype);
+  if (!rs6000_builtin_type_compatible (types[i], parmtype))
+   {
+ args_compatible = false;
+ break;
+   }
+  argtype = TREE_CHAIN (argtype);
+}
+
+  if (args_compatible)
 {
   if (rs6000_builtin_decl (inst->bifid, false) != error_mark_node
  && rs6000_builtin_is_supported (inst->bifid))
{
  tree ret_type = TREE_TYPE (inst->fntype);
- return altivec_build_resolved_builtin (args, 2, fntype, ret_type,
+ return altivec_build_resolved_builtin (args, nargs, fntype, ret_type,
 inst->bifid, fcode);
}
   else
@@ -1921,7 +1933,7 @@ altivec_resolve_overloaded_builtin (location_t loc, tree fndecl,
  instance_code = RS6000_BIF_CMPB_32;
 
tree call = find_instance (&unsupported_builtin, &instance,
-  instance_code, fcode, types, args);
+  instance_code, fcode, types, args, nargs);
if (call != error_mark_node)
  return call;
break;
@@ -1958,7 +1970,7 @@

[PATCH 0/2 ver 2] rs6000, fix vec_replace_unaligned built-in arguments

2023-07-21 Thread Carl Love via Gcc-patches

GCC maintainers:

Version 2.  Both patches have been updated.  The first patch was approved
with minor issues to be fixed.  I will post the updated version as
version 2 for completeness of the series.  There were a few changes
with the second patch as well.  The second patch has not been approved
yet.  The updated version of the second patch is version 5 with the
requested changes made.  The two patches were tested together on Power
8 LE/BE, Power 9 LE/BE and Power 10 LE with no regressions.

In the process of fixing the powerpc/vec-replace-word-runnable.c test I
found there is an existing issue with function find_instance in
rs6000-c.cc.  Per the review comments from Kewen in

https://gcc.gnu.org/pipermail/gcc-patches/2023-July/624401.html

the fix for function find_instance was put into a separate patch
followed by a patch for the vec-replace-word-runnable.c test fixes.

The two patches have been tested on Power 10 LE with no regression
failures.

   Carl

Re: [PATCH 2/2 ver 4] rs6000, fix vec_replace_unaligned built-in arguments

2023-07-21 Thread Carl Love via Gcc-patches

On Fri, 2023-07-21 at 13:04 +0800, Kewen.Lin wrote:
> Hi Carl,
> 
> on 2023/7/18 03:20, Carl Love wrote:
> > GCC maintainers:
> > 
> > Version 4, changed the new RS6000_OVLD_VEC_REPLACE_UN case
> > statement
> > rs6000/rs6000-c.cc.  The existing REPLACE_ELT iterator name was
> > changed
> > to REPLACE_ELT_V along with the associated
> > define_mode_attr.  Renamed
> > VEC_RU to REPLACE_ELT for the iterator name and VEC_RU_char to
> > REPLACE_ELT_char.  Fixed the double test in vec-replace-word-
> > runnable_1.c to be consistent with the other tests.  Removed the
> > "dg-do 
> > link" from both tests.  Put in an explicit cast in test vec-
> > replace-word-runnable_2.c to eliminate the need for the -flax-
> > vector-conversions dg-option.
> > 
> > Version 3, added code to altivec_resolve_overloaded_builtin so the
> > correct instruction is selected for the size of the second
> > argument. 
> > This restores the instruction counts to the original values where
> > the
> > correct instructions were originally being generated.  The naming
> > of
> > the overloaded builtin instances and builtin definitions were
> > changed
> > to reflect the type of the second argument since the type of the
> > first
> > argument is now the same for all overloaded instances.  A new
> > builtin
> > test file was added for the case where the first argument is cast
> > to
> > the unsigned long long type.  This test requires the -flax-vector-
> > conversions gcc command line option.  Since the other tests do not
> > require this option, I felt that the new test needed to be in a
> > separate file.  Finally some formatting fixes were made in the
> > original
> > test file.  Patch has been retested on Power 10 with no
> > regressions.
> > 
> > Version 2, fixed various typos.  Updated the change log body to say
> > the
> > instruction counts were updated.  The instruction counts changed as
> > a
> > result of changing the first argument of the vec_replace_unaligned
> > builtin call from vector unsigned long long (vull) to vector
> > unsigned
> > char (vuc).  When the first argument was vull the builtin call
> > generated the vinsd instruction for the two test cases.  The
> > updated
> > call with vuc as the first argument generates two vinsw
> > instructions
> > instead.  Patch was retested on Power 10 with no regressions.
> > 
> > The following patch fixes the first argument in the builtin
> > definition
> > and the corresponding test cases.  Initially, the builtin
> > specification
> > was wrong due to a cut and paste error.  The documentation was fixed
> > in:
> > 
> >commit ed3fea09b18f67e757b5768b42cb6e816626f1db
> >Author: Bill Schmidt 
> >Date:   Fri Feb 4 13:07:17 2022 -0600
> > 
> >rs6000: Correct function prototypes for
> > vec_replace_unaligned
> > 
> >Due to a pasto error in the documentation,
> > vec_replace_unaligned was
> >implemented with the same function prototypes as
> > vec_replace_elt.  
> >It was intended that vec_replace_unaligned always specify
> > output
> >vectors as having type vector unsigned char, to emphasize
> > that 
> >elements are potentially misaligned by this built-in
> > function.  
> >This patch corrects the misimplementation.
> > 
> > 
> > This patch fixes the arguments in the definitions and updates the
> > testcases accordingly.  Additionally, a few minor spacing issues
> > are
> > fixed.
> > 
> > The patch has been tested on Power 10 with no regressions.  Please
> > let
> > me know if the patch is acceptable for mainline.  Thanks.
> > 
> >  Carl 
> > 
> > 
> > 
> > rs6000, fix vec_replace_unaligned built-in arguments
> > 
> > The first argument of the vec_replace_unaligned built-in should
> > always be
> > of type unsigned char, as specified in gcc/doc/extend.texi.
> 
> Shouldn't be "vector unsigned char" instead of "unsigned char"?
> 
> Or do I miss something?

Nope, I missed saying "vector".  Fixed.

> 
> > This patch fixes the builtin definitions and updates the test cases
> > to use
> > the correct arguments.  The original test file is renamed and a
> > second test
> > file is added for a new test case.
> > 
> > gcc/ChangeLog:
> > * config/rs6000/rs6000-builtins.def: Rename
> > __builtin_altivec_vreplace_un_uv2di as
> > __builtin_altivec_vreplace_un_udi
> > __builtin_altivec_vreplace_un_uv4si as
> > __builtin_altivec_vreplace_un_usi
> > __builtin_altivec_vreplace_un_v2df as
> > __builtin_altivec_vreplace_un_df
> > __builtin_altivec_vreplace_un_v2di as
> > __builtin_altivec_vreplace_un_di
> > __builtin_altivec_vreplace_un_v4sf as
> > __builtin_altivec_vreplace_un_sf
> > __builtin_altivec_vreplace_un_v4si as
> > __builtin_altivec_vreplace_un_si.
> > Rename VREPLACE_UN_UV2DI as VREPLACE_UN_UDI, VREPLACE_UN_UV4SI
> > as
> > VREPLACE_UN_USI, VREPLACE_UN_V2DF as VREPLACE_UN_DF,
> > VREPLACE_UN_V2DI as VREPLACE_UN_DI, VREPLACE_UN_V4SF as
> >

Re: [PATCH 1/2] rs6000, add argument to function find_instance

2023-07-21 Thread Carl Love via Gcc-patches

On Fri, 2023-07-21 at 10:19 +0800, Kewen.Lin wrote:
> Hi Carl,
> 
> on 2023/7/18 03:19, Carl Love wrote:
> > GCC maintainers:
> > 
> > The rs6000 function find_instance assumes that it is called for
> > built-
> > ins with only two arguments.  There is no checking for the actual
> > number of arguments used in the built-in.  This patch adds an
> > additional parameter to the function call containing the number of
> > arguments in the built-in.  The function will now do the needed
> > checks
> > for all of the arguments.
> > 
> > This fix is needed for the next patch in the series that fixes the
> > vec_replace_unaligned built-in.c test.
> > 
> > Please let me know if this patch is acceptable for
> > mainline.  Thanks.
> > 
> > Carl 
> > 
> > 
> > 
> > rs6000, add argument to function find_instance
> > 
> > The function find_instance assumes it is called to check a built-
> > in  with   

Fixed
> >   ~~ two spaces.
> > only two arguments.  Ths patch extends the function by adding a
> > parameter
>s/Ths/This/
> > specifying the number of buit-in arguments to check.
>   s/bult-in/built-in/
> 
Fixed both typos.

> > gcc/ChangeLog:
> > * config/rs6000/rs6000-c.cc (find_instance): Add new parameter
> > that
> > specifies the number of built-in arguments to check.
> > (altivec_resolve_overloaded_builtin): Update calls to
> > find_instance
> > to pass the number of built-in argument to be checked.
> 
> s/argument/arguments/
fixed
> 
> > ---
> >  gcc/config/rs6000/rs6000-c.cc | 27 +++
> >  1 file changed, 19 insertions(+), 8 deletions(-)
> > 
> > diff --git a/gcc/config/rs6000/rs6000-c.cc
> > b/gcc/config/rs6000/rs6000-c.cc
> > index a353bca19ef..350987b851b 100644
> > --- a/gcc/config/rs6000/rs6000-c.cc
> > +++ b/gcc/config/rs6000/rs6000-c.cc
> > @@ -1679,7 +1679,7 @@ tree
> 
> There is one function comment here describing the meaning of each
> parameter,
> I think we should add a corresponding for NARGS, may be something
> like:
> 
> "; and NARGS specifies the number of built-in arguments."
> 
Added NARGS description.

> Also we need to update the below "two"s with "NARGS".
> 
> "TYPES contains an array of two types..." and "ARGS contains an array
> of two arguments..."
> 

Replaced multiple "two" occurrences with NARGS.

> since we already extend this to handle NARGS instead of two.
> 
> >  find_instance (bool *unsupported_builtin, ovlddata **instance,
> >rs6000_gen_builtins instance_code,
> >rs6000_gen_builtins fcode,
> > -  tree *types, tree *args)
> > +  tree *types, tree *args, int nargs)
> >  {
> >while (*instance && (*instance)->bifid != instance_code)
> >  *instance = (*instance)->next;
> > @@ -1691,17 +1691,28 @@ find_instance (bool *unsupported_builtin,
> > ovlddata **instance,
> >if (!inst->fntype)
> >  return error_mark_node;
> >tree fntype = rs6000_builtin_info[inst->bifid].fntype;
> > -  tree parmtype0 = TREE_VALUE (TYPE_ARG_TYPES (fntype));
> > -  tree parmtype1 = TREE_VALUE (TREE_CHAIN (TYPE_ARG_TYPES
> > (fntype)));
> > +  tree argtype = TYPE_ARG_TYPES (fntype);
> > +  tree parmtype;
> 
> Nit: We can move "tree parmtype" into the loop (close to its only
> use).

Moved and combined declaration with assignment as you noted below.

> 
> > +  int args_compatible = true;
> 
> s/int/bool/
Changed.

> 
> >  
> > -  if (rs6000_builtin_type_compatible (types[0], parmtype0)
> > -  && rs6000_builtin_type_compatible (types[1], parmtype1))
> > +  for (int i = 0; i <nargs; i++)
> 
> Nit: formatting issue, space before nargs.
> 
> >  {
> > +  parmtype = TREE_VALUE (argtype);
> 
>  tree parmtype = TREE_VALUE (argtype);

Changed

> 
> > +  if (! rs6000_builtin_type_compatible (types[i], parmtype))
> 
> Nit: One unexpected(?) space after "!".

Removed extra space after "!".
> 
> > +   {
> > + args_compatible = false;
> > + break;
> > +   }
> > +  argtype = TREE_CHAIN (argtype);
> > +}
> > +
> > +  if (args_compatible)
> > +  {
> 
> Nit: indent issue for "{".
Fixed indent.

> 
> Ok for trunk with these nits fixed.  Btw, the description doesn't say
> how this was tested, I'm not sure if it's only tested together with
> "patch 2/2", but please ensure it's bootstrapped and regress-tested
> on BE and LE when committing.  Thanks!
> 

Yes, it was tested with patch 2/2 on Power 10 LE.  I did do a test on
Power 9 as well but don't recall if I tested for both BE and LE.  Will
retest on Power 8 LE/BE, Power 9 LE/BE and Power 10.

 Carl

Re: rs6000: Fix expected counts powerpc/p9-vec-length-full

2023-07-18 Thread Carl Love via Gcc-patches

Ping

On Thu, 2023-06-01 at 16:11 -0700, Carl Love wrote:
> GCC maintainers:
> 
> The following patch updates the expected instruction counts in four
> tests.  The counts in all of the tests changed with commit
> f574e2dfae79055f16d0c63cc12df24815d8ead6.  
> 
> The updated counts have been verified on both Power 9 and Power 10.
> 
> Please let me know if this patch is acceptable for mainline.  Thanks.
> 
>   Carl 
> 
> 
> rs6000: Fix expected counts powerpc/p9-vec-length-full tests
> 
> The counts for instructions lxvl and stxvl in tests:
> 
>   p9-vec-length-full-1.c
>   p9-vec-length-full-2.c
>   p9-vec-length-full-6.c
>   p9-vec-length-full-7.c
> 
> changed with commit:
> 
>commit f574e2dfae79055f16d0c63cc12df24815d8ead6
>Author: Ju-Zhe Zhong 
>Date:   Thu May 25 22:42:35 2023 +0800
> 
>  VECT: Add decrement IV iteration loop control by variable amount
> support
> 
>  This patch is supporting decrement IV by following the flow
> designed by
>  Richard:
>...
> 
> The expected counts for lxvl changed from 20 to 40 and the counts for
> stxvl
> changed from 10 to 20 in the first three tests.  The number of stxvl
> instructions changed from 12 to 20 in p9-vec-length-full-7.c.  This
> patch updates the number of expected instructions in the four tests.
> 
> The counts have been verified on Power 9 and Power 10.
> ---
>  gcc/testsuite/gcc.target/powerpc/p9-vec-length-full-1.c | 4 ++--
>  gcc/testsuite/gcc.target/powerpc/p9-vec-length-full-2.c | 4 ++--
>  gcc/testsuite/gcc.target/powerpc/p9-vec-length-full-6.c | 4 ++--
>  gcc/testsuite/gcc.target/powerpc/p9-vec-length-full-7.c | 2 +-
>  4 files changed, 7 insertions(+), 7 deletions(-)
> 
> diff --git a/gcc/testsuite/gcc.target/powerpc/p9-vec-length-full-1.c
> b/gcc/testsuite/gcc.target/powerpc/p9-vec-length-full-1.c
> index f01f1c54fa5..5e4f34421d3 100644
> --- a/gcc/testsuite/gcc.target/powerpc/p9-vec-length-full-1.c
> +++ b/gcc/testsuite/gcc.target/powerpc/p9-vec-length-full-1.c
> @@ -12,5 +12,5 @@
>  /* { dg-final { scan-assembler-not   {\mstxv\M} } } */
>  /* { dg-final { scan-assembler-not   {\mlxvx\M} } } */
>  /* { dg-final { scan-assembler-not   {\mstxvx\M} } } */
> -/* { dg-final { scan-assembler-times {\mlxvl\M} 20 } } */
> -/* { dg-final { scan-assembler-times {\mstxvl\M} 10 } } */
> +/* { dg-final { scan-assembler-times {\mlxvl\M} 40 } } */
> +/* { dg-final { scan-assembler-times {\mstxvl\M} 20 } } */
> diff --git a/gcc/testsuite/gcc.target/powerpc/p9-vec-length-full-2.c
> b/gcc/testsuite/gcc.target/powerpc/p9-vec-length-full-2.c
> index f546e97fa7d..c7d927382c3 100644
> --- a/gcc/testsuite/gcc.target/powerpc/p9-vec-length-full-2.c
> +++ b/gcc/testsuite/gcc.target/powerpc/p9-vec-length-full-2.c
> @@ -12,5 +12,5 @@
>  /* { dg-final { scan-assembler-not   {\mstxv\M} } } */
>  /* { dg-final { scan-assembler-not   {\mlxvx\M} } } */
>  /* { dg-final { scan-assembler-not   {\mstxvx\M} } } */
> -/* { dg-final { scan-assembler-times {\mlxvl\M} 20 } } */
> -/* { dg-final { scan-assembler-times {\mstxvl\M} 10 } } */
> +/* { dg-final { scan-assembler-times {\mlxvl\M} 40 } } */
> +/* { dg-final { scan-assembler-times {\mstxvl\M} 20 } } */
> diff --git a/gcc/testsuite/gcc.target/powerpc/p9-vec-length-full-6.c
> b/gcc/testsuite/gcc.target/powerpc/p9-vec-length-full-6.c
> index 65ddf2b098a..f3be3842c62 100644
> --- a/gcc/testsuite/gcc.target/powerpc/p9-vec-length-full-6.c
> +++ b/gcc/testsuite/gcc.target/powerpc/p9-vec-length-full-6.c
> @@ -11,5 +11,5 @@
>  /* It can use normal vector load for constant vector load.  */
>  /* { dg-final { scan-assembler-times {\mstxvx?\M} 6 } } */
>  /* 64bit/32bit pairs won't use partial vectors.  */
> -/* { dg-final { scan-assembler-times {\mlxvl\M} 10 } } */
> -/* { dg-final { scan-assembler-times {\mstxvl\M} 10 } } */
> +/* { dg-final { scan-assembler-times {\mlxvl\M} 20 } } */
> +/* { dg-final { scan-assembler-times {\mstxvl\M} 20 } } */
> diff --git a/gcc/testsuite/gcc.target/powerpc/p9-vec-length-full-7.c
> b/gcc/testsuite/gcc.target/powerpc/p9-vec-length-full-7.c
> index e0e51d9a972..da086f1826a 100644
> --- a/gcc/testsuite/gcc.target/powerpc/p9-vec-length-full-7.c
> +++ b/gcc/testsuite/gcc.target/powerpc/p9-vec-length-full-7.c
> @@ -12,4 +12,4 @@
> 
>  /* Each type has one stxvl excepting for int8 and uint8, that have
> two due to
> rtl pass bbro duplicating the block which has one stxvl.  */
> -/* { dg-final { scan-assembler-times {\mstxvl\M} 12 } } */
> +/* { dg-final { scan-assembler-times {\mstxvl\M} 20 } } */

[PATCH 2/2 ver 4] rs6000, fix vec_replace_unaligned built-in arguments

2023-07-17 Thread Carl Love via Gcc-patches

GCC maintainers:

Version 4, changed the new RS6000_OVLD_VEC_REPLACE_UN case statement in
rs6000/rs6000-c.cc.  The existing REPLACE_ELT iterator name was changed
to REPLACE_ELT_V along with the associated define_mode_attr.  Renamed
VEC_RU to REPLACE_ELT for the iterator name and VEC_RU_char to
REPLACE_ELT_char.  Fixed the double test in vec-replace-word-
runnable_1.c to be consistent with the other tests.  Removed the "dg-do 
link" from both tests.  Put in an explicit cast in test 
vec-replace-word-runnable_2.c to eliminate the need for the 
-flax-vector-conversions dg-option.

Version 3, added code to altivec_resolve_overloaded_builtin so the
correct instruction is selected for the size of the second argument. 
This restores the instruction counts to the original values where the
correct instructions were originally being generated.  The naming of
the overloaded builtin instances and builtin definitions were changed
to reflect the type of the second argument since the type of the first
argument is now the same for all overloaded instances.  A new builtin
test file was added for the case where the first argument is cast to
the unsigned long long type.  This test requires the -flax-vector-
conversions gcc command line option.  Since the other tests do not
require this option, I felt that the new test needed to be in a
separate file.  Finally some formatting fixes were made in the original
test file.  Patch has been retested on Power 10 with no regressions.

Version 2, fixed various typos.  Updated the change log body to say the
instruction counts were updated.  The instruction counts changed as a
result of changing the first argument of the vec_replace_unaligned
builtin call from vector unsigned long long (vull) to vector unsigned
char (vuc).  When the first argument was vull the builtin call
generated the vinsd instruction for the two test cases.  The updated
call with vuc as the first argument generates two vinsw instructions
instead.  Patch was retested on Power 10 with no regressions.

The following patch fixes the first argument in the builtin definition
and the corresponding test cases.  Initially, the builtin specification
was wrong due to a cut and paste error.  The documentation was fixed in:

   commit ed3fea09b18f67e757b5768b42cb6e816626f1db
   Author: Bill Schmidt 
   Date:   Fri Feb 4 13:07:17 2022 -0600

   rs6000: Correct function prototypes for vec_replace_unaligned

   Due to a pasto error in the documentation, vec_replace_unaligned was
   implemented with the same function prototypes as vec_replace_elt.  
   It was intended that vec_replace_unaligned always specify output
   vectors as having type vector unsigned char, to emphasize that 
   elements are potentially misaligned by this built-in function.  
   This patch corrects the misimplementation.


This patch fixes the arguments in the definitions and updates the
testcases accordingly.  Additionally, a few minor spacing issues are
fixed.

The patch has been tested on Power 10 with no regressions.  Please let
me know if the patch is acceptable for mainline.  Thanks.

 Carl 



rs6000, fix vec_replace_unaligned built-in arguments

The first argument of the vec_replace_unaligned built-in should always be
of type unsigned char, as specified in gcc/doc/extend.texi.

This patch fixes the builtin definitions and updates the test cases to use
the correct arguments.  The original test file is renamed and a second test
file is added for a new test case.

gcc/ChangeLog:
* config/rs6000/rs6000-builtins.def: Rename
__builtin_altivec_vreplace_un_uv2di as __builtin_altivec_vreplace_un_udi
__builtin_altivec_vreplace_un_uv4si as __builtin_altivec_vreplace_un_usi
__builtin_altivec_vreplace_un_v2df as __builtin_altivec_vreplace_un_df
__builtin_altivec_vreplace_un_v2di as __builtin_altivec_vreplace_un_di
__builtin_altivec_vreplace_un_v4sf as __builtin_altivec_vreplace_un_sf
__builtin_altivec_vreplace_un_v4si as __builtin_altivec_vreplace_un_si.
Rename VREPLACE_UN_UV2DI as VREPLACE_UN_UDI, VREPLACE_UN_UV4SI as
VREPLACE_UN_USI, VREPLACE_UN_V2DF as VREPLACE_UN_DF,
VREPLACE_UN_V2DI as VREPLACE_UN_DI, VREPLACE_UN_V4SF as
VREPLACE_UN_SF, VREPLACE_UN_V4SI as VREPLACE_UN_SI.
Rename vreplace_un_v2di as vreplace_un_di, vreplace_un_v4si as
vreplace_un_si, vreplace_un_v2df as vreplace_un_df,
vreplace_un_v2di as vreplace_un_di, vreplace_un_v4sf as
vreplace_un_sf, vreplace_un_v4si as vreplace_un_si.
* config/rs6000/rs6000-c.cc (find_instance): Add case
RS6000_OVLD_VEC_REPLACE_UN.
* config/rs6000/rs6000-overload.def (__builtin_vec_replace_un):
Fix first argument type.  Rename VREPLACE_UN_UV4SI as
VREPLACE_UN_USI, VREPLACE_UN_V4SI as VREPLACE_UN_SI,
VREPLACE_UN_UV2DI as VREPLACE_UN_UDI, VREPLACE_UN_V2DI as
VREPLACE_UN_DI,

[PATCH 1/2] rs6000, add argument to function find_instance

2023-07-17 Thread Carl Love via Gcc-patches



GCC maintainers:

The rs6000 function find_instance assumes that it is called for built-
ins with only two arguments.  There is no checking for the actual
number of arguments used in the built-in.  This patch adds an
additional parameter to the function call containing the number of
arguments in the built-in.  The function will now do the needed checks
for all of the arguments.
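
The change described above, from two hard-coded argument checks to a loop over the argument count, can be sketched outside of GCC roughly as follows.  This is only an illustration: type_compatible and args_compatible_p are hypothetical stand-ins (plain integer ids instead of GCC tree nodes), not the functions in rs6000-c.cc.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical stand-in for the type-compatibility test: two "types"
   are compatible when their integer ids are equal.  */
static bool
type_compatible (int arg_type, int parm_type)
{
  return arg_type == parm_type;
}

/* Return true when each of the first NARGS entries of TYPES is
   compatible with the corresponding entry of PARMS.  Stops at the
   first mismatch, like the loop with the early break in the patch.  */
static bool
args_compatible_p (const int *types, const int *parms, int nargs)
{
  assert (nargs >= 0);
  for (int i = 0; i < nargs; i++)
    if (!type_compatible (types[i], parms[i]))
      return false;
  return true;
}
```

With only the first two entries checked, a mismatch in a third argument would go unnoticed; passing the real argument count closes that gap.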

This fix is needed for the next patch in the series that fixes the
vec_replace_unaligned built-in.c test.

Please let me know if this patch is acceptable for mainline.  Thanks.

Carl 



rs6000, add argument to function find_instance

The function find_instance assumes it is called to check a built-in  with
only two arguments.  Ths patch extends the function by adding a parameter
specifying the number of buit-in arguments to check.

gcc/ChangeLog:
* config/rs6000/rs6000-c.cc (find_instance): Add new parameter that
specifies the number of built-in arguments to check.
(altivec_resolve_overloaded_builtin): Update calls to find_instance
to pass the number of built-in argument to be checked.
---
 gcc/config/rs6000/rs6000-c.cc | 27 +++
 1 file changed, 19 insertions(+), 8 deletions(-)

diff --git a/gcc/config/rs6000/rs6000-c.cc b/gcc/config/rs6000/rs6000-c.cc
index a353bca19ef..350987b851b 100644
--- a/gcc/config/rs6000/rs6000-c.cc
+++ b/gcc/config/rs6000/rs6000-c.cc
@@ -1679,7 +1679,7 @@ tree
 find_instance (bool *unsupported_builtin, ovlddata **instance,
   rs6000_gen_builtins instance_code,
   rs6000_gen_builtins fcode,
-  tree *types, tree *args)
+  tree *types, tree *args, int nargs)
 {
   while (*instance && (*instance)->bifid != instance_code)
 *instance = (*instance)->next;
@@ -1691,17 +1691,28 @@ find_instance (bool *unsupported_builtin, ovlddata 
**instance,
   if (!inst->fntype)
 return error_mark_node;
   tree fntype = rs6000_builtin_info[inst->bifid].fntype;
-  tree parmtype0 = TREE_VALUE (TYPE_ARG_TYPES (fntype));
-  tree parmtype1 = TREE_VALUE (TREE_CHAIN (TYPE_ARG_TYPES (fntype)));
+  tree argtype = TYPE_ARG_TYPES (fntype);
+  tree parmtype;
+  int args_compatible = true;
 
-  if (rs6000_builtin_type_compatible (types[0], parmtype0)
-  && rs6000_builtin_type_compatible (types[1], parmtype1))
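+  for (int i = 0; i <nargs; i++)
 {
+  parmtype = TREE_VALUE (argtype);
+  if (! rs6000_builtin_type_compatible (types[i], parmtype))
+   {
+ args_compatible = false;
+ break;
+   }
+  argtype = TREE_CHAIN (argtype);
+}
+
+  if (args_compatible)
+  {
   if (rs6000_builtin_decl (inst->bifid, false) != error_mark_node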
  && rs6000_builtin_is_supported (inst->bifid))
{
  tree ret_type = TREE_TYPE (inst->fntype);
- return altivec_build_resolved_builtin (args, 2, fntype, ret_type,
+ return altivec_build_resolved_builtin (args, nargs, fntype, ret_type,
 inst->bifid, fcode);
}
   else
@@ -1921,7 +1932,7 @@ altivec_resolve_overloaded_builtin (location_t loc, tree 
fndecl,
  instance_code = RS6000_BIF_CMPB_32;
 
tree call = find_instance (&unsupported_builtin, &instance,
-  instance_code, fcode, types, args);
+  instance_code, fcode, types, args, nargs);
if (call != error_mark_node)
  return call;
break;
@@ -1958,7 +1969,7 @@ altivec_resolve_overloaded_builtin (location_t loc, tree 
fndecl,
  }
 
tree call = find_instance (&unsupported_builtin, &instance,
-  instance_code, fcode, types, args);
+  instance_code, fcode, types, args, nargs);
if (call != error_mark_node)
  return call;
break;
-- 
2.37.2

[PATCH 0/2] rs6000, fix vec_replace_unaligned built-in arguments

2023-07-17 Thread Carl Love via Gcc-patches



GCC maintainers:

In the process of fixing the powerpc/vec-replace-word-runnable.c test I
found there is an existing issue with function find_instance in rs6000-
c.cc.  Per the review comments from Kewen in

https://gcc.gnu.org/pipermail/gcc-patches/2023-July/624401.html

the fix for function find_instance was put into a separate patch
followed by a patch for the vec-replace-word-runnable.c test fixes.

The two patches have been tested on Power 10 LE with no regression
failures.

   Carl

Re: [PATCH ver 3] rs6000, fix vec_replace_unaligned built-in arguments

2023-07-17 Thread Carl Love via Gcc-patches

On Thu, 2023-07-13 at 17:41 +0800, Kewen.Lin wrote:
> Hi Carl,
> 
> on 2023/7/8 04:18, Carl Love wrote:
> > GCC maintainers:
> > 
> > Version 3, added code to altivec_resolve_overloaded_builtin so the
> > correct instruction is selected for the size of the second
> > argument. 
> > This restores the instruction counts to the original values where
> > the
> > correct instructions were originally being generated.  The naming
> > of
> 
> Nice, I have some comments inlined below.
> 
> > the overloaded builtin instances and builtin definitions were
> > changed
> > to reflect the type of the second argument since the type of the
> > first
> > argument is now the same for all overloaded instances.  A new
> > builtin
> > test file was added for the case where the first argument is cast
> > to
> > the unsigned long long type.  This test requires the -flax-vector-
> > conversions gcc command line option.  Since the other tests do not
> > require this option, I felt that the new test needed to be in a
> > separate file.  Finally some formatting fixes were made in the
> > original
> > test file.  Patch has been retested on Power 10 with no
> > regressions.
> > 
> > Version 2, fixed various typos.  Updated the change log body to say
> > the
> > instruction counts were updated.  The instruction counts changed as
> > a
> > result of changing the first argument of the vec_replace_unaligned
> > builtin call from vector unsigned long long (vull) to vector
> > unsigned
> > char (vuc).  When the first argument was vull the builtin call
> > generated the vinsd instruction for the two test cases.  The
> > updated
> > call with vuc as the first argument generates two vinsw
> > instructions
> > instead.  Patch was retested on Power 10 with no regressions.
> > 
> > The following patch fixes the first argument in the builtin
> > definition
> > and the corresponding test cases.  Initially, the builtin
> > specification
> > was wrong due to a cut and paste error.  The documentation was fixed
> > in:
> > 
> >commit ed3fea09b18f67e757b5768b42cb6e816626f1db
> >Author: Bill Schmidt 
> >Date:   Fri Feb 4 13:07:17 2022 -0600
> > 
> >rs6000: Correct function prototypes for
> > vec_replace_unaligned
> > 
> >Due to a pasto error in the documentation,
> > vec_replace_unaligned
> > was
> >implemented with the same function prototypes as
> > vec_replace_elt.  It was
> >intended that vec_replace_unaligned always specify output
> > vectors as having
> >type vector unsigned char, to emphasize that elements are
> > potentially
> >misaligned by this built-in function.  This patch corrects
> > the
> >misimplementation.
> > 
> > 
> > This patch fixes the arguments in the definitions and updates the
> > testcases accordingly.  Additionally, a few minor spacing issues
> > are
> > fixed.
> > 
> > The patch has been tested on Power 10 with no regressions.  Please
> > let
> > me know if the patch is acceptable for mainline.  Thanks.
> > 
> >  Carl 
> > 
> > --
> > rs6000, fix vec_replace_unaligned built-in arguments
> > 
> > The first argument of the vec_replace_unaligned built-in should
> > always be
> > unsigned char, as specified in gcc/doc/extend.texi.
> 
> Maybe "be with type vector unsigned char"?

Changed to 

  The first argument of the vec_replace_unaligned built-in should
always be of type unsigned char, 

> 
> > This patch fixes the builtin definitions and updates the test cases
> > to use
> > the correct arguments.  The original test file is renamed and a
> > second test
> > file is added for a new test case.
> > 
> > gcc/ChangeLog:
> > * config/rs6000/rs6000-builtins.def: Rename
> > __builtin_altivec_vreplace_un_uv2di as
> > __builtin_altivec_vreplace_un_udi
> > __builtin_altivec_vreplace_un_uv4si as
> > __builtin_altivec_vreplace_un_usi
> > __builtin_altivec_vreplace_un_v2df as
> > __builtin_altivec_vreplace_un_df
> > __builtin_altivec_vreplace_un_v2di as
> > __builtin_altivec_vreplace_un_di
> > __builtin_altivec_vreplace_un_v4sf as
> > __builtin_altivec_vreplace_un_sf
> > __builtin_altivec_vreplace_un_v4si as
> > __builtin_altivec_vreplace_un_si.
> > Rename VREPLACE_UN_UV2DI as VREPLACE_UN_UDI, VREPLACE_UN_UV4SI
> > as
> > VREPLACE_UN_USI, VREPLACE_UN_V2DF as VREPLACE_UN_DF,
> > VREPLACE_UN_V2DI as VREPLACE_UN_DI, VREPLACE_UN_V4SF as
> > VREPLACE_UN_SF, VREPLACE_UN_V4SI as VREPLACE_UN_SI.
> > Rename vreplace_un_v2di as vreplace_un_di, vreplace_un_v4si as
> > vreplace_un_si, vreplace_un_v2df as vreplace_un_df,
> > vreplace_un_v2di as vreplace_un_di, vreplace_un_v4sf as
> > vreplace_un_sf, vreplace_un_v4si as vreplace_un_si.
> > * config/rs6000/rs6000-c.cc (find_instance): Add new argument
> > nargs.  Add nargs check.  Extend function to handle three
> > arguments.
> > (altivec_resolve_overloaded_builtin): Add new argument nargs to
> > function calls.  Add

[PATCH ver4] rs6000, Add return value to __builtin_set_fpscr_rn

2023-07-11 Thread Carl Love via Gcc-patches

GCC maintainers:

Ver 4, Removed extra space in subject line.  Added comment to commit
log comments about new __SET_FPSCR_RN_RETURNS_FPSCR__ define.  Changed
Added to Add and Renamed to Rename in ChangeLog.  Updated define_expand
"rs6000_set_fpscr_rn" per Peter's comments to use new temporary
register for output value.  Also, comments from Kewen about moving rtx
tmp_di1 close to use.  Renamed tmp_di2 as orig_df_in_di.  Additionally,
changed the name of tmp_di3 to tmp_di2 so the numbering is
sequential.  Moved the new rtx tmp_di2 = gen_reg_rtx (DImode); right
before its use to be consistent with previous move request.  Fixed tabs
in comment.  Remove -std=c99 from test_fpscr_rn_builtin_1.c. Cleaned up
comment and removed abort from test_fpscr_rn_builtin_2.c.  

Fixed a couple of additional issues with the ChangeLog per feedback
from git gcc-verify.

Retested updated patch on Power 8, 9 and 10 to verify changes.

Ver 3, Renamed the patch per comments on ver 2.  Previous subject line
was " [PATCH ver 2] rs6000, __builtin_set_fpscr_rn add retrun value".  
Fixed spelling mistakes and formatting.  Updated define_expand
"rs6000_set_fpscr_rn" to have the rs6000_get_fpscr_fields and
rs6000_update_fpscr_rn_field define expands inlined.  Optimized the
code and fixed use of temporary register values. Updated the test file
dg-do run arguments and dg-options.  Removed the check for
__SET_FPSCR_RN_RETURNS_FPSCR__. Removed additional references to the
overloaded built-in with double argument.  Fixed up the documentation
file.  Updated patch retested on Power 8 BE/LE, Power 9 BE/LE and Power
10 LE.

Ver 2,  Went back thru the requirements and emails.  Not sure where I
came up with the requirement for an overloaded version with double
argument.  Removed the overloaded version with the double argument. 
Added the macro to announce if the __builtin_set_fpscr_rn returns a
void or a double with the FPSCR bits.  Updated the documentation file. 
Retested on Power 8 BE/LE, Power 9 BE/LE, Power 10 LE.  Redid the test
file.  Per request, the original test file functionality was not
changed.  Just changed the name from test_fpscr_rn_builtin.c to 
test_fpscr_rn_builtin_1.c.  Put new tests for the return values into a
new test file, test_fpscr_rn_builtin_2.c.

The GLibC team requested a builtin to replace the mffscrn and
mffscrni inline asm instructions in the GLibC code.  Previously there
was discussion on adding builtins for the mffscrn instructions.

https://gcc.gnu.org/pipermail/gcc-patches/2023-May/620261.html

In the end, it was felt that it would be better to extend the existing
__builtin_set_fpscr_rn builtin to return a double instead of a void
type.  The desire is that we could have the functionality of the
mffscrn and mffscrni instructions on older ISAs.  The two instructions
were initially added in ISA 3.0.  The __builtin_set_fpscr_rn has the
needed functionality to set the RN field using the mffscrn and mffscrni
instructions if ISA 3.0 is supported or fall back to using logical
instructions to mask and set the bits for earlier ISAs.  The
instructions return the current value of the FPSCR fields DRN, VE, OE,
UE, ZE, XE, NI, RN bit positions then update the RN bit positions with
the new RN value provided.

The current __builtin_set_fpscr_rn builtin has a return type of void. 
So, changing the return type to double and returning the  FPSCR fields
DRN, VE, OE, UE, ZE, XE, NI, RN bit positions would then give the
functional equivalent of the mffscrn and mffscrni instructions.  Any
current uses of the builtin would just ignore the return value yet any
new uses could use the return value.  So the requirement is for the
change to the __builtin_set_fpscr_rn builtin to be backwardly
compatible and work for all ISAs.

The following patch changes the return type of the
 __builtin_set_fpscr_rn builtin from void to double.  The return value
is the current value of the various FPSCR fields DRN, VE, OE, UE, ZE,
XE, NI, RN bit positions when the builtin is called.  The builtin then
updates the RN field with the new value provided as an argument to the
builtin.  The patch adds new testcases to test_fpscr_rn_builtin.c to
check that the builtin returns the current value of the FPSCR fields
and then updates the RN field.
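
The "return the old fields, then update RN" behavior described above can be modeled on a plain 64-bit integer as a rough sketch.  The assumptions here are mine: model_set_fpscr_rn is a hypothetical helper that is not part of GCC, the register is passed in as an ordinary variable, and the result is returned as an integer, whereas the real builtin returns the bits in a double.

```c
#include <assert.h>
#include <stdint.h>

/* Fields named in the text: DRN (bits 29:31) and VE, OE, UE, ZE, XE,
   NI, RN (bits 56:63), in IBM bit numbering where bit 0 is the most
   significant bit of the 64-bit register.  */
#define FPSCR_RETURN_MASK UINT64_C (0x00000007000000FF)
/* RN is the two least significant bits.  */
#define FPSCR_RN_MASK UINT64_C (0x3)

/* Model only: return the masked fields as they were on entry, then
   replace the RN field of *FPSCR with the low two bits of NEW_RN.  */
static uint64_t
model_set_fpscr_rn (uint64_t *fpscr, unsigned new_rn)
{
  assert (fpscr != 0);
  uint64_t old_fields = *fpscr & FPSCR_RETURN_MASK;
  *fpscr = (*fpscr & ~FPSCR_RN_MASK) | (new_rn & FPSCR_RN_MASK);
  return old_fields;
}
```

A caller that ignores the return value sees the same effect as with the void version, which is the backward compatibility the text asks for.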

The GLibC team has reviewed the patch to make sure it met their needs
as a drop in replacement for the inline asm mffscrn and mffscrni
statements in the GLibC code.

The patch has been tested on Power 8 LE/BE, Power 9 LE/BE and Power 10
LE.

Please let me know if the patch is acceptable for mainline.  Thanks.

   Carl 

-
rs6000, Add return value to __builtin_set_fpscr_rn

Change the return value from void to double for __builtin_set_fpscr_rn.
The return value consists of the FPSCR fields DRN, VE, OE, UE, ZE, XE, NI,
RN bit positions.  A new test file, test powerpc/test_fpscr_rn_builtin_2.c,
is added to test the new return value for the built-in.

The value

Re: [PATCH ver3] rs6000, Add return value to __builtin_set_fpscr_rn

2023-07-11 Thread Carl Love via Gcc-patches

On Tue, 2023-07-11 at 13:54 +0800, Kewen.Lin wrote:
> Hi Carl,
> 
> Excepting for Peter's review comments, some nits are inline below.
> 
> on 2023/7/11 03:18, Carl Love wrote:
> > GCC maintainers:
> > 
> > 
> > 




> > -
> > rs6000, Add return value  to __builtin_set_fpscr_rn
> 
> Nit: One more unexpected space.

OK, removed

> 
> > Change the return value from void to double for
> > __builtin_set_fpscr_rn.
> > The return value consists of the FPSCR fields DRN, VE, OE, UE, ZE,
> > XE, NI,
> > RN bit positions.  A new test file, test
> > powerpc/test_fpscr_rn_builtin_2.c,
> > is added to test the new return value for the built-in.
> 
> Nit: It would be better to note the newly added
> __SET_FPSCR_RN_RETURNS_FPSCR__
> in commit log as well.

Added a comment as requested.

> 
> > gcc/ChangeLog:
> > * config/rs6000/rs6000-builtins.def (__builtin_set_fpscr_rn):
> > Update
> > built-in definition return type.
> > * config/rs6000-c.cc (rs6000_target_modify_macros): Add check,
> > define __SET_FPSCR_RN_RETURNS_FPSCR__ macro.
> > * config/rs6000/rs6000.md (rs6000_set_fpscr_rn): Added return
> 
> Nit: s/Added/Add/

Changed.

> 
> > argument to return FPSCR fields.
> > * doc/extend.texi (__builtin_set_fpscr_rn): Update description
> > for
> > the return value.  Add description for
> > __SET_FPSCR_RN_RETURNS_FPSCR__ macro.
> > 
> > gcc/testsuite/ChangeLog:
> > gcc.target/powerpc/test_fpscr_rn_builtin.c: Renamed to
> > test_fpscr_rn_builtin_1.c.  Added comment.
> 
> Nit: s/Added/Add/ and s/Renamed/Rename/.

Changed.

> 
> > gcc.target/powerpc/test_fpscr_rn_builtin_2.c: New test for the
> > return value of __builtin_set_fpscr_rn builtin.
> > ---
> > 



> > -  if (CONST_INT_P (operands[0]))
> > +  /* Emulate the behavior of the mffscrni, mffscrn instructions
> > for earlier
> > + ISAs.  Return bits 29:31 (DRN) and bits 56:63 (VE, OE, UE,
> > ZE, XE, NI,
> > + RN) from the FPSCR.  Set the RN field based on the value in
> > operands[1].
> > +  */
> > +
> > +  /* Get the current FPSCR fields, bits 29:31 (DRN) and bits 56:63
> > (VE, OE, UE,
> > +  ZE, XE, NI, RN) from the FPSCR and return them.  */
> > +  rtx tmp_di1 = gen_reg_rtx (DImode);
> 
> Nit: This line is preferred to be move to below (a), close to its
> use.

OK, moved the statement. 
> 
> > +
> > +  emit_insn (gen_rs6000_mffs (tmp_df));
> > +  rtx tmp_di2 = simplify_gen_subreg (DImode, tmp_df, DFmode, 0);
> 
> Nit: May be good to rename this tmp_di2 as orig_df_in_di, hope it can
> offer better readablity when people read the code below with its use.

OK, changed the name.  Then changed the name of tmp_di3 to tmp_di2 so
the numbering is sequential.  Moved the new rtx tmp_di2 = gen_reg_rtx
(DImode); right before its use.

> 
> ... (a)
> 
> > +  emit_insn (gen_anddi3 (tmp_di1, tmp_di2, GEN_INT
> > (0x000700FFULL)));
> > +  rtx tmp_rtn = simplify_gen_subreg (DFmode, tmp_di1, DImode, 0);
> > +  emit_move_insn (operands[0], tmp_rtn);
> > +
> > +  if (CONST_INT_P (operands[1]))
> >  {
> > -  if ((INTVAL (operands[0]) & 0x1) == 0x1)
> > +  if ((INTVAL (operands[1]) & 0x1) == 0x1)
> > emit_insn (gen_rs6000_mtfsb1 (GEN_INT (31)));
> >else
> > emit_insn (gen_rs6000_mtfsb0 (GEN_INT (31)));
> >  
> > -  if ((INTVAL (operands[0]) & 0x2) == 0x2)
> > +  if ((INTVAL (operands[1]) & 0x2) == 0x2)
> > emit_insn (gen_rs6000_mtfsb1 (GEN_INT (30)));
> >else
> > emit_insn (gen_rs6000_mtfsb0 (GEN_INT (30)));
> > @@ -6476,23 +6493,20 @@
> >else
> >  {
> >rtx tmp_rn = gen_reg_rtx (DImode);
> > -  rtx tmp_di = gen_reg_rtx (DImode);
> >  
> >/* Extract new RN mode from operand.  */
> > -  rtx op0 = convert_to_mode (DImode, operands[0], false);
> > -  emit_insn (gen_anddi3 (tmp_rn, op0, GEN_INT (3)));
> > +  rtx op1 = convert_to_mode (DImode, operands[1], false);
> > +  emit_insn (gen_anddi3 (tmp_rn, op1, GEN_INT (3)));
> >  
> > -  /* Insert new RN mode into FSCPR.  */
> > -  emit_insn (gen_rs6000_mffs (tmp_df));
> > -  tmp_di = simplify_gen_subreg (DImode, tmp_df, DFmode, 0);
> > -  emit_insn (gen_anddi3 (tmp_di, tmp_di, GEN_INT (-4)));
> > -  emit_insn (gen_iordi3 (tmp_di, tmp_di, tmp_rn));
> > +  /* Insert the new RN value from tmp_rn into FPSCR bit
> > [62:63].  */
> > +  emit_insn (gen_anddi3 (tmp_di1, tmp_di2, GEN_INT (-4)));
> > +  emit_insn (gen_iordi3 (tmp_di1, tmp_di1, tmp_rn));
> >  
> >/* Need to write to field k=15.  The fields are
> > [0:15].  Hence with
> > -L=0, W=0, FLM_i must be equal to 8, 16 = i + 8*(1-W).  FLM is
> > an
> > -8-bit field[0:7]. Need to set the bit that corresponds to the
> > -value of i that you want [0:7].  */
> > -  tmp_df = simplify_gen_subreg (DFmode, tmp_di, DImode, 0);
> > + L=0, W=0, FLM_i must be equal to 8, 16 = i + 8*(1-
> > W).  FLM is an
> > + 8-bit field[0:7].

Re: [PATCH ver3] rs6000, Add return value to __builtin_set_fpscr_rn

2023-07-10 Thread Carl Love via Gcc-patches

Peter:

On Mon, 2023-07-10 at 16:57 -0500, Peter Bergner wrote:
> On 7/10/23 2:18 PM, Carl Love wrote:
> > +  /* Get the current FPSCR fields, bits 29:31 (DRN) and bits 56:63
> > (VE, OE, UE,
> > +  ZE, XE, NI, RN) from the FPSCR and return them.  */
> 
> The 'Z' above should line up directly under the 'G' in Get.

Yup.  Fixed.

> 
> 
> > -  /* Insert new RN mode into FSCPR.  */
> > -  emit_insn (gen_rs6000_mffs (tmp_df));
> > -  tmp_di = simplify_gen_subreg (DImode, tmp_df, DFmode, 0);
> > -  emit_insn (gen_anddi3 (tmp_di, tmp_di, GEN_INT (-4)));
> > -  emit_insn (gen_iordi3 (tmp_di, tmp_di, tmp_rn));
> > +  /* Insert the new RN value from tmp_rn into FPSCR bit
> > [62:63].  */
> > +  emit_insn (gen_anddi3 (tmp_di1, tmp_di2, GEN_INT (-4)));
> > +  emit_insn (gen_iordi3 (tmp_di1, tmp_di1, tmp_rn));
> 
> This is an expander, so you shouldn't reuse temporaries as multiple
> destination pseudos, since that limits the register allocator's
> freedom.
> I know the old code did it, but since you're changing the line, you
> might as well use a new temp.

OK, wasn't aware that reusing temps was an issue for the register
allocator.  Thanks for letting me know.  So, I think you want something
like:

  rtx tmp_rn = gen_reg_rtx (DImode);
  rtx tmp_di3 = gen_reg_rtx (DImode);

  /* Extract new RN mode from operand.  */
  rtx op1 = convert_to_mode (DImode, operands[1], false);
  emit_insn (gen_anddi3 (tmp_rn, op1, GEN_INT (3)));

  /* Insert the new RN value from tmp_rn into FPSCR bit [62:63].  */
  emit_insn (gen_anddi3 (tmp_di1, tmp_di2, GEN_INT (-4)));
  emit_insn (gen_iordi3 (tmp_di3, tmp_di1, tmp_rn));

  /* Need to write to field k=15.  The fields are [0:15].  Hence with
 L=0, W=0, FLM_i must be equal to 8, 16 = i + 8*(1-W).  FLM is an
 8-bit field[0:7]. Need to set the bit that corresponds to the
 value of i that you want [0:7].  */
  tmp_df = simplify_gen_subreg (DFmode, tmp_di3, DImode, 0);

where each destination is a unique register.  The register allocator
can then decide whether or not to use the same register at code
generation time.

I made the change and did a quick check compiling on Power 10 with
-mcpu=power8, -mcpu=power9 and -mcpu=power10 and it worked fine.  I will
run the full regression on each of the processor types just to be sure.

  Carl

[PATCH ver3] rs6000, Add return value to __builtin_set_fpscr_rn

2023-07-10 Thread Carl Love via Gcc-patches



GCC maintainers:

Ver 3, Renamed the patch per comments on ver 2.  Previous subject line
was " [PATCH ver 2] rs6000, __builtin_set_fpscr_rn add retrun value".  
Fixed spelling mistakes and formatting.  Updated define_expand
"rs6000_set_fpscr_rn" to have the rs6000_get_fpscr_fields and
rs6000_update_fpscr_rn_field define_expands inlined.  Optimized the
code and fixed use of temporary register values. Updated the test file
dg-do run arguments and dg-options.  Removed the check for
__SET_FPSCR_RN_RETURNS_FPSCR__. Removed additional references to the
overloaded built-in with double argument.  Fixed up the documentation
file.  Updated patch retested on Power 8 BE/LE, Power 9 BE/LE and Power
10 LE.

Ver 2,  Went back thru the requirements and emails.  Not sure where I
came up with the requirement for an overloaded version with double
argument.  Removed the overloaded version with the double argument. 
Added the macro to announce if the __builtin_set_fpscr_rn returns a
void or a double with the FPSCR bits.  Updated the documentation file. 
Retested on Power 8 BE/LE, Power 9 BE/LE, Power 10 LE.  Redid the test
file.  Per request, the original test file functionality was not
changed.  Just changed the name from test_fpscr_rn_builtin.c to 
test_fpscr_rn_builtin_1.c.  Put new tests for the return values into a
new test file, test_fpscr_rn_builtin_2.c.

The GLibC team requested a builtin to replace the mffscrn and
mffscrni inline asm instructions in the GLibC code.  Previously there
was discussion on adding builtins for the mffscrn instructions.

https://gcc.gnu.org/pipermail/gcc-patches/2023-May/620261.html

In the end, it was felt that it would be best to extend the existing
__builtin_set_fpscr_rn builtin to return a double instead of a void
type.  The desire is that we could have the functionality of the
mffscrn and mffscrni instructions on older ISAs.  The two instructions
were initially added in ISA 3.0.  The __builtin_set_fpscr_rn has the
needed functionality to set the RN field using the mffscrn and mffscrni
instructions if ISA 3.0 is supported or fall back to using logical
instructions to mask and set the bits for earlier ISAs.  The
instructions return the current value of the FPSCR fields DRN, VE, OE,
UE, ZE, XE, NI, RN bit positions then update the RN bit positions with
the new RN value provided.
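
The read-then-update behavior described above can be modeled in portable
C.  This is only a sketch under stated assumptions: the FPSCR is
represented by a plain 64-bit integer, model_set_fpscr_rn and the mask
names are hypothetical, and the masks are derived from the bit positions
given in the text (IBM numbering, where bit 0 is the most significant
bit), not copied from the patch.

```c
#include <stdint.h>

/* IBM bit numbering: bit 0 is the MSB of the 64-bit FPSCR image.
   Hypothetical mask names, derived from the bit ranges in the text.  */
#define FPSCR_DRN_MASK  (UINT64_C (0x7) << 32)  /* bits 29:31 (DRN) */
#define FPSCR_CTRL_MASK UINT64_C (0xFF)         /* bits 56:63 (VE..RN) */
#define FPSCR_RN_MASK   UINT64_C (0x3)          /* bits 62:63 (RN) */

/* Software model only: return the DRN, VE, OE, UE, ZE, XE, NI and RN
   fields as they were on entry, then replace RN with the low two bits
   of new_rn.  All other bits of *fpscr are left unchanged.  */
static uint64_t
model_set_fpscr_rn (uint64_t *fpscr, uint64_t new_rn)
{
  uint64_t old_fields = *fpscr & (FPSCR_DRN_MASK | FPSCR_CTRL_MASK);

  *fpscr = (*fpscr & ~FPSCR_RN_MASK) | (new_rn & FPSCR_RN_MASK);
  return old_fields;
}
```

The model returns the fields before the update, which matches the
order of operations described for mffscrn and mffscrni.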

The current __builtin_set_fpscr_rn builtin has a return type of void. 
So, changing the return type to double and returning the FPSCR fields
DRN, VE, OE, UE, ZE, XE, NI, RN bit positions would then give the
functional equivalent of the mffscrn and mffscrni instructions.  Any
current uses of the builtin would just ignore the return value yet any
new uses could use the return value.  So the requirement is for the
change to the __builtin_set_fpscr_rn builtin to be backward
compatible and work for all ISAs.
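
A caller that must build with both old and new compilers could key off
the __SET_FPSCR_RN_RETURNS_FPSCR__ macro.  The following is a minimal
sketch, not code from the patch or from GLibC: the helper name set_rn
and its return convention are hypothetical, and the PowerPC paths
assume a hard-float target.

```c
/* Hypothetical helper.  Returns 1 when *old_fields holds the FPSCR
   fields returned by the built-in, 0 when RN was set but the built-in
   returns void, and -1 when not compiling for PowerPC.  */
static int
set_rn (int rn, double *old_fields)
{
#if defined (__powerpc__) && defined (__SET_FPSCR_RN_RETURNS_FPSCR__)
  /* Newer compiler: the built-in returns the previous FPSCR fields.  */
  *old_fields = __builtin_set_fpscr_rn (rn);
  return 1;
#elif defined (__powerpc__)
  /* Older compiler: the built-in returns void.  */
  __builtin_set_fpscr_rn (rn);
  (void) old_fields;
  return 0;
#else
  /* Not a PowerPC target: nothing to do in this sketch.  */
  (void) rn;
  (void) old_fields;
  return -1;
#endif
}
```

Existing callers that ignore the result need no change, which is the
backward compatibility requirement stated above.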

The following patch changes the return type of the
 __builtin_set_fpscr_rn builtin from void to double.  The return value
is the current value of the various FPSCR fields DRN, VE, OE, UE, ZE,
XE, NI, RN bit positions when the builtin is called.  The builtin then
updates the RN field with the new value provided as an argument to the
builtin.  The patch adds new testcases to test_fpscr_rn_builtin.c to
check that the builtin returns the current value of the FPSCR fields
and then updates the RN field.

The GLibC team has reviewed the patch to make sure it met their needs
as a drop in replacement for the inline asm mffscrn and mffscrni
statements in the GLibC code.

The patch has been tested on Power 8 LE/BE, Power 9 LE/BE and Power 10
LE.

Please let me know if the patch is acceptable for mainline.  Thanks.

   Carl 



-
rs6000, Add return value  to __builtin_set_fpscr_rn

Change the return value from void to double for __builtin_set_fpscr_rn.
The return value consists of the FPSCR fields DRN, VE, OE, UE, ZE, XE, NI,
RN bit positions.  A new test file, test powerpc/test_fpscr_rn_builtin_2.c,
is added to test the new return value for the built-in.

gcc/ChangeLog:
* config/rs6000/rs6000-builtins.def (__builtin_set_fpscr_rn): Update
built-in definition return type.
* config/rs6000-c.cc (rs6000_target_modify_macros): Add check,
define __SET_FPSCR_RN_RETURNS_FPSCR__ macro.
* config/rs6000/rs6000.md (rs6000_set_fpscr_rn): Added return
argument to return FPSCR fields.
* doc/extend.texi (__builtin_set_fpscr_rn): Update description for
the return value.  Add description for
__SET_FPSCR_RN_RETURNS_FPSCR__ macro.

gcc/testsuite/ChangeLog:
gcc.target/powerpc/test_fpscr_rn_builtin.c: Renamed to
test_fpscr_rn_builtin_1.c.  Added comment.
gcc.target/powerpc/test_fpscr_rn_builtin_2.c: New test for the
return value of __builtin_set_fpscr_rn builtin.
---
 gcc/config/rs6000/rs6000-builtins.def |   2 +-
 gcc/config/rs6000/rs6000-c.cc

Re: [PATCH ver 2] rs6000, __builtin_set_fpscr_rn add retrun value

2023-07-10 Thread Carl Love via Gcc-patches

On Fri, 2023-07-07 at 12:06 +0800, Kewen.Lin wrote:
> Hi Carl,
> 
> Some more minor comments are inline below on top of Peter's
> insightful
> review comments.
> 
> on 2023/7/1 08:58, Carl Love wrote:
> > GCC maintainers:
> > 
> > Ver 2,  Went back thru the requirements and emails.  Not sure where
> > I
> > came up with the requirement for an overloaded version with double
> > argument.  Removed the overloaded version with the double
> > argument. 
> > Added the macro to announce if the __builtin_set_fpscr_rn returns a
> > void or a double with the FPSCR bits.  Updated the documentation
> > file. 
> > Retested on Power 8 BE/LE, Power 9 BE/LE, Power 10 LE.  Redid the
> > test
> > file.  Per request, the original test file functionality was not
> > changed.  Just changed the name from test_fpscr_rn_builtin.c to 
> > test_fpscr_rn_builtin_1.c.  Put new tests for the return values
> > into a
> > new test file, test_fpscr_rn_builtin_2.c.
> > 
> > The GLibC team requested a builtin to replace the mffscrn and
> > mffscrniinline asm instructions in the GLibC code.  Previously
> > there
> > was discussion on adding builtins for the mffscrn instructions.
> > 
> > https://gcc.gnu.org/pipermail/gcc-patches/2023-May/620261.html
> > 
> > In the end, it was felt that it would be to extend the existing
> > __builtin_set_fpscr_rn builtin to return a double instead of a void
> > type.  The desire is that we could have the functionality of the
> > mffscrn and mffscrni instructions on older ISAs.  The two
> > instructions
> > were initially added in ISA 3.0.  The __builtin_set_fpscr_rn has
> > the
> > needed functionality to set the RN field using the mffscrn and
> > mffscrni
> > instructions if ISA 3.0 is supported or fall back to using logical
> > instructions to mask and set the bits for earlier ISAs.  The
> > instructions return the current value of the FPSCR fields DRN, VE,
> > OE,
> > UE, ZE, XE, NI, RN bit positions then update the RN bit positions
> > with
> > the new RN value provided.
> > 
> > The current __builtin_set_fpscr_rn builtin has a return type of
> > void. 
> > So, changing the return type to double and returning the  FPSCR
> > fields
> > DRN, VE, OE, UE, ZE, XE, NI, RN bit positions would then give the
> > functionally equivalent of the mffscrn and mffscrni
> > instructions.  Any
> > current uses of the builtin would just ignore the return value yet
> > any
> > new uses could use the return value.  So the requirement is for the
> > change to the __builtin_set_fpscr_rn builtin to be backwardly
> > compatible and work for all ISAs.
> > 
> > The following patch changes the return type of the
> >  __builtin_set_fpscr_rn builtin from void to double.  The return
> > value
> > is the current value of the various FPSCR fields DRN, VE, OE, UE,
> > ZE,
> > XE, NI, RN bit positions when the builtin is called.  The builtin
> > then
> > updated the RN field with the new value provided as an argument to
> > the
> > builtin.  The patch adds new testcases to test_fpscr_rn_builtin.c
> > to
> > check that the builtin returns the current value of the FPSCR
> > fields
> > and then updates the RN field.
> > 
> > The GLibC team has reviewed the patch to make sure it met their
> > needs
> > as a drop in replacement for the inline asm mffscr and mffscrni
> > statements in the GLibC code.  T
> > 
> > The patch has been tested on Power 8 LE/BE, Power 9 LE/BE and Power
> > 10
> > LE.
> > 
> > Please let me know if the patch is acceptable for
> > mainline.  Thanks.
> > 
> >Carl 
> > 
> > 
> > --
> > rs6000, __builtin_set_fpscr_rn add retrun value
> > 
> > Change the return value from void to double.  The return value
> > consists of
> > the FPSCR fields DRN, VE, OE, UE, ZE, XE, NI, RN bit
> > positions.  Add an
> > overloaded version which accepts a double argument.
> > 
> > The test powerpc/test_fpscr_rn_builtin.c is updated to add tests
> > for the
> > double reterun value and the new double argument.
> > 
> > gcc/ChangeLog:
> > * config/rs6000/rs6000-builtins.def (__builtin_set_fpscr_rn):
> > Update
> > builtin definition return type.
> > * config/rs6000-c.cc(rs6000_target_modify_macros): Add check,
> > define
> > __SET_FPSCR_RN_RETURNS_FPSCR__ macro.
> > * config/rs6000/rs6000.md ((rs6000_get_fpscr_fields): New
> > define_expand.
> > (rs6000_update_fpscr_rn_field): New define_expand.
> > (rs6000_set_fpscr_rn): Addedreturn argument.  Updated
> > to use new
> > rs6000_get_fpscr_fields and rs6000_update_fpscr_rn_field define
> >  _expands.
> > * doc/extend.texi (__builtin_set_fpscr_rn): Update description
> > for
> > the return value and new double argument.  Add descripton for
> > __SET_FPSCR_RN_RETURNS_FPSCR__ macro.
> > 
> > gcc/testsuite/ChangeLog:
> > gcc.target/powerpc/test_fpscr_rn_builtin.c: Renamed to
> > test_fpscr_rn_builtin_1.c.  Added comment.
> >

Re: [PATCH ver 2] rs6000, __builtin_set_fpscr_rn add retrun value

2023-07-10 Thread Carl Love via Gcc-patches

On Thu, 2023-07-06 at 17:54 -0500, Peter Bergner wrote:
> On 6/30/23 7:58 PM, Carl Love via Gcc-patches wrote:
> > rs6000, __builtin_set_fpscr_rn add retrun value
> 
> s/retrun/return/
> 
> Maybe better written as:
> 
> rs6000: Add return value to __builtin_set_fpscr_rn

Changed subject, fixed misspelling.
> 
> 
> > Change the return value from void to double.  The return value
> > consists of
> > the FPSCR fields DRN, VE, OE, UE, ZE, XE, NI, RN bit
> > positions.  Add an
> > overloaded version which accepts a double argument.
> 
> You're not adding an overloaded version anymore, so I think you can
> just
> remove the last sentence.

Yup, didn't get that removed when removing the overloaded instance.
Fixed.

> 
> 
> 
> > The test powerpc/test_fpscr_rn_builtin.c is updated to add tests
> > for the
> > double reterun value and the new double argument.
> 
> s/reterun/return/   ...and there is no double argument anymore, so
> that
> part can be removed.

Fixed.  Note, the new return value tests were moved to new test file.
> 
> 
> 
> > * config/rs6000/rs6000.md ((rs6000_get_fpscr_fields): New
> > define_expand.
> 
> Too many '('.

Fixed.

> 
> 
> 
> > (rs6000_set_fpscr_rn): Addedreturn argument.  Updated
> > to use new
> 
> Looks like a  after Added instead of a space.
> 
> 
> > rs6000_get_fpscr_fields and rs6000_update_fpscr_rn_field define
> >  _expands.
> 
> Don't split define_expand across two lines.

Fixed.

> 
> 
> 
> > * doc/extend.texi (__builtin_set_fpscr_rn): Update description
> > for
> > the return value and new double argument.  Add descripton for
> > __SET_FPSCR_RN_RETURNS_FPSCR__ macro.
> 
> s/descripton/description/

Fixed.

> 
> 
> 
> 
> 
> 
> > +  /* Tell the user the __builtin_set_fpscr_rn now returns the
> > FPSCR fields
> > + in a double.  Originally the builtin returned void.  */
> 
> Either:
>   1) s/Tell the user the __builtin_set_fpscr_rn/Tell the user
> __builtin_set_fpscr_rn/ 
>   2) s/the __builtin_set_fpscr_rn now/the __builtin_set_fpscr_rn
> built-in now/ 
> 
> 
> > +  if ((flags & OPTION_MASK_SOFT_FLOAT) == 0)
> > +  rs6000_define_or_undefine_macro (define_p,
> > "__SET_FPSCR_RN_RETURNS_FPSCR__");
> 
> This doesn't look like it's indented correctly.
> 
> 

Fixed indentation.

> 
> 
> > +(define_expand "rs6000_get_fpscr_fields"
> > + [(match_operand:DF 0 "gpc_reg_operand")]
> > +  "TARGET_HARD_FLOAT"
> > +{
> > +  /* Extract fields bits 29:31 (DRN) and bits 56:63 (VE, OE, UE,
> > ZE, XE, NI,
> > + RN) from the FPSCR and return them.  */
> > +  rtx tmp_df = gen_reg_rtx (DFmode);
> > +  rtx tmp_di = gen_reg_rtx (DImode);
> > +
> > +  emit_insn (gen_rs6000_mffs (tmp_df));
> > +  tmp_di = simplify_gen_subreg (DImode, tmp_df, DFmode, 0);
> > +  emit_insn (gen_anddi3 (tmp_di, tmp_di, GEN_INT
> > (0x000700FFULL)));
> > +  rtx tmp_rtn = simplify_gen_subreg (DFmode, tmp_di, DImode, 0);
> > +  emit_move_insn (operands[0], tmp_rtn);
> > +  DONE;
> > +})
> 
> This doesn't look correct.  You first set tmp_di to a new reg rtx but
> then
> throw that away with the return value of simplify_gen_subreg().  I'm
> guessing
> you want that tmp_di as a gen_reg_rtx for the destination of the
> gen_anddi3, so
> you probably want a different rtx for the subreg that feeds the
> gen_anddi3.

OK, fixed the use of the tmp values.  Note the define_expand was
inlined into define_expand "rs6000_set_fpscr_rn" per comments from
Kewen.  Inlining allows the reuse of some of the tmp values.

> 
> 
> 
> > +(define_expand "rs6000_update_fpscr_rn_field"
> > + [(match_operand:DI 0 "gpc_reg_operand")]
> > +  "TARGET_HARD_FLOAT"
> > +{
> > +  /* Insert the new RN value from operands[0] into FPSCR bit
> > [62:63].  */
> > +  rtx tmp_di = gen_reg_rtx (DImode);
> > +  rtx tmp_df = gen_reg_rtx (DFmode);
> > +
> > +  emit_insn (gen_rs6000_mffs (tmp_df));
> > +  tmp_di = simplify_gen_subreg (DImode, tmp_df, DFmode, 0);
> 
> Ditto.

Fixed.

> 
> 
> 
> 
> > +The @code{__builtin_set_fpscr_rn} builtin allows changing both of
> > the floating
> > +point rounding mode bits and returning the various FPSCR fields
> > before the RN
> > +field is updated.  The builtin returns a double consisting of the
> > initial value
> > +of the FPSCR fields DRN, VE, OE, UE, ZE, XE, NI, and RN bit
> > positions with all
> > +oth

[PATCH v5] rs6000: Update the vsx-vector-6.* tests.

2023-07-07 Thread Carl Love via Gcc-patches



GCC maintainers:

Ver 5. Removed -compile from the names of the compile only tests. Fixed
up the reference to the compile file names in the .h file headers. 
Replaced powerpc_vsx_ok with vsx_hw in the run test files.  Removed the
-save-temps from all files.  Retested on all of the various platforms
with no regressions.

Ver 4. Fixed a few typos.  Redid the tests to create separate run and
compile tests.

Ver 3.  Added __attribute__ ((noipa)) to the test files.  Changed some
of the scan-assembler-times checks to cover multiple similar
instructions.  Change the function check macro to a macro to generate a
function to do the test and check the results.  Retested on the various
processor types and BE/LE versions.

Ver 2.  Switched to using code macros to generate the call to the
builtin and test the results.  Added in instruction counts for the key
instruction for the builtin.  Moved the tests into an additional
function call to ensure the compiler doesn't replace the builtin call
code with the statically computed results.  The compiler was doing this
for a few of the simpler tests.  
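
The approach from versions 2 and 3 (a macro that generates a test
function, marked noipa so the call is not folded away) can be sketched
as below.  This is an illustration only, not the contents of the new
test files: DEFINE_UNARY_TEST and NO_IPA are hypothetical names, abs
and labs stand in for the VSX built-ins so the sketch builds on any
target, and a GCC-compatible compiler is assumed.

```c
#include <stdlib.h>

/* noipa is available in GCC 8 and later; fall back to noinline
   elsewhere.  NO_IPA is a hypothetical helper macro.  */
#if defined (__GNUC__) && !defined (__clang__) && __GNUC__ >= 8
# define NO_IPA __attribute__ ((noipa))
#else
# define NO_IPA __attribute__ ((noinline))
#endif

/* Generate a function that applies OP to its input and reports
   whether the result equals the expected value.  Because the
   function is not inlined, OP sees a run-time argument instead of
   a constant the compiler could fold.  */
#define DEFINE_UNARY_TEST(NAME, OP, TYPE)            \
  NO_IPA static int                                  \
  test_##NAME (TYPE in, TYPE expected)               \
  {                                                  \
    TYPE got = OP (in);                              \
    return got == expected;                          \
  }

DEFINE_UNARY_TEST (abs, abs, int)
DEFINE_UNARY_TEST (labs, labs, long)
```

Each generated function is then called from main with known inputs,
for example test_abs (-5, 5), and the test aborts if it returns zero.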

The following patch takes the tests in vsx-vector-6-p7.h,
vsx-vector-6-p8.h, vsx-vector-6-p9.h and reorganizes them into a series
of smaller test files by functionality rather than processor version.

Tested the patch on Power 8 LE/BE, Power 9 LE/BE and Power 10 LE with
no regressions.

Please let me know if this patch is acceptable for mainline.  Thanks.

   Carl



-
rs6000: Update the vsx-vector-6.* tests.

The vsx-vector-6.h file is included into the processor specific test files
vsx-vector-6.p7.c, vsx-vector-6.p8.c, and vsx-vector-6.p9.c.  The .h file
contains a large number of vsx vector built-in tests.  The processor
specific files contain the number of instructions that the tests are
expected to generate for that processor.  The tests are compile only.

This patch reworks the tests into a series of files for related tests.
The new tests consist of a runnable test to verify the built-in argument
types and the functional correctness of each built-in.  There is also a
compile only test that verifies the built-ins generate the expected number
of instructions for the various built-in tests.

gcc/testsuite/
* gcc.target/powerpc/vsx-vector-6-func-1op.h: New test file.
* gcc.target/powerpc/vsx-vector-6-func-1op-run.c: New test file.
* gcc.target/powerpc/vsx-vector-6-func-1op.c: New test file.
* gcc.target/powerpc/vsx-vector-6-func-2lop.h: New test file.
* gcc.target/powerpc/vsx-vector-6-func-2lop-run.c: New test file.
* gcc.target/powerpc/vsx-vector-6-func-2lop.c: New test file.
* gcc.target/powerpc/vsx-vector-6-func-2op.h: New test file.
* gcc.target/powerpc/vsx-vector-6-func-2op-run.c: New test file.
* gcc.target/powerpc/vsx-vector-6-func-2op.c: New test file.
* gcc.target/powerpc/vsx-vector-6-func-3op.h: New test file.
* gcc.target/powerpc/vsx-vector-6-func-3op-run.c: New test file.
* gcc.target/powerpc/vsx-vector-6-func-3op.c: New test file.
* gcc.target/powerpc/vsx-vector-6-func-cmp-all.h: New test file.
* gcc.target/powerpc/vsx-vector-6-func-cmp-all-run.c: New test file.
* gcc.target/powerpc/vsx-vector-6-func-cmp-all.c: New test
file.
* gcc.target/powerpc/vsx-vector-6-func-cmp.h: New test file.
* gcc.target/powerpc/vsx-vector-6-func-cmp-run.c: New test file.
* gcc.target/powerpc/vsx-vector-6-func-cmp.c: New test file.
* gcc.target/powerpc/vsx-vector-6.h: Remove test file.
* gcc.target/powerpc/vsx-vector-6.p7.c: Remove test file.
* gcc.target/powerpc/vsx-vector-6.p8.c: Remove test file.
* gcc.target/powerpc/vsx-vector-6.p9.c: Remove test file.
---
 .../powerpc/vsx-vector-6-func-1op-run.c   |  98 
 .../powerpc/vsx-vector-6-func-1op.c   |  22 ++
 .../powerpc/vsx-vector-6-func-1op.h   |  43 
 .../powerpc/vsx-vector-6-func-2lop-run.c  | 177 ++
 .../powerpc/vsx-vector-6-func-2lop.c  |  14 ++
 .../powerpc/vsx-vector-6-func-2lop.h  |  47 
 .../powerpc/vsx-vector-6-func-2op-run.c   |  96 
 .../powerpc/vsx-vector-6-func-2op.c   |  21 ++
 .../powerpc/vsx-vector-6-func-2op.h   |  42 
 .../powerpc/vsx-vector-6-func-3op-run.c   | 229 ++
 .../powerpc/vsx-vector-6-func-3op.c   |  17 ++
 .../powerpc/vsx-vector-6-func-3op.h   |  73 ++
 .../powerpc/vsx-vector-6-func-cmp-all-run.c   | 147 +++
 .../powerpc/vsx-vector-6-func-cmp-all.c   |  17 ++
 .../powerpc/vsx-vector-6-func-cmp-all.h   |  76 ++
 .../powerpc/vsx-vector-6-func-cmp-run.c   |  92 +++
 .../powerpc/vsx-vector-6-func-cmp.c   |  16 ++
 .../powerpc/vsx-vector-6-func-cmp.h   |  40 +++
 .../gcc.target/powerpc/vsx-vector-6.h | 154

Re: [PATCH v4] rs6000: Update the vsx-vector-6.* tests.

2023-07-07 Thread Carl Love via Gcc-patches

On Fri, 2023-07-07 at 10:15 +0800, Kewen.Lin wrote:



> 
> > diff --git a/gcc/testsuite/gcc.target/powerpc/vsx-vector-6-func-
> > 1op-compile.c b/gcc/testsuite/gcc.target/powerpc/vsx-vector-6-func-
> > 1op-compile.c
> > new file mode 100644
> > index 000..6b7d73ed66c
> > --- /dev/null
> > +++ b/gcc/testsuite/gcc.target/powerpc/vsx-vector-6-func-1op-
> > compile.c
> 
> Nit: Maybe remove "-compile" from the name as when there is "-run"
> variant people
> are easy to realize this is for compilation, the name without "-
> compile" seems
> more neat.  With this name change, you have to update the comment
> referring it in
> its related header file accordingly.  ("sed -i 's/-compile//g' vsx-
> vector-6-func-*.h"
> recommended, similar patterns could be used for the two other
> comments below.)

Changed the compile only file names as requested.  Updated the file
names in the .h files.  Updated the Change Log file names.
 
> 
> > @@ -0,0 +1,22 @@
> > +/* { dg-do compile { target lp64 } } */
> > +/* { dg-require-effective-target powerpc_vsx_ok } */
> > +/* { dg-options "-O2 -save-temps -mvsx" } */
> 
> Nit: We don't need "-save-temps" any more for all the test cases in
> this patch.
> 
Yup, -save-temps is on automatically for compile only and we are not
checking instructions in the run file.  Removed all of the -save-temps
directives.

> > +
> > +/* This file just generates calls to the various builtins and
> > verifies the
> > +   expected number of instructions for each builtin were
> > generated.  */
> > +
> > +#include "vsx-vector-6-func-1op.h"
> > +
> > +/* { dg-final { scan-assembler-times {\mxvabssp\M} 1 } } */
> > +/* { dg-final { scan-assembler-times {\mxvrspip\M} 1 } } */
> > +/* { dg-final { scan-assembler-times {\mxvrspim\M} 1 } } */
> > +/* { dg-final { scan-assembler-times {\mxvrspi\M} 1 } } */ 
> > +/* { dg-final { scan-assembler-times {\mxvrspic\M} 1 } } */
> > +/* { dg-final { scan-assembler-times {\mxvrspiz\M} 1 } } */
> > +/* { dg-final { scan-assembler-times {\mxvabsdp\M} 1 } } */
> > +/* { dg-final { scan-assembler-times {\mxvrdpip\M} 1 } } */
> > +/* { dg-final { scan-assembler-times {\mxvrdpim\M} 1 } } */
> > +/* { dg-final { scan-assembler-times {\mxvrdpi\M} 1 } } */
> > +/* { dg-final { scan-assembler-times {\mxvrdpic\M} 1 } } */
> > +/* { dg-final { scan-assembler-times {\mxvrdpiz\M} 1 } } */
> > +/* { dg-final { scan-assembler-times {\mxvsqrtdp\M} 1 } } */
> > diff --git a/gcc/testsuite/gcc.target/powerpc/vsx-vector-6-func-
> > 1op-run.c b/gcc/testsuite/gcc.target/powerpc/vsx-vector-6-func-1op-
> > run.c
> > new file mode 100644
> > index 000..150e372e428
> > --- /dev/null
> > +++ b/gcc/testsuite/gcc.target/powerpc/vsx-vector-6-func-1op-run.c
> > @@ -0,0 +1,98 @@
> > +/* { dg-do run { target lp64 } } */
> > +/* { dg-require-effective-target powerpc_vsx_ok } */
> 
> We need vsx_hw for those *-run.c cases instead, as powerpc_vsx_ok
> doesn't guarantee the test env can support vsx instructions, it just
> ensures it can be compiled.
> 
> /* { dg-require-effective-target vsx_hw } */
> 
> All "*-run.c" cases need changes.

Updated the run cases to use vsx_hw, removed powerpc_vsx_ok.

 Carl

[PATCH ver 3] rs6000, fix vec_replace_unaligned built-in arguments

2023-07-07 Thread Carl Love via Gcc-patches



GCC maintainers:

Version 3, added code to altivec_resolve_overloaded_builtin so the
correct instruction is selected for the size of the second argument. 
This restores the instruction counts to the original values where the
correct instructions were originally being generated.  The naming of
the overloaded builtin instances and builtin definitions were changed
to reflect the type of the second argument since the type of the first
argument is now the same for all overloaded instances.  A new builtin
test file was added for the case where the first argument is cast to
the unsigned long long type.  This test requires the
-flax-vector-conversions gcc command line option.  Since the other tests
do not require this option, I felt that the new test needed to be in a
separate file.  Finally some formatting fixes were made in the original
test file.  Patch has been retested on Power 10 with no regressions.
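
The idea of choosing the operation from the type of the second argument
can be illustrated with C11 _Generic.  This is an analogy only, not how
altivec_resolve_overloaded_builtin is written: vec16, replace_bytes and
model_replace_unaligned are hypothetical names, a 16-byte array stands
in for vector unsigned char, the byte order of the real instructions is
not modeled, and unsigned int and unsigned long long are assumed to be
4 and 8 bytes wide.

```c
#include <stddef.h>
#include <string.h>

/* Stand-in for vector unsigned char.  */
typedef struct { unsigned char b[16]; } vec16;

/* Copy n bytes from src into v at byte offset idx (host byte order).  */
static vec16
replace_bytes (vec16 v, const void *src, size_t n, size_t idx)
{
  if (idx + n <= sizeof v.b)
    memcpy (&v.b[idx], src, n);
  return v;
}

/* Word-sized replacement, analogous to selecting vinsw.  */
static vec16
replace_w (vec16 v, unsigned int x, size_t idx)
{
  return replace_bytes (v, &x, sizeof x, idx);
}

/* Doubleword-sized replacement, analogous to selecting vinsd.  */
static vec16
replace_d (vec16 v, unsigned long long x, size_t idx)
{
  return replace_bytes (v, &x, sizeof x, idx);
}

/* Dispatch on the type of x, not on the type of v.  */
#define model_replace_unaligned(v, x, idx)      \
  _Generic ((x),                                \
            unsigned int: replace_w,            \
            unsigned long long: replace_d) (v, x, idx)

/* Count bytes that differ from fill, to observe how many were replaced.  */
static size_t
count_changed (vec16 v, unsigned char fill)
{
  size_t n = 0;
  for (size_t i = 0; i < sizeof v.b; i++)
    n += (v.b[i] != fill);
  return n;
}
```

With this model, a value such as 678ull still selects the doubleword
path because its type is unsigned long long, even though the value
itself would fit in an int.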

Version 2, fixed various typos.  Updated the change log body to say the
instruction counts were updated.  The instruction counts changed as a
result of changing the first argument of the vec_replace_unaligned
builtin call from vector unsigned long long (vull) to vector unsigned
char (vuc).  When the first argument was vull the builtin call
generated the vinsd instruction for the two test cases.  The updated
call with vuc as the first argument generates two vinsw instructions
instead.  Patch was retested on Power 10 with no regressions.

The following patch fixes the first argument in the builtin definition
and the corresponding test cases.  Initially, the builtin specification
was wrong due to a cut and paste error.  The documentation was fixed in:

   commit ed3fea09b18f67e757b5768b42cb6e816626f1db
   Author: Bill Schmidt 
   Date:   Fri Feb 4 13:07:17 2022 -0600

   rs6000: Correct function prototypes for vec_replace_unaligned

   Due to a pasto error in the documentation, vec_replace_unaligned
was
   implemented with the same function prototypes as
vec_replace_elt.  It was
   intended that vec_replace_unaligned always specify output
vectors as having
   type vector unsigned char, to emphasize that elements are
potentially
   misaligned by this built-in function.  This patch corrects the
   misimplementation.


This patch fixes the arguments in the definitions and updates the
testcases accordingly.  Additionally, a few minor spacing issues are
fixed.

The patch has been tested on Power 10 with no regressions.  Please let
me know if the patch is acceptable for mainline.  Thanks.

 Carl 

--
rs6000, fix vec_replace_unaligned built-in arguments

The first argument of the vec_replace_unaligned built-in should always be
unsigned char, as specified in gcc/doc/extend.texi.

This patch fixes the builtin definitions and updates the test cases to use
the correct arguments.  The original test file is renamed and a second test
file is added for a new test case.

gcc/ChangeLog:
* config/rs6000/rs6000-builtins.def: Rename
__builtin_altivec_vreplace_un_uv2di as __builtin_altivec_vreplace_un_udi
__builtin_altivec_vreplace_un_uv4si as __builtin_altivec_vreplace_un_usi
__builtin_altivec_vreplace_un_v2df as __builtin_altivec_vreplace_un_df
__builtin_altivec_vreplace_un_v2di as __builtin_altivec_vreplace_un_di
__builtin_altivec_vreplace_un_v4sf as __builtin_altivec_vreplace_un_sf
__builtin_altivec_vreplace_un_v4si as __builtin_altivec_vreplace_un_si.
Rename VREPLACE_UN_UV2DI as VREPLACE_UN_UDI, VREPLACE_UN_UV4SI as
VREPLACE_UN_USI, VREPLACE_UN_V2DF as VREPLACE_UN_DF,
VREPLACE_UN_V2DI as VREPLACE_UN_DI, VREPLACE_UN_V4SF as
VREPLACE_UN_SF, VREPLACE_UN_V4SI as VREPLACE_UN_SI.
Rename vreplace_un_v2di as vreplace_un_di, vreplace_un_v4si as
vreplace_un_si, vreplace_un_v2df as vreplace_un_df,
vreplace_un_v2di as vreplace_un_di, vreplace_un_v4sf as
vreplace_un_sf, vreplace_un_v4si as vreplace_un_si.
* config/rs6000/rs6000-c.cc (find_instance): Add new argument
nargs.  Add nargs check.  Extend function to handle three arguments.
(altivec_resolve_overloaded_builtin): Add new argument nargs to
function calls.  Add case RS6000_OVLD_VEC_REPLACE_UN.
* config/rs6000/rs6000-overload.def (__builtin_vec_replace_un):
Fix first argument type.  Rename VREPLACE_UN_UV4SI as
VREPLACE_UN_USI, VREPLACE_UN_V4SI as VREPLACE_UN_SI,
VREPLACE_UN_UV2DI as VREPLACE_UN_UDI, VREPLACE_UN_V2DI as
VREPLACE_UN_DI, VREPLACE_UN_V4SF as VREPLACE_UN_SF,
VREPLACE_UN_V2DF as VREPLACE_UN_DF.
* config/rs6000/vsx.md (VEC_RU): New mode iterator.
(VEC_RU_char): New mode attribute.
(vreplace_un_): Change iterator and mode attribute.

gcc/testsuite/ChangeLog:
* gcc.target/powerpc/vec-replace-word-runnable.c: Renamed
vec-replace-word-runnable_1.c.
*

Re: [PATCH] rs6000, fix vec_replace_unaligned builtin arguments

2023-07-07 Thread Carl Love via Gcc-patches

Kewen:

On Mon, 2023-06-19 at 11:50 +0800, Kewen.Lin wrote:
> > generated the vinsd instruction for the two calls with the first
> > argument of unsigned long long int.  When the first argument of the
> > builtin is changed to the correct type, vector unsigned char the
> > builtin generates the vinsw instruction instead.  The change occurs
> > in
> > two places resulting in reducing the counts for vinsd by two and
> > increasing the counts for vinsw by two.  The other calls to the
> > builtin
> > are either vector ints or vector floats which generate the vinsw
> > instruction.  Changing the first argument in those calls to vector
> > unsigned char still generate the vinsw instruction.
> 
> But it did expose something odd and needed to be handled in this
> change.
> I had a further check, for the below test case:
> 
> #include "altivec.h"
> 
> #ifdef ORIG
> vector unsigned char foo (vector unsigned long long v){
>   unsigned long long val = 678ull;
>   return vec_replace_unaligned (v, val, 7);
> }
> #else
> vector unsigned char foo (vector unsigned long long v){
>   unsigned long long val = 678ull;
>   return vec_replace_unaligned ((vector unsigned char)v, val, 7);
> }
> #endif
> 
> Without this patch (-DORIG required to match the previous prototype),
> it would generate vinsd; while with this proposed patch, it would
> generate vinsw.  I think it's unexpected since users can still have
> the need to replace a doubleword size of chunk but give a constant
> which can be represented by int.  The previous way can support it,
> while the new way can't.  So we should have some way to distinguish
> it, we have some special-casing in function
> altivec_resolve_overloaded_builtin, could you have a check and try
> there?  Thanks!

I added the needed handling in altivec_resolve_overloaded_builtin to
address the issue with the built-in generating the correct instruction
for the unsigned long long cases in the test file.  I added an
additional test file with the above test case.  It was put into a new
test file as it requires the -flax-vector-conversions argument.  I felt
that it was best to separate the tests that need/do not need the
-flax-vector-conversions argument.

Note, adding the additional case statement RS6000_OVLD_VEC_REPLACE_UN
to handle the three argument built-in vec_replace_unaligned in
altivec_resolve_overloaded_builtin exposed an issue with function
find_instance.  Function find_instance assumes there are only two
arguments in the builtin.  There are no checks on the actual number of
arguments used by the built-in.  This leads to an error in
tree_operand_check_failed() when using find_instance.  The find_instance
function was extended to handle 2 or 3 arguments with a check to make
sure the number of arguments is either 2 or 3.

FYI, I also noticed in the current patch the names in rs6000-
builtins.def and rs6000-overload.def for builtin_altivec_vreplace_un
still reflect the type of the first argument.  The current patch
changes the first argument to vuc, but the naming didn't all get
updated.  I think the names should be changed to reflect the name of
the second argument since the first arguments are all identical.  For
example:

--- a/gcc/config/rs6000/rs6000-builtins.def
+++ b/gcc/config/rs6000/rs6000-builtins.def
@@ -3388,29 +3388,29 @@
   const vull __builtin_altivec_vpextd (vull, vull);
 VPEXTD vpextd {}

   -  const vuc __builtin_altivec_vreplace_un_uv2di (vull, unsigned long long, \
   - const int<4>);
   -VREPLACE_UN_UV2DI vreplace_un_v2di {}
   +  const vuc __builtin_altivec_vreplace_un_udi (vuc, unsigned long long, \
   +   const int<4>);
   +VREPLACE_UN_UDI vreplace_un_di {}

 The name changes will ripple thru files rs6000-builtins.def, rs6000-
 overload.def and vsx.md.

I did all the naming as well in the new version 3 of the patch.

 Carl

[PATCH v4] rs6000: Update the vsx-vector-6.* tests.

2023-07-06 Thread Carl Love via Gcc-patches

GCC maintainers:

Ver 4. Fixed a few typos.  Redid the tests to create separate run and
compile tests.

Ver 3.  Added __attribute__ ((noipa)) to the test files.  Changed some
of the scan-assembler-times checks to cover multiple similar
instructions.  Change the function check macro to a macro to generate a
function to do the test and check the results.  Retested on the various
processor types and BE/LE versions.

Ver 2.  Switched to using code macros to generate the call to the
builtin and test the results.  Added in instruction counts for the key
instruction for the builtin.  Moved the tests into an additional
function call to ensure the compiler doesn't replace the builtin call
code with the statically computed results.  The compiler was doing this
for a few of the simpler tests.  
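
As a rough illustration of the idea in the Ver 2 and Ver 3 notes, here is a
minimal, portable sketch of a macro that generates one out-of-line test
function per operation.  It is not the code from the patch: the scalar
helpers op_square and op_negate stand in for the vector built-ins, and the
names NOIPA, FLOAT_TEST and f_NAME_result are made up for this example.
The fallback to noinline for compilers without noipa is also an assumption.

```c
/* Use noipa where it is available (GCC 8 and later); otherwise fall back
   to the weaker noinline.  The fallback is an assumption of this sketch.  */
#if defined (__GNUC__) && !defined (__clang__) && __GNUC__ >= 8
#define NOIPA __attribute__ ((noipa))
#else
#define NOIPA __attribute__ ((noinline))
#endif

/* Hypothetical scalar stand-ins for the built-ins under test.  */
static float op_square (float x) { return x * x; }
static float op_negate (float x) { return -x; }

/* Generate a global result variable and a test function for operation
   NAME.  Because the function is not inlined into its caller, the
   compiler cannot fold the call into a statically computed result.  */
#define FLOAT_TEST(NAME)                    \
  float f_##NAME##_result;                  \
  NOIPA void float_##NAME (float f_src)     \
  {                                         \
    f_##NAME##_result = op_##NAME (f_src);  \
  }

FLOAT_TEST (square)
FLOAT_TEST (negate)
```

A caller would invoke float_square (3.0f) and then compare f_square_result
against the expected value in main.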

The following patch takes the tests in vsx-vector-6-p7.h,  vsx-vector-
6-p8.h, vsx-vector-6-p9.h and reorganizes them into a series of smaller
test files by functionality rather than processor version.

Tested the patch on Power 8 LE/BE, Power 9 LE/BE and Power 10 LE with
no regressions.

Please let me know if this patch is acceptable for mainline.  Thanks.

   Carl



-
rs6000: Update the vsx-vector-6.* tests.

The vsx-vector-6.h file is included into the processor specific test files
vsx-vector-6.p7.c, vsx-vector-6.p8.c, and vsx-vector-6.p9.c.  The .h file
contains a large number of vsx vector builtin tests.  The processor
specific files contain the number of instructions that the tests are
expected to generate for that processor.  The tests are compile only.

This patch reworks the tests into a series of files for related tests.
The new tests consist of a runnable test to verify the builtin argument
types and the functional correctness of each builtin.  There is also a
compile only test that verifies the builtins generate the expected number
of instructions for the various builtin tests.

gcc/testsuite/
* gcc.target/powerpc/vsx-vector-6-func-1op.h: New test file.
* gcc.target/powerpc/vsx-vector-6-func-1op-run.c: New test file.
* gcc.target/powerpc/vsx-vector-6-func-1op-compile.c: New test file.
* gcc.target/powerpc/vsx-vector-6-func-2lop.h: New test file.
* gcc.target/powerpc/vsx-vector-6-func-2lop-run.c: New test file.
* gcc.target/powerpc/vsx-vector-6-func-2lop-compile.c: New test file.
* gcc.target/powerpc/vsx-vector-6-func-2op.h: New test file.
* gcc.target/powerpc/vsx-vector-6-func-2op-run.c: New test file.
* gcc.target/powerpc/vsx-vector-6-func-2op-compile.c: New test file.
* gcc.target/powerpc/vsx-vector-6-func-3op.h: New test file.
* gcc.target/powerpc/vsx-vector-6-func-3op-run.c: New test file.
* gcc.target/powerpc/vsx-vector-6-func-3op-compile.c: New test file.
* gcc.target/powerpc/vsx-vector-6-func-cmp-all.h: New test file.
* gcc.target/powerpc/vsx-vector-6-func-cmp-all-run.c: New test file.
* gcc.target/powerpc/vsx-vector-6-func-cmp-all-compile.c: New test
file.
* gcc.target/powerpc/vsx-vector-6-func-cmp.h: New test file.
* gcc.target/powerpc/vsx-vector-6-func-cmp-run.c: New test file.
* gcc.target/powerpc/vsx-vector-6-func-cmp-compile.c: New test file.
* gcc.target/powerpc/vsx-vector-6.h: Remove test file.
* gcc.target/powerpc/vsx-vector-6.p7.c: Remove test file.
* gcc.target/powerpc/vsx-vector-6.p8.c: Remove test file.
* gcc.target/powerpc/vsx-vector-6.p9.c: Remove test file.
---
 .../powerpc/vsx-vector-6-func-1op-compile.c   |  22 ++
 .../powerpc/vsx-vector-6-func-1op-run.c   |  98 
 .../powerpc/vsx-vector-6-func-1op.h   |  43 
 .../powerpc/vsx-vector-6-func-2lop-compile.c  |  14 ++
 .../powerpc/vsx-vector-6-func-2lop-run.c  | 177 ++
 .../powerpc/vsx-vector-6-func-2lop.h  |  47 
 .../powerpc/vsx-vector-6-func-2op-compile.c   |  21 ++
 .../powerpc/vsx-vector-6-func-2op-run.c   |  96 
 .../powerpc/vsx-vector-6-func-2op.h   |  42 
 .../powerpc/vsx-vector-6-func-3op-compile.c   |  17 ++
 .../powerpc/vsx-vector-6-func-3op-run.c   | 229 ++
 .../powerpc/vsx-vector-6-func-3op.h   |  73 ++
 .../vsx-vector-6-func-cmp-all-compile.c   |  17 ++
 .../powerpc/vsx-vector-6-func-cmp-all-run.c   | 147 +++
 .../powerpc/vsx-vector-6-func-cmp-all.h   |  76 ++
 .../powerpc/vsx-vector-6-func-cmp-compile.c   |  16 ++
 .../powerpc/vsx-vector-6-func-cmp-run.c   |  92 +++
 .../powerpc/vsx-vector-6-func-cmp.h   |  40 +++
 .../gcc.target/powerpc/vsx-vector-6.h | 154 
 .../gcc.target/powerpc/vsx-vector-6.p7.c  |  43 
 .../gcc.target/powerpc/vsx-vector-6.p8.c  |  43 
 .../gcc.target/powerpc/vsx-vector-6.p9.c  |  42 
 22 files changed, 1267 insertions(+), 282 deletions(-)
 create mode 100644

Re: [PATCH ver 3] rs6000: Update the vsx-vector-6.* tests.

2023-07-06 Thread Carl Love via Gcc-patches

Kewen:

On Tue, 2023-07-04 at 10:49 +0800, Kewen.Lin wrote:
> 



> > 
> > The tests are broken up into a seriers of files for related
> > tests.  The
> 
> s/seriers/series/

Fixed

> 
> > new tests are runnable tests to verify the builtin argument types
> > and the
> > functional correctness of each test rather then verifying the type
> > and
> > number of instructions generated.
> > 
> > gcc/testsuite/
> > * gcc.target/powerpc/vsx-vector-6-1op.c: New test file.
> > * gcc.target/powerpc/vsx-vector-6-2lop.c: New test file.
> > * gcc.target/powerpc/vsx-vector-6-2op.c: New test file.
> > * gcc.target/powerpc/vsx-vector-6-3op.c: New test file.
> > * gcc.target/powerpc/vsx-vector-6-cmp-all.c: New test file.
> > * gcc.target/powerpc/vsx-vector-6-cmp.c: New test file.
> 
> Missing "func-" in the names ...

Fixed.

> 
> > * gcc.target/powerpc/vsx-vector-6.h: Remove test file.
> > * gcc.target/powerpc/vsx-vector-6-p7.h: Remove test file.
> > * gcc.target/powerpc/vsx-vector-6-p8.h: Remove test file.
> > * gcc.target/powerpc/vsx-vector-6-p9.h: Remove test file.
> 
> should be vsx-vector-6-p{7,8,9}.c, "git gcc-verify" should catch
> these.

Fixed, ran git gcc-verify which found a couple more little file name
typos.
> 
> > ---
> >  .../powerpc/vsx-vector-6-func-1op.c   | 141 ++
> >  .../powerpc/vsx-vector-6-func-2lop.c  | 217
> > +++
> >  .../powerpc/vsx-vector-6-func-2op.c   | 133 +
> >  .../powerpc/vsx-vector-6-func-3op.c   | 257
> > ++
> >  .../powerpc/vsx-vector-6-func-cmp-all.c   | 211 ++
> >  .../powerpc/vsx-vector-6-func-cmp.c   | 121 +
> >  .../gcc.target/powerpc/vsx-vector-6.h | 154 ---
> >  .../gcc.target/powerpc/vsx-vector-6.p7.c  |  43 ---
> >  .../gcc.target/powerpc/vsx-vector-6.p8.c  |  43 ---
> >  .../gcc.target/powerpc/vsx-vector-6.p9.c  |  42 ---
> >  10 files changed, 1080 insertions(+), 282 deletions(-)
> >  create mode 100644 gcc/testsuite/gcc.target/powerpc/vsx-vector-6-
> > func-1op.c
> >  create mode 100644 gcc/testsuite/gcc.target/powerpc/vsx-vector-6-
> > func-2lop.c
> >  create mode 100644 gcc/testsuite/gcc.target/powerpc/vsx-vector-6-
> > func-2op.c
> >  create mode 100644 gcc/testsuite/gcc.target/powerpc/vsx-vector-6-
> > func-3op.c
> >  create mode 100644 gcc/testsuite/gcc.target/powerpc/vsx-vector-6-
> > func-cmp-all.c
> >  create mode 100644 gcc/testsuite/gcc.target/powerpc/vsx-vector-6-
> > func-cmp.c
> >  delete mode 100644 gcc/testsuite/gcc.target/powerpc/vsx-vector-6.h
> >  delete mode 100644 gcc/testsuite/gcc.target/powerpc/vsx-vector-
> > 6.p7.c
> >  delete mode 100644 gcc/testsuite/gcc.target/powerpc/vsx-vector-
> > 6.p8.c
> >  delete mode 100644 gcc/testsuite/gcc.target/powerpc/vsx-vector-
> > 6.p9.c
> > 
> > diff --git a/gcc/testsuite/gcc.target/powerpc/vsx-vector-6-func-
> > 1op.c b/gcc/testsuite/gcc.target/powerpc/vsx-vector-6-func-1op.c
> > new file mode 100644
> > index 000..52c7ae3e983
> > --- /dev/null
> > +++ b/gcc/testsuite/gcc.target/powerpc/vsx-vector-6-func-1op.c
> > @@ -0,0 +1,141 @@
> > +/* { dg-do run { target lp64 } } */
> > +/* { dg-skip-if "" { powerpc*-*-darwin* } } */
> > +/* { dg-options "-O2 -save-temps" } */
> 
> I just noticed that we missed an effective target check here to
> ensure the
> support of those bifs during the test run, and since it's a runnable
> test
> case, also need to ensure the generated hw insn supported, it's
> "vsx_hw"
> like:
> 
> /* { dg-require-effective-target vsx_hw } */
> 
> And adding "-mvsx" to the dg-options.

Added the effective-target and -mvsx to all of the tests.

> 
> This is also applied for the other test cases.
> 
> But as the discussion on xxlor and the different effective target
> requirements
> on compilation part and run part, I think we can separate each of
> these cases into
> two files, one for compilation and the other for run, for example,
> for this
> case, update FLOAT_TEST by adding one more global variable like
> 
> #define FLOAT_TEST(NAME)
>   vector float f_##NAME##_result; \
>   void ... \
>   f_##NAME##_result = vec_##NAME(f_src);\
>   }
>   // moving the checking code to its main.
> 
> move #include , FLOAT_TEST(NAME), DOUBLE_TEST(NAME)
> defines
> and their uses into vsx-vector-6-func-1op.h.
> 
> 
> **For compilation file vsx-vector-6-func-1op.c**:
> 
> Include this header file into vsx-vector-6-func-1op.c, which has the
> 
> /* { dg-do compile { target lp64 } } */
> /* { dg-require-effective-target powerpc_vsx_ok } */
> /* { dg-options "-O2 -mvsx" } */
> 
> #include "vsx-vector-6-func-1op.h"
> 
> Then put the expected insn check here, like 
> 
> /* { dg-final { scan-assembler-times {\mxvabssp\M} 1 } } */
> ...
> 
> By organizing it like this, these scan-assembler-times would only
> focus on what
> are generated for bifs (excluding possible noises from main function
> for running).
> 
> 
> **For runnable file

Re: [PATCH] rs6000: Update the vsx-vector-6.* tests.

2023-07-03 Thread Carl Love via Gcc-patches

Kewen:

On Fri, 2023-06-30 at 15:20 -0700, Carl Love wrote:
> Segher never liked the above way of looking at the assembly.  He
> prefers:
>   gcc -S -g -mcpu=power8 -o vsx-vector-6-func-2lop.s vsx-vector-6-
> func-
> 2lop.c
> 
>   grep xxlor vsx-vector-6-func-2lop.s | wc
>  34  68 516
> 
> So, again, I get the same count of 34 on both makalu and genoa.  But
> again, that doesn't agree with what make script/scan-assembler thinks
> the counts should be.
> 
> When I looked at the vsx-vector-6-func-2lop.s I see on BE:
> 
>  
> lxvd2x 0,10,9
> xxlor 0,12,0
> xxlnor 0,0,0
>  ...
> 
> I was guessing that it was adjusting the data layout from the load. 
> But looking again more carefully versus LE:
> 
> 
> lxvd2x 0,31,9 
>xxpermdi 0,0,0,2 
>xxlor 0,12,0  
>xxlnor 0,0,0  
>xxpermdi 0,0,0,2 
> 
> 
> the xxpermdi is probably what is really doing the data layout change.
> 
> So, we have the issue that looking at the assembly gives different
> instruction counts then what 
> 
>dg-final { scan-assembler-times {\mxxlor\M} }
> 
> comes up with???  Now I am really confused.  I don't know how the
> scan-
> assembler-times works but I will go see if I can find it and see if I
> can figure out what the issue is.  I would expect that the scan-
> assembler is working off the --save-temp files, which get deleted as
> part of the run.  I would guess that scan-assembler does a grep to
> find
> the instructions and then maybe uses wc to count them??? I will go
> see
> if I can figure out how scan-assembler-times works.

OK, I figured out why I was getting 34 xxlor instructions instead of
the 22 that the scan-assembler-times was getting.  The difference was
that when I compiled the program I forgot to use -O2.  So with -O2 I get
the same number of xxlor instructions as scan-assembler-times.  I get
34 if I do not specify optimization.

So, I think the scan-assembler-times are all correct.
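
For what it is worth, the {\mxxlor\M} pattern only counts whole-word
matches, which a plain grep does not do.  Below is a small, portable
sketch of such a whole-word count.  It is not how DejaGnu implements
scan-assembler-times; the function names are hypothetical and the
definition of a "word character" (alphanumeric or underscore) is an
assumption meant to approximate the \m and \M anchors.

```c
#include <ctype.h>
#include <stddef.h>
#include <string.h>

/* Assumed definition of a character that can be part of a mnemonic.  */
static int is_word_char (char c)
{
  return isalnum ((unsigned char) c) || c == '_';
}

/* Count the occurrences of WORD in TEXT that are not embedded in a
   longer word, e.g. "xxlor" is counted but "xxlorc" is not.  */
static size_t count_whole_word (const char *text, const char *word)
{
  size_t len = strlen (word);
  size_t count = 0;
  const char *p = text;

  if (len == 0)
    return 0;

  while ((p = strstr (p, word)) != NULL)
    {
      int start_ok = (p == text) || !is_word_char (p[-1]);
      int end_ok = !is_word_char (p[len]);

      if (start_ok && end_ok)
        count++;
      p++;  /* Continue the search one character further on.  */
    }
  return count;
}
```

Running count_whole_word over the text of the generated .s file with
"xxlor" as the word should then correspond to what the dg-final check
counts, assuming the same compile options are used.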

As Peter says, counting xxlor is a bit problematic in general.  We
could just drop counting xxlor or have the LE/BE count qualifier for
the instructions.  Your call.

 Carl

[PATCH ver 2] rs6000, __builtin_set_fpscr_rn add return value

2023-06-30 Thread Carl Love via Gcc-patches



GCC maintainers:

Ver 2,  Went back thru the requirements and emails.  Not sure where I
came up with the requirement for an overloaded version with double
argument.  Removed the overloaded version with the double argument. 
Added the macro to announce if the __builtin_set_fpscr_rn returns a
void or a double with the FPSCR bits.  Updated the documentation file. 
Retested on Power 8 BE/LE, Power 9 BE/LE, Power 10 LE.  Redid the test
file.  Per request, the original test file functionality was not
changed.  Just changed the name from test_fpscr_rn_builtin.c to 
test_fpscr_rn_builtin_1.c.  Put new tests for the return values into a
new test file, test_fpscr_rn_builtin_2.c.

The GLibC team requested a builtin to replace the mffscrn and
mffscrni inline asm instructions in the GLibC code.  Previously there
was discussion on adding builtins for the mffscrn instructions.

https://gcc.gnu.org/pipermail/gcc-patches/2023-May/620261.html

In the end, it was felt that it would be better to extend the existing
__builtin_set_fpscr_rn builtin to return a double instead of a void
type.  The desire is that we could have the functionality of the
mffscrn and mffscrni instructions on older ISAs.  The two instructions
were initially added in ISA 3.0.  The __builtin_set_fpscr_rn has the
needed functionality to set the RN field using the mffscrn and mffscrni
instructions if ISA 3.0 is supported or fall back to using logical
instructions to mask and set the bits for earlier ISAs.  The
instructions return the current value of the FPSCR fields DRN, VE, OE,
UE, ZE, XE, NI, RN bit positions then update the RN bit positions with
the new RN value provided.

The current __builtin_set_fpscr_rn builtin has a return type of void.
So, changing the return type to double and returning the FPSCR fields
DRN, VE, OE, UE, ZE, XE, NI, RN bit positions would then give the
functional equivalent of the mffscrn and mffscrni instructions.  Any
current uses of the builtin would just ignore the return value yet any
new uses could use the return value.  So the requirement is for the
change to the __builtin_set_fpscr_rn builtin to be backward
compatible and work for all ISAs.

The following patch changes the return type of the
 __builtin_set_fpscr_rn builtin from void to double.  The return value
is the current value of the various FPSCR fields DRN, VE, OE, UE, ZE,
XE, NI, RN bit positions when the builtin is called.  The builtin then
updates the RN field with the new value provided as an argument to the
builtin.  The patch adds new testcases to test_fpscr_rn_builtin.c to
check that the builtin returns the current value of the FPSCR fields
and then updates the RN field.
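
To make the intended semantics concrete, here is a small, portable model
of "return the current fields, then update RN".  It is only a sketch and
does not touch the real FPSCR: the register is modeled as a 64-bit
integer, the function name model_set_fpscr_rn is hypothetical, and the
mask value is my assumption for where the DRN, VE, OE, UE, ZE, XE, NI
and RN bits sit.  The real builtin returns the bits in a double rather
than an integer.

```c
#include <stdint.h>

/* Assumed bit positions: DRN in three bits starting at bit 32, and
   VE, OE, UE, ZE, XE, NI, RN in the low byte.  */
#define FPSCR_RETURN_MASK UINT64_C (0x00000007000000FF)

/* RN is the two least significant bits.  */
#define FPSCR_RN_MASK UINT64_C (0x3)

/* Model of the new behavior: return the selected fields as they were
   when the call was made, then store NEW_RN into the RN field of the
   modeled register *FPSCR.  */
static uint64_t model_set_fpscr_rn (uint64_t *fpscr, unsigned new_rn)
{
  uint64_t old_fields = *fpscr & FPSCR_RETURN_MASK;

  *fpscr = (*fpscr & ~FPSCR_RN_MASK) | ((uint64_t) new_rn & FPSCR_RN_MASK);
  return old_fields;
}
```

Callers that ignore the return value see the same effect as with the old
void builtin, which is the backward compatibility requirement described
above.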

The GLibC team has reviewed the patch to make sure it met their needs
as a drop in replacement for the inline asm mffscrn and mffscrni
statements in the GLibC code.

The patch has been tested on Power 8 LE/BE, Power 9 LE/BE and Power 10
LE.

Please let me know if the patch is acceptable for mainline.  Thanks.

   Carl 


--
rs6000, __builtin_set_fpscr_rn add return value

Change the return value from void to double.  The return value consists of
the FPSCR fields DRN, VE, OE, UE, ZE, XE, NI, RN bit positions.  Add an
overloaded version which accepts a double argument.

The test powerpc/test_fpscr_rn_builtin.c is updated to add tests for the
double return value and the new double argument.

gcc/ChangeLog:
* config/rs6000/rs6000-builtins.def (__builtin_set_fpscr_rn): Update
builtin definition return type.
* config/rs6000/rs6000-c.cc (rs6000_target_modify_macros): Add check,
define __SET_FPSCR_RN_RETURNS_FPSCR__ macro.
* config/rs6000/rs6000.md (rs6000_get_fpscr_fields): New
define_expand.
(rs6000_update_fpscr_rn_field): New define_expand.
(rs6000_set_fpscr_rn): Added return argument.  Updated to use new
rs6000_get_fpscr_fields and rs6000_update_fpscr_rn_field
define_expands.
* doc/extend.texi (__builtin_set_fpscr_rn): Update description for
the return value and new double argument.  Add description for
__SET_FPSCR_RN_RETURNS_FPSCR__ macro.

gcc/testsuite/ChangeLog:
* gcc.target/powerpc/test_fpscr_rn_builtin.c: Renamed to
test_fpscr_rn_builtin_1.c.  Added comment.
* gcc.target/powerpc/test_fpscr_rn_builtin_2.c: New test for the
return value of __builtin_set_fpscr_rn builtin.
---
 gcc/config/rs6000/rs6000-builtins.def |   2 +-
 gcc/config/rs6000/rs6000-c.cc |   4 +
 gcc/config/rs6000/rs6000.md   |  87 +++---
 gcc/doc/extend.texi   |  26 ++-
 ...rn_builtin.c => test_fpscr_rn_builtin_1.c} |   6 +
 .../powerpc/test_fpscr_rn_builtin_2.c | 153 ++
 6 files changed, 246 insertions(+), 32 deletions(-)
 rename gcc/testsuite/gcc.target/powerpc/{test_fpscr_rn_builtin.c => 
test_fpscr_rn_builtin_1.c} (92%)
 create mode 100644

Re: [PATCH] rs6000: Update the vsx-vector-6.* tests.

2023-06-30 Thread Carl Love via Gcc-patches

Kewen:

On Fri, 2023-06-30 at 15:20 -0700, Carl Love wrote:
> So, went to look at the assembly to verify my comment on the
> difference
> being related to the loads. I decided to actually count the
> instructions just to verify the number in the assembly files. 
> Before,
> I just looked at the assembly briefly but didn't dig in very deep.
> 
> If I compile the tests and dump the assembly with:
>   gcc -g -mcpu=power8 -o vsx-vector-6-func-2lop vsx-vector-6-func-
> 2lop.c
> 
>   objdump -S -d vsx-vector-6-func-2lop > vsx-vector-6-func-2lop.dump
>   
>   grep xxlor vsx-vector-6-func-2lop.dump | wc
>   4  28 192
> 
> So we see 4 xxlor instructions not 32 as expeced for BE or 22 as
> expected for LE as the test claims.  I get the same count of 4 on
> both
> makalu and on genoa. 

With a little help from Peter and Julian Wang, I found that objdump
decodes some of the xxlor instructions as xxmr instructions.  The xxmr is
a new mnemonic which will be out in the next ISA, but objdump already
produces it.  So if you add the counts for grep xxlor and grep xxmr you
get a total of 34, which agrees with the count of xxlor in the gcc -S
generated assembly.
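
A small sketch of why the two counts add up, assuming that xxmr XT,XA is
simply xxlor XT,XA,XA with both source registers the same (my
understanding of the new mnemonic, not something I have checked against
the ISA text).  The field layout used here is the XX3 instruction form
with primary opcode 60 and extended opcode 146 for xxlor; the struct and
function names are made up for the example.

```c
#include <stdint.h>

/* Decoded fields of an XX3-form instruction word.  */
struct xx3_fields
{
  unsigned opcode;  /* Primary opcode, top 6 bits.  */
  unsigned xo;      /* 8-bit extended opcode.  */
  unsigned xt;      /* Target VSX register, 0..63.  */
  unsigned xa;      /* First source VSX register, 0..63.  */
  unsigned xb;      /* Second source VSX register, 0..63.  */
};

/* Split a 32-bit instruction word into its XX3-form fields.  The low
   three bits (AX, BX, TX) extend the 5-bit register numbers.  */
static struct xx3_fields decode_xx3 (uint32_t insn)
{
  struct xx3_fields f;

  f.opcode = insn >> 26;
  f.xo = (insn >> 3) & 0xff;
  f.xt = 32 * (insn & 1) + ((insn >> 21) & 0x1f);
  f.xa = 32 * ((insn >> 2) & 1) + ((insn >> 16) & 0x1f);
  f.xb = 32 * ((insn >> 1) & 1) + ((insn >> 11) & 0x1f);
  return f;
}

/* True if INSN encodes xxlor.  */
static int is_xxlor (uint32_t insn)
{
  struct xx3_fields f = decode_xx3 (insn);

  return f.opcode == 60 && f.xo == 146;
}

/* True if INSN is an xxlor whose two sources are the same register,
   which is the case assumed to be printed as xxmr.  */
static int is_xxlor_register_move (uint32_t insn)
{
  struct xx3_fields f = decode_xx3 (insn);

  return is_xxlor (insn) && f.xa == f.xb;
}
```

Counting the words for which is_xxlor is true would give the gcc -S style
total, while the subset for which is_xxlor_register_move is true would be
the ones a newer objdump lists under xxmr.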

  Carl

Re: [PATCH] rs6000: Update the vsx-vector-6.* tests.

2023-06-30 Thread Carl Love via Gcc-patches

Kewen:

On Fri, 2023-06-30 at 11:37 +0800, Kewen.Lin wrote:
> Hi Carl,
> 
> on 2023/6/30 05:36, Carl Love wrote:
> > Kewen:
> > 
> > On Wed, 2023-06-28 at 16:35 +0800, Kewen.Lin wrote:
> > > > Yea, I was going with a runnable test and didn't include the
> > > > instruction counts.  Added back in.  Rather then doing by
> > > > processor
> > > > version (P8, P9, P10) I was able to do it by BE/LE.  The
> > > > instruction
> > > > counts were the same for LE accross processor versions but
> > > > there
> > > > are a
> > > > few instruction counts that vary with BE and LE.
> > > 
> > > But the original test case only checks for cpu-types (processor
> > > version)
> > > but not for endianness, it means for the bif usages, there should
> > > not
> > > be
> > > different for endianness.  Why does this changes with your new
> > > test
> > > case?
> > > Could you have a further look and make it consistent with some
> > > adjustment
> > > if possible?  As we know, checking insn counts sometimes are
> > > fragile,
> > > so
> > > I think we should try our best to make it as robust as possible
> > > in
> > > the
> > > first place.
> > > 
> > > Besides, the original case also have some differences between
> > > p7/p8
> > > and
> > > p9.
> > >   
> > 
> > There are differences on P8 LE versus BE.  I did a diff between the
> > P8
> > and P9 tests:
> > 
> >  diff vsx-vector-6.p8.c vsx-vector-6.p9.c
> > 3,4c3,4
> > < /* { dg-require-effective-target powerpc_p8vector_ok } */
> > < /* { dg-options "-O2 -mdejagnu-cpu=power8" } */
> > ---
> > > /* { dg-require-effective-target powerpc_p9vector_ok } */
> > > /* { dg-options "-O2 -mdejagnu-cpu=power9" } */
> > 12c12
> > < /* { dg-final { scan-assembler-times {\mvperm\M} 1 } } */
> > ---
> > > /* { dg-final { scan-assembler-times {\m(?:v|xx)permr?\M} 1 } }
> > > */
> > 23d22
> > < /* { dg-final { scan-assembler-times {\mxvmsub[am]dp\M} 1 } } */
> > 37c36
> > < /* { dg-final { scan-assembler-times {\mxvsubdp\M} 1 } } */
> > ---
> > > /* { dg-final { scan-assembler-times {\mxvmsub[am]dp\M} 1 } } */
> > 
> > So we can see the vperm, vpermr, xxpermr, xvmsubadp, xvmsubmdp,
> > xvsubdp, xvmsubadp, xvmsubmdp instruction count checks are
> > different
> > between the two architectures.  I then wrote a script to compile
> > the
> > CPU specific test on Power 8, Power 9 and Power 10 architectures
> > and
> > then grep for the above list of instructions.  If I run the script
> > on P8
> > BE  and LE I get
> > 
> >              Power 8 BE    Power 8 LE  Power 9 LE  Power 9 BE  Power 10 LE*
> >              (makalu-lp1)  (genoa)     (marlin)    (nilram)    (ltcd97-lp3)
> > instruction  count         count       count       count       count
> > vperm        1             1           0           0           0
> > vpermr       0             0           0           0           0
> > xxpermr      0             0           1           0           1
> > xvmsubadp    1             0           1           1           1
> > xvmsubmdp    0             1           0           0           0
> > xvsubdp      1             1           1           1           1
> > 
> 
> Thanks for looking into this and making this statistics.
> 
> Is there a typo for column nilram?   Otherwise, the below insn check
> 
> /* { dg-final { scan-assembler-times {\m(?:v|xx)permr?\M} 1 } } */
> 
> would fail there.

Yes, there is a typo in the nilram column.  The test generates a vperm
instruction.

#if defined (__BIG_ENDIAN__) || defined (_ARCH_PWR9)
  dst[8].d = vec_perm (src0[8].d, src1[8].d, src2[8].uc);
 f74:   e9 3f 00 78     ld      r9,120(r31)
 f78:   39 29 07 00     addi    r9,r9,1792
 f7c:   f5 89 00 01     lxv     vs12,0(r9)
 f80:   e9 3f 00 80     ld      r9,128(r31)
 f84:   39 29 07 00     addi    r9,r9,1792
 f88:   f4 09 00 01     lxv     vs0,0(r9)
 f8c:   e9 3f 00 88     ld      r9,136(r31)
 f90:   39 29 07 00     addi    r9,r9,1792
 f94:   f4 09 00 89     lxv     vs32,128(r9)
 f98:   e9 3f 00 70     ld      r9,112(r31)
 f9c:   39 29 07 00     addi    r9,r9,1792
 fa0:   f0 2c 64 91     xxmr    vs33,vs12
 fa4:   f1 a0 04 91     xxmr    vs45,vs0
 fa8:   10 01 68 2b     vperm   v0,v1,v13,v0
 ...

> 

> > 
> > I had played with putting -Wno-inline on the command line but that
> > didn't seem to make any difference.  However, you suggestion of
> > __attribute__ ((noipa)) does prevent the inlining and we don't get
> > the
> > second copy of the instructions showing up. The inlining eliminated
> > the
> > LE/BE differences for xvmaxsp, xvminsp and xvmaxdp.
> 
> -Winline is a option for warning: "Warn if a function that is
> declared
> as inline cannot be inlined.", I think what you wanted is -fno-
> inline,
> and it's good to know noipa helps here.

Yea, my bad.  Didn't read the manual very carefully.  
> 
> > The

[PATCH ver 3] rs6000: Update the vsx-vector-6.* tests.

2023-06-29 Thread Carl Love via Gcc-patches

GCC maintainers:

Ver 3.  Added __attribute__ ((noipa)) to the test files.  Changed some
of the scan-assembler-times checks to cover multiple similar
instructions.  Change the function check macro to a macro to generate a
function to do the test and check the results.  Retested on the various
processor types and BE/LE versions.

Ver 2.  Switched to using code macros to generate the call to the
builtin and test the results.  Added in instruction counts for the key
instruction for the builtin.  Moved the tests into an additional
function call to ensure the compiler doesn't replace the builtin call
code with the statically computed results.  The compiler was doing this
for a few of the simpler tests.  

The following patch takes the tests in vsx-vector-6-p7.h,  vsx-vector-
6-p8.h, vsx-vector-6-p9.h and reorganizes them into a series of smaller
test files by functionality rather than processor version.

Tested the patch on Power 8 LE/BE, Power 9 LE/BE and Power 10 LE with
no regressions.

Please let me know if this patch is acceptable for mainline.  Thanks.

   Carl


-
rs6000: Update the vsx-vector-6.* tests.

The vsx-vector-6.h file is included into the processor specific test files
vsx-vector-6.p7.c, vsx-vector-6.p8.c, and vsx-vector-6.p9.c.  The .h file
contains a large number of vsx vector builtin tests.  The processor
specific files contain the number of instructions that the tests are
expected to generate for that processor.  The tests are compile only.

The tests are broken up into a seriers of files for related tests.  The
new tests are runnable tests to verify the builtin argument types and the
functional correctness of each test rather then verifying the type and
number of instructions generated.

gcc/testsuite/
* gcc.target/powerpc/vsx-vector-6-1op.c: New test file.
* gcc.target/powerpc/vsx-vector-6-2lop.c: New test file.
* gcc.target/powerpc/vsx-vector-6-2op.c: New test file.
* gcc.target/powerpc/vsx-vector-6-3op.c: New test file.
* gcc.target/powerpc/vsx-vector-6-cmp-all.c: New test file.
* gcc.target/powerpc/vsx-vector-6-cmp.c: New test file.
* gcc.target/powerpc/vsx-vector-6.h: Remove test file.
* gcc.target/powerpc/vsx-vector-6-p7.h: Remove test file.
* gcc.target/powerpc/vsx-vector-6-p8.h: Remove test file.
* gcc.target/powerpc/vsx-vector-6-p9.h: Remove test file.
---
 .../powerpc/vsx-vector-6-func-1op.c   | 141 ++
 .../powerpc/vsx-vector-6-func-2lop.c  | 217 +++
 .../powerpc/vsx-vector-6-func-2op.c   | 133 +
 .../powerpc/vsx-vector-6-func-3op.c   | 257 ++
 .../powerpc/vsx-vector-6-func-cmp-all.c   | 211 ++
 .../powerpc/vsx-vector-6-func-cmp.c   | 121 +
 .../gcc.target/powerpc/vsx-vector-6.h | 154 ---
 .../gcc.target/powerpc/vsx-vector-6.p7.c  |  43 ---
 .../gcc.target/powerpc/vsx-vector-6.p8.c  |  43 ---
 .../gcc.target/powerpc/vsx-vector-6.p9.c  |  42 ---
 10 files changed, 1080 insertions(+), 282 deletions(-)
 create mode 100644 gcc/testsuite/gcc.target/powerpc/vsx-vector-6-func-1op.c
 create mode 100644 gcc/testsuite/gcc.target/powerpc/vsx-vector-6-func-2lop.c
 create mode 100644 gcc/testsuite/gcc.target/powerpc/vsx-vector-6-func-2op.c
 create mode 100644 gcc/testsuite/gcc.target/powerpc/vsx-vector-6-func-3op.c
 create mode 100644 gcc/testsuite/gcc.target/powerpc/vsx-vector-6-func-cmp-all.c
 create mode 100644 gcc/testsuite/gcc.target/powerpc/vsx-vector-6-func-cmp.c
 delete mode 100644 gcc/testsuite/gcc.target/powerpc/vsx-vector-6.h
 delete mode 100644 gcc/testsuite/gcc.target/powerpc/vsx-vector-6.p7.c
 delete mode 100644 gcc/testsuite/gcc.target/powerpc/vsx-vector-6.p8.c
 delete mode 100644 gcc/testsuite/gcc.target/powerpc/vsx-vector-6.p9.c

diff --git a/gcc/testsuite/gcc.target/powerpc/vsx-vector-6-func-1op.c 
b/gcc/testsuite/gcc.target/powerpc/vsx-vector-6-func-1op.c
new file mode 100644
index 000..52c7ae3e983
--- /dev/null
+++ b/gcc/testsuite/gcc.target/powerpc/vsx-vector-6-func-1op.c
@@ -0,0 +1,141 @@
+/* { dg-do run { target lp64 } } */
+/* { dg-skip-if "" { powerpc*-*-darwin* } } */
+/* { dg-options "-O2 -save-temps" } */
+
+/* Functional test of the one operand vector builtins.  */
+
+#include 
+#include 
+#include 
+
+#define DEBUG 0
+
+void abort (void);
+
+/* Macro to check the results for the various floating point argument tests.
+ */
+#define FLOAT_TEST(NAME)  \
+  void __attribute__ ((noipa))\
+  float_##NAME (vector float f_src, vector float f_##NAME##_expected) \
+  {  \
+vector float f_result = vec_##NAME(f_src);   \
+  \
+if

Re: [PATCH] rs6000: Update the vsx-vector-6.* tests.

2023-06-29 Thread Carl Love via Gcc-patches

Kewen:

On Wed, 2023-06-28 at 16:35 +0800, Kewen.Lin wrote:
> > Yea, I was going with a runnable test and didn't include the
> > instruction counts.  Added back in.  Rather then doing by processor
> > version (P8, P9, P10) I was able to do it by BE/LE.  The
> > instruction
> > counts were the same for LE accross processor versions but there
> > are a
> > few instruction counts that vary with BE and LE.
> 
> But the original test case only checks for cpu-types (processor
> version)
> but not for endianness, it means for the bif usages, there should not
> be
> different for endianness.  Why does this changes with your new test
> case?
> Could you have a further look and make it consistent with some
> adjustment
> if possible?  As we know, checking insn counts sometimes are fragile,
> so
> I think we should try our best to make it as robust as possible in
> the
> first place.
> 
> Besides, the original case also have some differences between p7/p8
> and
> p9.
>   

There are differences on P8 LE versus BE.  I did a diff between the P8
and P9 tests:

 diff vsx-vector-6.p8.c vsx-vector-6.p9.c
3,4c3,4
< /* { dg-require-effective-target powerpc_p8vector_ok } */
< /* { dg-options "-O2 -mdejagnu-cpu=power8" } */
---
> /* { dg-require-effective-target powerpc_p9vector_ok } */
> /* { dg-options "-O2 -mdejagnu-cpu=power9" } */
12c12
< /* { dg-final { scan-assembler-times {\mvperm\M} 1 } } */
---
> /* { dg-final { scan-assembler-times {\m(?:v|xx)permr?\M} 1 } } */
23d22
< /* { dg-final { scan-assembler-times {\mxvmsub[am]dp\M} 1 } } */
37c36
< /* { dg-final { scan-assembler-times {\mxvsubdp\M} 1 } } */
---
> /* { dg-final { scan-assembler-times {\mxvmsub[am]dp\M} 1 } } */

So we can see the vperm, vpermr, xxpermr, xvmsubadp, xvmsubmdp and
xvsubdp instruction count checks are different between the two
architectures.  I then wrote a script to compile the CPU specific test
on Power 8, Power 9 and Power 10 and then grep for the above list of
instructions.  Running the script on the BE and LE systems I get

              Power 8 BE    Power 8 LE  Power 9 LE  Power 9 BE  Power 10 LE*
              (makalu-lp1)  (genoa)     (marlin)    (nilram)    (ltcd97-lp3)
instruction   count         count       count       count       count
vperm         1             1           0           0           0
vpermr        0             0           0           0           0
xxpermr       0             0           1           0           1
xvmsubadp     1             0           1           1           1
xvmsubmdp     0             1           0           0           0
xvsubdp       1             1           1           1           1

From the diff we see

  { dg-final {scan-assembler-times {\mxvmsub[am]dp\M} 1 } }

This test picks up the correct subtraction instruction for LE versus BE
so this "masks" the LE/BE difference.  I changed the check in
vsx-vector-6-func-3op.c to match.  This eliminates the LE and BE checks
and reduces the number of specific checks.

In vsx-vector-6-func-3op.c, the new test checks the counts for
xxpermdi, which the original test does not check.  The checks for
xxpermdi are not needed since they are not directly related to the
builtin tests, so I removed them.

Looking at the LE/BE checks in the other test file,
vsx-vector-6-func-2op.c, instructions xvmaxsp, xvminsp and xvmaxdp were
not checked in the original test.  The functions where these
instructions are used get inlined.  On LE, the instructions show up in
the inlined code as well as in what appears to be the binary for the
original, non-inlined function.  Best I can see, the binary for the
original function is dead code.  I don't see any calls to it.  Seems
like it shouldn't be there, as removing it would make the binary
smaller.  On BE, I don't see the binary for the original non-inlined
function.

I had played with putting -Wno-inline on the command line but that
didn't seem to make any difference.  However, your suggestion of
__attribute__ ((noipa)) does prevent the inlining and we don't get the
second copy of the instructions showing up.  Preventing the inlining
eliminated the LE/BE differences for xvmaxsp, xvminsp and xvmaxdp.
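
As a rough, portable sketch of the pattern discussed above: the macro below generates an out-of-line checker marked with __attribute__ ((noipa)), in the spirit of the FLOAT_TEST macro in the patch.  The v4sf typedef, model_negate and run_negate_test are hypothetical stand-ins chosen so the sketch builds on any GCC target; the real tests use the PowerPC vector float type and the vec_* builtins, and abort on a mismatch rather than returning a count.

```c
#include <assert.h>

/* Generic GCC vector type standing in for the PowerPC "vector float".  */
typedef float v4sf __attribute__ ((vector_size (16)));

/* Hypothetical stand-in for a one-operand builtin such as vec_abs.  */
static inline v4sf
model_negate (v4sf x)
{
  return -x;
}

/* Generate an out-of-line checker.  noipa asks GCC not to inline the
   function or propagate the caller's constant arguments into it, so the
   operation is evaluated at run time and its code is emitted only once.
   The checker returns the number of mismatching elements.  */
#define FLOAT_TEST(NAME)                                            \
  int __attribute__ ((noipa))                                       \
  float_##NAME (v4sf f_src, v4sf f_##NAME##_expected)               \
  {                                                                 \
    v4sf f_result = model_##NAME (f_src);                           \
    int mismatches = 0;                                             \
    for (int i = 0; i < 4; i++)                                     \
      if (f_result[i] != f_##NAME##_expected[i])                    \
        mismatches++;                                               \
    return mismatches;                                              \
  }

FLOAT_TEST (negate)

/* Driver in the style of the test's main (): fails loudly on a mismatch.  */
void
run_negate_test (void)
{
  v4sf src = { 1.0f, -2.0f, 3.5f, 0.0f };
  v4sf expected = { -1.0f, 2.0f, -3.5f, -0.0f };
  assert (float_negate (src, expected) == 0);
}
```

Because the checker is never inlined, an instruction count taken from the assembly should see the operation once per generated function, which is what makes a single scan-assembler-times count usable on both LE and BE.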

The instruction count test for xxlor in vsx-vector-6-func-2lop.c
differs on LE and BE vsx-vector-6-func-2op.c.  I believe the
instruction is used with loads to reorder the data.  I don't see any way
to get around the extra xxlor instructions and still verify that the
vec_or builtin test generates the instruction.

I was able to eliminate all of the LE/BE qualifiers in the instruction
counts with the exception of xxlor.  By using the same checks that look
for multiple versions of xvmsub*, as was done in the original test, we
can also eliminate LE/BE specific tests and account for different
instructions across CPU versions.  We could go back to checking for
specific instructions being generated on Power 8, Power 9, Power 10 if
you prefer not using checks that cover multiple flavors of a given

[PATCH ver 2] rs6000: Update the vsx-vector-6.* tests.

2023-06-21 Thread Carl Love via Gcc-patches



GCC maintainers:

Ver 2.  Switched to using code macros to generate the call to the
builtin and test the results.  Added in instruction counts for the key
instruction for the builtin.  Moved the tests into an additional
function call to ensure the compiler doesn't replace the builtin call
code with the statically computed results.  The compiler was doing this
for a few of the simpler tests.

The following patch takes the tests in vsx-vector-6.p7.c,
vsx-vector-6.p8.c, vsx-vector-6.p9.c and reorganizes them into a series
of smaller test files by functionality rather than processor version.

Tested the patch on Power 8 LE/BE, Power 9 LE/BE and Power 10 LE with
no regressions.

Please let me know if this patch is acceptable for mainline.  Thanks.

   Carl

--
rs6000: Update the vsx-vector-6.* tests.

The vsx-vector-6.h file is included into the processor specific test files
vsx-vector-6.p7.c, vsx-vector-6.p8.c, and vsx-vector-6.p9.c.  The .h file
contains a large number of vsx vector builtin tests.  The processor
specific files contain the number of instructions that the tests are
expected to generate for that processor.  The tests are compile only.

The tests are broken up into a series of files for related tests.  The
new tests are runnable tests to verify the builtin argument types and the
functional correctness of each test rather than verifying the type and
number of instructions generated.

gcc/testsuite/
* gcc.target/powerpc/vsx-vector-6-func-1op.c: New test file.
* gcc.target/powerpc/vsx-vector-6-func-2lop.c: New test file.
* gcc.target/powerpc/vsx-vector-6-func-2op.c: New test file.
* gcc.target/powerpc/vsx-vector-6-func-3op.c: New test file.
* gcc.target/powerpc/vsx-vector-6-func-cmp-all.c: New test file.
* gcc.target/powerpc/vsx-vector-6-func-cmp.c: New test file.
* gcc.target/powerpc/vsx-vector-6.h: Remove test file.
* gcc.target/powerpc/vsx-vector-6.p7.c: Remove test file.
* gcc.target/powerpc/vsx-vector-6.p8.c: Remove test file.
* gcc.target/powerpc/vsx-vector-6.p9.c: Remove test file.
---
 .../powerpc/vsx-vector-6-func-1op.c   | 156 ++
 .../powerpc/vsx-vector-6-func-2lop.c  | 223 ++
 .../powerpc/vsx-vector-6-func-2op.c   | 142 +
 .../powerpc/vsx-vector-6-func-3op.c   | 273 ++
 .../powerpc/vsx-vector-6-func-cmp-all.c   | 205 +
 .../powerpc/vsx-vector-6-func-cmp.c   | 130 +
 .../gcc.target/powerpc/vsx-vector-6.h | 154 --
 .../gcc.target/powerpc/vsx-vector-6.p7.c  |  43 ---
 .../gcc.target/powerpc/vsx-vector-6.p8.c  |  43 ---
 .../gcc.target/powerpc/vsx-vector-6.p9.c  |  42 ---
 10 files changed, 1129 insertions(+), 282 deletions(-)
 create mode 100644 gcc/testsuite/gcc.target/powerpc/vsx-vector-6-func-1op.c
 create mode 100644 gcc/testsuite/gcc.target/powerpc/vsx-vector-6-func-2lop.c
 create mode 100644 gcc/testsuite/gcc.target/powerpc/vsx-vector-6-func-2op.c
 create mode 100644 gcc/testsuite/gcc.target/powerpc/vsx-vector-6-func-3op.c
 create mode 100644 gcc/testsuite/gcc.target/powerpc/vsx-vector-6-func-cmp-all.c
 create mode 100644 gcc/testsuite/gcc.target/powerpc/vsx-vector-6-func-cmp.c
 delete mode 100644 gcc/testsuite/gcc.target/powerpc/vsx-vector-6.h
 delete mode 100644 gcc/testsuite/gcc.target/powerpc/vsx-vector-6.p7.c
 delete mode 100644 gcc/testsuite/gcc.target/powerpc/vsx-vector-6.p8.c
 delete mode 100644 gcc/testsuite/gcc.target/powerpc/vsx-vector-6.p9.c

diff --git a/gcc/testsuite/gcc.target/powerpc/vsx-vector-6-func-1op.c 
b/gcc/testsuite/gcc.target/powerpc/vsx-vector-6-func-1op.c
new file mode 100644
index 000..0d4e237673b
--- /dev/null
+++ b/gcc/testsuite/gcc.target/powerpc/vsx-vector-6-func-1op.c
@@ -0,0 +1,156 @@
+/* { dg-do run { target lp64 } } */
+/* { dg-skip-if "" { powerpc*-*-darwin* } } */
+/* { dg-options "-O2 -save-temps" } */
+
+/* Functional test of the one operand vector builtins.  */
+
+#include 
+#include 
+#include 
+
+#define DEBUG 0
+
+void abort (void);
+
+  /* Macro to check the results for the various floating point argument tests.
+   */
+#define FLOAT_CHECK(NAME)  \
+  f_result = vec_##NAME(f_src);\
+   \
+  if ((f_result[0] != f_##NAME##_expected[0]) ||   \
+  (f_result[1] != f_##NAME##_expected[1]) ||   \
+  (f_result[2] != f_##NAME##_expected[2]) ||   \
+  (f_result[3] != f_##NAME##_expected[3])) \
+{  \
+  if (DEBUG) { \
+printf("ERROR: vec_%s (float) expected value does not match\n",\
+

Re: [PATCH] rs6000: Update the vsx-vector-6.* tests.

2023-06-21 Thread Carl Love via Gcc-patches

On Mon, 2023-06-19 at 15:17 +0800, Kewen.Lin wrote:
> Hi Carl,
> 
> on 2023/5/31 04:46, Carl Love wrote:
> > GCC maintainers:
> > 
> > The following patch takes the tests in vsx-vector-6-p7.h,  vsx-
> > vector-
> > 6-p8.h, vsx-vector-6-p9.h and reorganizes them into a series of
> > smaller
> > test files by functionality rather than processor version.
> > 
> > The patch has been tested on Power 10 with no regressions.
> > 
> > Please let me know if this patch is acceptable for
> > mainline.  Thanks.
> > 
> >Carl
> > 
> > --
> > rs6000: Update the vsx-vector-6.* tests.
> > 
> > The vsx-vector-6.h file is included into the processor specific
> > test files
> > vsx-vector-6.p7.c, vsx-vector-6.p8.c, and vsx-vector-6.p9.c.  The
> > .h file
> > contains a large number of vsx vector builtin tests.  The processor
> > specific files contain the number of instructions that the tests
> > are
> > expected to generate for that processor.  The tests are compile
> > only.
> > 
> > The tests are broken up into a seriers of files for related
> > tests.  The
> > new tests are runnable tests to verify the builtin argument types
> > and the
> 
> But the newly added test cases are all with "dg-do compile", it
> doesn't
> match what you said here.

Ah, yea, that is wrong.  Fixed.

> 
> > functional correctness of each test rather then verifying the type
> > and
> > number of instructions generated.
> 
> It's good to have more coverage with runnable case, but we miss some
> test
> coverages on the expected insn counts which cases p{7,8,9}.c can
> provide
> originally.  Unless we can ensure it's already tested somewhere else
> (do
> we? it wasn't stated in this patch), I think we still need those
> checks.

Yea, I was going with a runnable test and didn't include the
instruction counts.  Added back in.  Rather than doing by processor
version (P8, P9, P10) I was able to do it by BE/LE.  The instruction
counts were the same for LE across processor versions but there are a
few instruction counts that vary with BE and LE.

I did notice in one of the tests that the compiler computed the
answers at compile time and thus didn't actually generate the builtin
code.  After digging a little more I found a few more tests where the
compiler was doing the calculations and just inserting the answers.

So, I moved all of the tests to functions so the compiler would
actually generate the desired builtin code.

> 
> > gcc/testsuite/
> > * gcc.target/powerpc/vsx-vector-6-1op.c: New test file.
> > * gcc.target/powerpc/vsx-vector-6-2lop.c: New test file.
> > * gcc.target/powerpc/vsx-vector-6-2op.c: New test file.
> > * gcc.target/powerpc/vsx-vector-6-3op.c: New test file.
> > * gcc.target/powerpc/vsx-vector-6-cmp-all.c: New test file.
> > * gcc.target/powerpc/vsx-vector-6-cmp.c: New test file.
> > * gcc.target/powerpc/vsx-vector-6.h: Remove test file.
> > * gcc.target/powerpc/vsx-vector-6-p7.h: Remove test file.
> > * gcc.target/powerpc/vsx-vector-6-p8.h: Remove test file.
> > * gcc.target/powerpc/vsx-vector-6-p9.h: Remove test file.
> > ---
> >  .../powerpc/vsx-vector-6-func-1op.c   | 319 +
> >  .../powerpc/vsx-vector-6-func-2lop.c  | 305 +
> >  .../powerpc/vsx-vector-6-func-2op.c   | 278 
> >  .../powerpc/vsx-vector-6-func-3op.c   | 229 ++
> >  .../powerpc/vsx-vector-6-func-cmp-all.c   | 429
> > ++
> >  .../powerpc/vsx-vector-6-func-cmp.c   | 237 ++
> >  .../gcc.target/powerpc/vsx-vector-6.h | 154 ---
> >  .../gcc.target/powerpc/vsx-vector-6.p7.c  |  43 --
> >  .../gcc.target/powerpc/vsx-vector-6.p8.c  |  43 --
> >  .../gcc.target/powerpc/vsx-vector-6.p9.c  |  42 --
> >  10 files changed, 1797 insertions(+), 282 deletions(-)
> >  create mode 100644 gcc/testsuite/gcc.target/powerpc/vsx-vector-6-
> > func-1op.c
> >  create mode 100644 gcc/testsuite/gcc.target/powerpc/vsx-vector-6-
> > func-2lop.c
> >  create mode 100644 gcc/testsuite/gcc.target/powerpc/vsx-vector-6-
> > func-2op.c
> >  create mode 100644 gcc/testsuite/gcc.target/powerpc/vsx-vector-6-
> > func-3op.c
> >  create mode 100644 gcc/testsuite/gcc.target/powerpc/vsx-vector-6-
> > func-cmp-all.c
> >  create mode 100644 gcc/testsuite/gcc.target/powerpc/vsx-vector-6-
> > func-cmp.c
> >  delete mode 100644 gcc/testsuite/gcc.target/powerpc/vsx-vector-6.h
> >  delete mode 100644 gcc/testsuite/gcc.target/powerpc/vsx-vector-
> > 6.p7.c
> >  delete mode 100644 gcc/testsuite/gcc.target/powerpc/vsx-vector-
> > 6.p8.c
> >  delete mode 100644 gcc/testsuite/gcc.target/powerpc/vsx-vector-
> > 6.p9.c
> > 
> > diff --git a/gcc/testsuite/gcc.target/powerpc/vsx-vector-6-func-
> > 1op.c b/gcc/testsuite/gcc.target/powerpc/vsx-vector-6-func-1op.c
> > new file mode 100644
> > index 000..90a360ea158
> > --- /dev/null
> > +++

[PATCH ver 6] rs6000: Add builtins for IEEE 128-bit floating point values

2023-06-19 Thread Carl Love via Gcc-patches



Kewen, GCC maintainers:

Version 6, Fixed missing change log entry.  Changed builtin id names as
requested.  Missed making the change on the last version.  Fixed
comment in the three test cases.  Reran regression suite on Power 10,
no regressions.

Version 5, Tested the patch on P9 BE per request.  Fixed up test case
to get the correct expected values for BE and LE.  Fixed typos. 
Updated the doc/extend.texi to clarify the vector arguments.  Changed
test file names per request.  Moved builtin defs next to related
definitions.  Renamed new mode_attr. Removed new mode_iterator, used
existing iterator instead. Renamed mode_iterator VSEEQP_DI to V2DI_DI. 
Fixed up overloaded definitions per request.

Version 4, added missing cases for new xxexpqp, xsxexpdp and xsxsigqp
cases to rs6000_expand_builtin.  Merged the new define_insn definitions
with the existing definitions.  Renamed the builtins by removing the
__builtin_ prefix from the names.  Fixed the documentation for the
builtins.  Updated the test files to check the desired instructions
were generated.  Retested patch on Power 10 with no regressions.

Version 3, was able to get the overloaded version of scalar_insert_exp
to work and the change to xsxexpqp_f128_ define instruction to
work with the suggestions from Kewen.  

Version 2, I have addressed the various comments from Kewen.  I had
issues with adding an additional overloaded version of
scalar_insert_exp with vector arguments.  The overload infrastructure
didn't work with a mix of scalar and vector arguments.  I did rename
the __builtin_insertf128_exp to __builtin_vsx_scalar_insert_exp_qp to
make it similar to the existing builtin.  I also wasn't able to get the
suggested merge of xsxexpqp_f128_ with xsxexpqp_ to work so
I left the two simpler definitions.

The patch adds three new builtins to extract the significand and
exponent of an IEEE float 128-bit value where the builtin argument is a
vector.  Additionally, a builtin to insert the exponent into an IEEE
float 128-bit vector argument is added.  These builtins were requested
since there is no clean and optimal way to transfer between a vector
and a scalar IEEE 128 bit value.

The patch has been tested on Power 9 BE and Power 10 LE with no
regressions.  Please let me know if the patch is acceptable or not. 
Thanks.

   Carl


rs6000: Add builtins for IEEE 128-bit floating point values

Add support for the following builtins:

 __vector unsigned long long int scalar_extract_exp_to_vec (__ieee128);
 __vector unsigned __int128 scalar_extract_sig_to_vec (__ieee128);
 __ieee128 scalar_insert_exp (__vector unsigned __int128,
  __vector unsigned long long);

The instructions used in the builtins operate on vector registers.  Thus
the result must be moved to a scalar type.  There is no clean, performant
way to do this.  The user code typically needs the result as a vector
anyway.
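
To illustrate the fields these builtins work with, here is a minimal portable sketch that models an IEEE binary128 value as two 64-bit halves and pulls out the biased exponent and the fraction, then puts an exponent back.  The struct f128_bits type and the f128_* helper names are hypothetical and exist only for this illustration; the sketch shows the bit layout, not the exact semantics of the xsxexpqp, xsxsigqp and xsiexpqp instructions (for example, it does not model any implicit leading significand bit).

```c
#include <assert.h>
#include <stdint.h>

/* An IEEE binary128 image held as two 64-bit halves.  hi holds the sign
   bit, the 15-bit biased exponent and the top 48 fraction bits; lo holds
   the low 64 fraction bits.  */
struct f128_bits
{
  uint64_t hi;
  uint64_t lo;
};

/* Biased exponent: bits 112..126 of the 128-bit image.  */
uint64_t
f128_biased_exp (struct f128_bits v)
{
  return (v.hi >> 48) & 0x7fff;
}

/* Fraction field: the low 112 bits, with sign and exponent cleared.  */
struct f128_bits
f128_fraction (struct f128_bits v)
{
  struct f128_bits f = { v.hi & 0x0000ffffffffffffULL, v.lo };
  return f;
}

/* Keep the sign and fraction of SIG and replace its exponent field with
   EXP.  This mirrors the idea of an insert-exponent operation only.  */
struct f128_bits
f128_insert_exp (struct f128_bits sig, uint64_t exp)
{
  assert (exp <= 0x7fff);
  struct f128_bits r = { (sig.hi & 0x8000ffffffffffffULL) | (exp << 48),
                         sig.lo };
  return r;
}
```

For instance, 1.0 in binary128 has hi equal to 0x3fff000000000000 and lo equal to 0, so f128_biased_exp returns the bias 0x3fff and the fraction is all zeros.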

gcc/
* config/rs6000/rs6000-builtin.cc (rs6000_expand_builtin):
Rename CODE_FOR_xsxexpqp_tf to CODE_FOR_xsxexpqp_tf_di.
Rename CODE_FOR_xsxexpqp_kf to CODE_FOR_xsxexpqp_kf_di.
(CODE_FOR_xsxexpqp_kf_v2di, CODE_FOR_xsxsigqp_kf_v1ti,
CODE_FOR_xsiexpqp_kf_v2di): Add case statements.
* config/rs6000/rs6000-builtins.def (__builtin_extractf128_exp,
 __builtin_extractf128_sig, __builtin_insertf128_exp): Add new
builtin definitions.
Rename xsxexpqp_kf, xsxsigqp_kf, xsiexpqp_kf to xsxexpqp_kf_di,
xsxsigqp_kf_ti, xsiexpqp_kf_di respectively.
* config/rs6000/rs6000-c.cc (altivec_resolve_overloaded_builtin):
Update case RS6000_OVLD_VEC_VSIE to handle MODE_VECTOR_INT for new
overloaded instance. Update comments.
* config/rs6000/rs6000-overload.def
(__builtin_vec_scalar_insert_exp): Add new overload definition with
vector arguments.
(scalar_extract_exp_to_vec, scalar_extract_sig_to_vec): New
overloaded definitions.
* config/vsx.md (V2DI_DI): New mode iterator.
(DI_to_TI): New mode attribute.
Rename xsxexpqp_ to sxexpqp__.
Rename xsxsigqp_ to xsxsigqp__.
Rename xsiexpqp_ to xsiexpqp__.
* doc/extend.texi (__builtin_extractf128_exp,
__builtin_extractf128_sig): Add documentation for new builtins.
(scalar_insert_exp): Add new overloaded builtin definition.

gcc/testsuite/
* gcc.target/powerpc/bfp/extract-exp-8.c: New test case.
* gcc.target/powerpc/bfp/extract-sig-8.c: New test case.
* gcc.target/powerpc/bfp/insert-exp-16.c: New test case.
---
 gcc/config/rs6000/rs6000-builtin.cc   |  21 +++-
 gcc/config/rs6000/rs6000-builtins.def |  15 ++-
 gcc/config/rs6000/rs6000-c.cc |  10 +-
 gcc/config/rs6000/rs6000-overload.def |  12 ++
 gcc/config/rs6000/vsx.md  |  25 +++--
 gcc/doc/extend.texi   |  24 +++-

Re: [PATCH ver 5] rs6000: Add builtins for IEEE 128-bit floating point values

2023-06-19 Thread Carl Love via Gcc-patches

Kewen:

On Mon, 2023-06-19 at 14:08 +0800, Kewen.Lin wrote:
> > 



> Hi Carl,
> 
> on 2023/6/17 01:57, Carl Love wrote:
> > overloaded instance. Update comments.
> > * config/rs6000/rs6000-overload.def
> > (__builtin_vec_scalar_insert_exp): Add new overload definition
> > with
> > vector arguments.
> > (scalar_extract_exp_to_vec, scalar_extract_sig_to_vec): New
> > overloaded definitions.
> > * config/vsx.md (V2DI_DI): New mode iterator.
> 
> Missing an entry for DI_to_TI.

Oops, missed that.  Sorry, fixed.

> > 



> 
> >  
> >const signed long long __builtin_vsx_scalar_extract_expq
> > (_Float128);
> > -VSEEQP xsxexpqp_kf {}
> > +VSEEQP xsxexpqp_kf_di {}
> > +
> > +  vull __builtin_vsx_scalar_extract_exp_to_vec (_Float128);
> > +VSEEXPKF xsxexpqp_kf_v2di {}
> 
> As I pointed out previously, the related id is VSEEQP, since both of
> them

Oops, I guess I forgot to change that.  Sorry.

> have kf in their names, having KF in its id doesn't look good IMHO.
> How about VSEEQPV instead of VSEEXPKF?  It's also consistent with
> what
> we use for VSIEQP.

Yup, makes sense, changed to VSEEQPV.
> 
> >  
> >const signed __int128 __builtin_vsx_scalar_extract_sigq
> > (_Float128);
> > -VSESQP xsxsigqp_kf {}
> > +VSESQP xsxsigqp_kf_ti {}
> > +
> > +  vuq __builtin_vsx_scalar_extract_sig_to_vec (_Float128);
> > +VSESIGKF xsxsigqp_kf_v1ti {}
> 
> Similar to the above, s/VSESIGKF/VSESQPV/
 
Changed to VSESQPV.
> 
> >  
> >const _Float128 __builtin_vsx_scalar_insert_exp_q (unsigned
> > __int128, \
> >   unsigned long
> > long);
> > -VSIEQP xsiexpqp_kf {}
> > +VSIEQP xsiexpqp_kf_di {}
> >  
> >const _Float128 __builtin_vsx_scalar_insert_exp_qp (_Float128, \
> >unsigned
> > long long);
> >  VSIEQPF xsiexpqpf_kf {}
> >  
> > +  const _Float128 __builtin_vsx_scalar_insert_exp_vqp (vuq, vull);
> > +VSIEQPV xsiexpqp_kf_v2di {}
> > +
> >const signed int __builtin_vsx_scalar_test_data_class_qp
> > (_Float128, \
> >  const
> > int<7>);
> >  VSTDCQP xststdcqp_kf {}
> > diff --git a/gcc/config/rs6000/rs6000-c.cc
> > b/gcc/config/rs6000/rs6000-c.cc
> > index 8555174d36e..11060f697db 100644
> > --- a/gcc/config/rs6000/rs6000-c.cc
> > +++ b/gcc/config/rs6000/rs6000-c.cc
> > @@ -1929,11 +1929,15 @@ altivec_resolve_overloaded_builtin
> > (location_t loc, tree fndecl,
> >128-bit variant of built-in function.  */
> > if (GET_MODE_PRECISION (arg1_mode) > 64)
> >   {
> > -   /* If first argument is of float variety, choose variant
> > -  that expects __ieee128 argument.  Otherwise, expect
> > -  __int128 argument.  */
> > +   /* If first argument is of float variety, choose the
> > variant that
> > +  expects __ieee128 argument.  If the first argument is
> > vector
> > +  int, choose the variant that expects vector unsigned
> > +  __int128 argument.  Otherwise, expect scalar __int128
> > argument.
> > +   */
> > if (GET_MODE_CLASS (arg1_mode) == MODE_FLOAT)
> >   instance_code = RS6000_BIF_VSIEQPF;
> > +   else if (GET_MODE_CLASS (arg1_mode) == MODE_VECTOR_INT)
> > + instance_code = RS6000_BIF_VSIEQPV;
> > else
> >   instance_code = RS6000_BIF_VSIEQP;
> >   }
> > diff --git a/gcc/config/rs6000/rs6000-overload.def
> > b/gcc/config/rs6000/rs6000-overload.def
> > index c582490c084..05a5ca6a04d 100644
> > --- a/gcc/config/rs6000/rs6000-overload.def
> > +++ b/gcc/config/rs6000/rs6000-overload.def
> > @@ -4515,6 +4515,18 @@
> >  VSIEQP
> >_Float128 __builtin_vec_scalar_insert_exp (_Float128, unsigned
> > long long);
> >  VSIEQPF
> > +  _Float128 __builtin_vec_scalar_insert_exp (vuq, vull);
> > +VSIEQPV
> > +
> > +[VEC_VSEEV, scalar_extract_exp_to_vec, \
> > +__builtin_vec_scalar_extract_exp_to_vector]
> > +  vull __builtin_vec_scalar_extract_exp_to_vector (_Float128);
> > +VSEEXPKF
> > +
> 
> Need to update if the above changes.

changed 
> 
> > +[VEC_VSESV, scalar_extract_sig_to_vec, \
> > +__builtin_vec_scalar_extract_sig_to_vector]
> > +  vuq __builtin_vec_scalar_extract_sig_to_vector (_Float128);
> > +VSESIGKF
> >  
> 
> Ditto.

changed

> 



> > 
> > diff --git a/gcc/testsuite/gcc.target/powerpc/bfp/scalar-extract-
> > exp-8.c b/gcc/testsuite/gcc.target/powerpc/bfp/scalar-extract-exp-
> > 8.c
> > new file mode 100644
> > index 000..e24e09012d9
> > --- /dev/null
> > +++ b/gcc/testsuite/gcc.target/powerpc/bfp/scalar-extract-exp-8.c
> > @@ -0,0 +1,58 @@
> > +/* { dg-do run { target { powerpc*-*-* } } } */
> > +/* { dg-require-effective-target lp64 } */
> > +/* { dg-require-effective-target p9vector_hw } */
> > +/* { dg-options "-mdejagnu-cpu=power9 -save-temps" } */
> > +
> > +#include 
> > +#include 
> > +
> > +#if

[PATCH] rs6000, __builtin_set_fpscr_rn add return value

2023-06-19 Thread Carl Love via Gcc-patches

GCC maintainers:


The GLibC team requested a builtin to replace the mffscrn and mffscrni inline
asm instructions in the GLibC code.  Previously there was discussion on adding
builtins for the mffscrn instructions.

https://gcc.gnu.org/pipermail/gcc-patches/2023-May/620261.html

In the end, it was felt that it would be better to extend the existing
__builtin_set_fpscr_rn builtin to return a double instead of a void
type.  The desire is that we could have the functionality of the
mffscrn and mffscrni instructions on older ISAs.  The two instructions
were initially added in ISA 3.0.  The __builtin_set_fpscr_rn has the
needed functionality to set the RN field using the mffscrn and mffscrni
instructions if ISA 3.0 is supported or fall back to using logical
instructions to mask and set the bits for earlier ISAs.  The
instructions return the current value of the FPSCR fields DRN, VE, OE,
UE, ZE, XE, NI, RN bit positions then update the RN bit positions with
the new RN value provided.

The current __builtin_set_fpscr_rn builtin has a return type of void.
So, changing the return type to double and returning the FPSCR fields
DRN, VE, OE, UE, ZE, XE, NI, RN bit positions would then give the
functional equivalent of the mffscrn and mffscrni instructions.  Any
current uses of the builtin would just ignore the return value yet any
new uses could use the return value.  So the requirement is for the
change to the __builtin_set_fpscr_rn builtin to be backward
compatible and work for all ISAs.

The following patch changes the return type of the
 __builtin_set_fpscr_rn builtin from void to double.  The return value
is the current value of the various FPSCR fields DRN, VE, OE, UE, ZE,
XE, NI, RN bit positions when the builtin is called.  The builtin then
updates the RN field with the new value provided as an argument to the
builtin.  The patch adds new testcases to test_fpscr_rn_builtin.c to
check that the builtin returns the current value of the FPSCR fields
and then updates the RN field.
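
The read-then-update behaviour described above can be sketched as a small software model on a plain 64-bit integer.  This is only an illustration: model_set_fpscr_rn and the two mask macros are hypothetical names, the mask value assumes DRN occupies bits 32..34 and VE, OE, UE, ZE, XE, NI, RN occupy the low byte (counting from the least significant bit), and the real builtin returns the fields in a double rather than an integer.

```c
#include <assert.h>
#include <stdint.h>

/* Assumed layout, counting from the least significant bit: RN in bits
   0..1, NI, XE, ZE, UE, OE, VE in bits 2..7 and DRN in bits 32..34.  */
#define MODEL_FPSCR_FIELDS_MASK 0x00000007000000ffULL
#define MODEL_FPSCR_RN_MASK     0x0000000000000003ULL

/* Software model of the new builtin semantics: return the selected
   fields as they were on entry, then store RN as the new rounding
   mode.  FPSCR points at an ordinary variable, not the real register.  */
uint64_t
model_set_fpscr_rn (uint64_t *fpscr, unsigned int rn)
{
  assert (rn <= 3);
  uint64_t old_fields = *fpscr & MODEL_FPSCR_FIELDS_MASK;
  *fpscr = (*fpscr & ~MODEL_FPSCR_RN_MASK) | rn;
  return old_fields;
}
```

A caller that ignores the return value sees the same effect as the old void builtin, which is the backward compatibility property the patch relies on.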

The GLibC team has reviewed the patch to make sure it met their needs
as a drop in replacement for the inline asm mffscrn and mffscrni
statements in the GLibC code.

The patch has been tested on Power 8 LE/BE, Power 9 LE/BE and Power 10
LE.

Please let me know if the patch is acceptable for mainline.  Thanks.

   Carl 


rs6000, __builtin_set_fpscr_rn add return value

Change the return value from void to double.  The return value consists of
the FPSCR fields DRN, VE, OE, UE, ZE, XE, NI, RN bit positions.  Add an
overloaded version which accepts a double argument.

The test powerpc/test_fpscr_rn_builtin.c is updated to add tests for the
double return value and the new double argument.

gcc/ChangeLog:
* config/rs6000/rs6000-builtins.def (__builtin_set_fpscr_rn): Delete.
(__builtin_set_fpscr_rn_i): New builtin definition.
(__builtin_set_fpscr_rn_d): New builtin definition.
* config/rs6000/rs6000-overload.def (__builtin_set_fpscr_rn): New
overloaded definition.
* config/rs6000/rs6000.md (rs6000_get_fpscr_fields): New
define_expand.
(rs6000_update_fpscr_rn_field): New define_expand.
(rs6000_set_fpscr_rn_d): New define_expand.
(rs6000_set_fpscr_rn_i): Renamed from rs6000_set_fpscr_rn.  Added
return argument.  Updated to use new rs6000_get_fpscr_fields and
rs6000_update_fpscr_rn_field define_expands.
* doc/extend.texi (__builtin_set_fpscr_rn): Update description for
the return value and new double argument.

gcc/testsuite/ChangeLog:
* gcc.target/powerpc/test_fpscr_rn_builtin.c: Add new tests to check
double return value.  Add tests for overloaded double argument.
---
 gcc/config/rs6000/rs6000-builtins.def |   7 +-
 gcc/config/rs6000/rs6000-overload.def |   6 +
 gcc/config/rs6000/rs6000.md   | 122 ---
 gcc/doc/extend.texi   |  25 ++-
 .../powerpc/test_fpscr_rn_builtin.c   | 143 +-
 5 files changed, 262 insertions(+), 41 deletions(-)

diff --git a/gcc/config/rs6000/rs6000-builtins.def 
b/gcc/config/rs6000/rs6000-builtins.def
index 289a37998b1..30e0b0bb06d 100644
--- a/gcc/config/rs6000/rs6000-builtins.def
+++ b/gcc/config/rs6000/rs6000-builtins.def
@@ -237,8 +237,11 @@
   const __ibm128 __builtin_pack_ibm128 (double, double);
 PACK_IF packif {ibm128}
 
-  void __builtin_set_fpscr_rn (const int[0,3]);
-SET_FPSCR_RN rs6000_set_fpscr_rn {nosoft}
+  double __builtin_set_fpscr_rn_i (const int[0,3]);
+SET_FPSCR_RN_I rs6000_set_fpscr_rn_i {nosoft}
+
+  double __builtin_set_fpscr_rn_d (double);
+SET_FPSCR_RN_D rs6000_set_fpscr_rn_d {nosoft}
 
   const double __builtin_unpack_ibm128 (__ibm128, const int<1>);
 UNPACK_IF unpackif {ibm128}
diff --git a/gcc/config/rs6000/rs6000-overload.def 
b/gcc/config/rs6000/rs6000-overload.def
index

[PATCH ver 5] rs6000: Add builtins for IEEE 128-bit floating point values

2023-06-16 Thread Carl Love via Gcc-patches

Kewen, GCC maintainers:

Version 5, Tested the patch on P9 BE per request.  Fixed up test case
to get the correct expected values for BE and LE.  Fixed typos. 
Updated the doc/extend.texi to clarify the vector arguments.  Changed
test file names per request.  Moved builtin defs next to related
definitions.  Renamed new mode_attr. Removed new mode_iterator, used
existing iterator instead. Renamed mode_iterator VSEEQP_DI to V2DI_DI. 
Fixed up overloaded definitions per request.

Version 4, added missing cases for new xxexpqp, xsxexpdp and xsxsigqp
cases to rs6000_expand_builtin.  Merged the new define_insn definitions
with the existing definitions.  Renamed the builtins by removing the
__builtin_ prefix from the names.  Fixed the documentation for the
builtins.  Updated the test files to check the desired instructions
were generated.  Retested patch on Power 10 with no regressions.

Version 3, was able to get the overloaded version of scalar_insert_exp
to work and the change to xsxexpqp_f128_ define instruction to
work with the suggestions from Kewen.  

Version 2, I have addressed the various comments from Kewen.  I had
issues with adding an additional overloaded version of
scalar_insert_exp with vector arguments.  The overload infrastructure
didn't work with a mix of scalar and vector arguments.  I did rename
the __builtin_insertf128_exp to __builtin_vsx_scalar_insert_exp_qp to
make it similar to the existing builtin.  I also wasn't able to get the
suggested merge of xsxexpqp_f128_ with xsxexpqp_ to work so
I left the two simpler definitions.

The patch adds three new builtins to extract the significand and
exponent of an IEEE float 128-bit value where the builtin argument is a
vector.  Additionally, a builtin to insert the exponent into an IEEE
float 128-bit vector argument is added.  These builtins were requested
since there is no clean and optimal way to transfer between a vector
and a scalar IEEE 128 bit value.

The patch has been tested on Power 9 BE and Power 10 LE with no
regressions.  Please let me know if the patch is acceptable or not. 
Thanks.

   Carl


rs6000: Add builtins for IEEE 128-bit floating point values

Add support for the following builtins:

 __vector unsigned long long int scalar_extract_exp_to_vec (__ieee128);
 __vector unsigned __int128 scalar_extract_sig_to_vec (__ieee128);
 __ieee128 scalar_insert_exp (__vector unsigned __int128,
  __vector unsigned long long);

The instructions used in the builtins operate on vector registers.  Thus
the result must be moved to a scalar type.  There is no clean, performant
way to do this.  The user code typically needs the result as a vector
anyway.

gcc/
* config/rs6000/rs6000-builtin.cc (rs6000_expand_builtin):
Rename CODE_FOR_xsxexpqp_tf to CODE_FOR_xsxexpqp_tf_di.
Rename CODE_FOR_xsxexpqp_kf to CODE_FOR_xsxexpqp_kf_di.
(CODE_FOR_xsxexpqp_kf_v2di, CODE_FOR_xsxsigqp_kf_v1ti,
CODE_FOR_xsiexpqp_kf_v2di): Add case statements.
* config/rs6000/rs6000-builtins.def (__builtin_extractf128_exp,
 __builtin_extractf128_sig, __builtin_insertf128_exp): Add new
builtin definitions.
Rename xsxexpqp_kf, xsxsigqp_kf, xsiexpqp_kf to xsxexpqp_kf_di,
xsxsigqp_kf_ti, xsiexpqp_kf_di respectively.
* config/rs6000/rs6000-c.cc (altivec_resolve_overloaded_builtin):
Update case RS6000_OVLD_VEC_VSIE to handle MODE_VECTOR_INT for new
overloaded instance. Update comments.
* config/rs6000/rs6000-overload.def
(__builtin_vec_scalar_insert_exp): Add new overload definition with
vector arguments.
(scalar_extract_exp_to_vec, scalar_extract_sig_to_vec): New
overloaded definitions.
* config/vsx.md (V2DI_DI): New mode iterator.
Rename xsxexpqp_ to sxexpqp__.
Rename xsxsigqp_ to xsxsigqp__.
Rename xsiexpqp_ to xsiexpqp__.
* doc/extend.texi (__builtin_extractf128_exp,
__builtin_extractf128_sig): Add documentation for new builtins.
(scalar_insert_exp): Add new overloaded builtin definition.

gcc/testsuite/
* gcc.target/powerpc/bfp/extract-exp-8.c: New test case.
* gcc.target/powerpc/bfp/extract-sig-8.c: New test case.
* gcc.target/powerpc/bfp/insert-exp-16.c: New test case.
---
 gcc/config/rs6000/rs6000-builtin.cc   |  21 +++-
 gcc/config/rs6000/rs6000-builtins.def |  15 ++-
 gcc/config/rs6000/rs6000-c.cc |  10 +-
 gcc/config/rs6000/rs6000-overload.def |  12 ++
 gcc/config/rs6000/vsx.md  |  25 +++--
 gcc/doc/extend.texi   |  24 +++-
 .../powerpc/bfp/scalar-extract-exp-8.c|  58 ++
 .../powerpc/bfp/scalar-extract-sig-8.c|  65 +++
 .../powerpc/bfp/scalar-insert-exp-16.c| 103 ++
 9 files changed, 307 insertions(+), 26 deletions(-)
 create mode 100644

Re: [PATCH ver 4] rs6000: Add builtins for IEEE 128-bit floating point values

2023-06-16 Thread Carl Love via Gcc-patches

On Thu, 2023-06-15 at 14:23 +0800, Kewen.Lin wrote:
> Hi Carl,
> 
> on 2023/6/15 04:37, Carl Love wrote:
> > Kewen, GCC maintainers:
> > 
> > Version 4, added missing cases for new xxexpqp, xsxexpdp and
> > xsxsigqp
> > cases to rs6000_expand_builtin.  Merged the new define_insn
> > definitions
> > with the existing definitions.  Renamed the builtins by removing
> > the
> > __builtin_ prefix from the names.  Fixed the documentation for the
> > builtins.  Updated the test files to check the desired instructions
> > were generated.  Retested patch on Power 10 with no regressions.
> > 
> > Version 3, was able to get the overloaded version of
> > scalar_insert_exp
> > to work and the change to xsxexpqp_f128_ define instruction
> > to
> > work with the suggestions from Kewen.  
> > 
> > Version 2, I have addressed the various comments from Kewen.  I had
> > issues with adding an additional overloaded version of
> > scalar_insert_exp with vector arguments.  The overload
> > infrastructure
> > didn't work with a mix of scalar and vector arguments.  I did
> > rename
> > the __builtin_insertf128_exp to __builtin_vsx_scalar_insert_exp_qp
> > to make
> > it similar to the existing builtin.  I also wasn't able to get the
> > suggested merge of xsxexpqp_f128_ with xsxexpqp_ to
> > work so
> > I left the two simpler definitions.
> > 
> > The patch adds three new builtins to extract the significand and
> > exponent of an IEEE float 128-bit value where the builtin argument
> > is a
> > vector.  Additionally, a builtin to insert the exponent into an
> > IEEE
> > float 128-bit vector argument is added.  These builtins were
> > requested
> > since there is no clean and optimal way to transfer between a
> > vector
> > and a scalar IEEE 128 bit value.
> > 
> > The patch has been tested on Power 10 with no regressions.  Please
> > let
> > me know if the patch is acceptable or not.  Thanks.
> 
> I'd suggest testing this on P9 BE as well, to ensure the test cases
> also work on BE.

Tested on P9 BE.  Updated test cases for the correct expected BE and LE
results.

> 
> >Carl
> > 
> > 
> > 
> > rs6000: Add builtins for IEEE 128-bit floating point values
> > 
> > Add support for the following builtins:
> > 
> >  __vector unsigned long long int scalar_extract_exp_to_vec
> > (__ieee128);
> >  __vector unsigned __int128 scalar_extract_sig_to_vec (__ieee128);
> >  __ieee128 scalar_insert_exp (__vector unsigned __int128,
> >   __vector unsigned long long);
> > 
> > These builtins were requesed since there is no clean and performant
> > way to
> 
> s/requesed/requested/

Fixed.

> 
> > transfer a value from a vector type and scalar type, despite the
> > fact
> 
> Describe it oppositely?  As the related existing bifs returns scalar
> type,
> the users want them in vector type, so it's "from scalar type to
> vector
> type"?

Updated the description.

> 
> > that they both reside in vector registers.
> 
> the fact is the related hardware insns have vsx registers
> destination.
> 
> > gcc/
> > * config/rs6000/rs6000-builtin.cc (rs6000_expand_builtin):
> > Rename CCDE_FOR_xsxexpqp_tf to CODE_FOR_xsxexpqp_tf_di.
> > Rename CODE_FOR_xsxexpqp_kf to CODE_FOR_xsxexpqp_kf_di.
> > (CODE_FOR_xsxexpqp_kf_v2di, CODE_FOR_xsxsigqp_kf_v1ti,
> > CODE_FOR_xsiexpqp_kf_v2di   ): Add case statements.
> 
> unnecessary tab.

Fixed.

> 
> > * config/rs6000/rs6000-buildin.def (__builtin_extractf128_exp,
> >  __builtin_extractf128_sig, __builtin_insertf128_exp): Add new
> > builtin definitions.
> > Rename xsxexpqp_kf, xsxsigqp_kf, xxsiexpqp_kf to xsexpqp_kf_di,
> 
> typo, xxsiexpqp_kf => xsiexpqp_kf
> 
> > xsxsigqp_kf_ti, xsiexpqp_kf_di respectively.
> > * config/rs6000/rs6000-c.cc
> > (altivec_resolve_overloaded_builtin):
> > Add else if for MODE_VECTOR_INT. Update comments.
> 
> May be better with "Update RS6000_OVLD_VEC_VSIE handling for
> MODE_VECTOR_INT
> which is used for newly added overloaded instance"?

Changed.

> 
> > * config/rs6000/rs6000-overload.def
> > (__builtin_vec_scalar_insert_exp): Add new overload definition
> > with
> > vector arguments.
> > (scalar_extract_exp_to_vec, scalar_extract_sig_to_vec): New
> > odverloaded definitions.
> 
> s/odverloaded/overloaded/

Fixed.

> 
> > * config/vsx.md (VSEEQP_DI, VSESQP_TI): New mode iterators.
> > (VSEEQP_DI_base): New mode attribute definition.
> > Rename xsxexpqp_ to
> > sxexpqp__.
> > Rename xsxsigqp_ to
> > xsxsigqp__.
> > Rename xsiexpqp_ to
> > xsiexpqp__.
> > (xsxsigqp_f128_, xsiexpqpf_f128_): Add define_insn
> > for
> > new builtins.
> > * doc/extend.texi (__builtin_extractf128_exp,
> > __builtin_extractf128_sig): Add documentation for new builtins.
> > (scalar_insert_exp): Add new overloaded builtin definition.
> > 
> > gcc/testsuite/
> > * gcc.target/powerpc/bfp/extract-exp-1.c: New test case.

Re: [PATCH] rs6000, fix vec_replace_unaligned builtin arguments

2023-06-15 Thread Carl Love via Gcc-patches

On Tue, 2023-06-13 at 11:24 +0800, Kewen.Lin wrote:
> Hi Carl,
> 
> on 2023/5/31 04:41, Carl Love wrote:
> > GCC maintainers:
> > 
> > The following patch fixes the first argument in the builtin
> > definition
> > and the corresponding test cases.  Initially, the builtin
> > specification
> > was wrong due to a cut and paste error.  The documentation was fixed
> > in:
> > 
> > 
> >commit 8cb748a31cd8c7ac9c88b6abc38ce077dd462a7a
> >Author: Bill Schmidt 
> >Date:   Fri Feb 4 13:26:44 2022 -0600
> > 
> >rs6000: Clean up ISA 3.1 documentation [PR100808]
> > 
> >Due to a pasto error in the documentation,
> > vec_replace_unaligned was
> >implemented with the same function prototypes as
> > vec_replace_elt.  It was
> >intended that vec_replace_unaligned always specify output
> > vectors as having
> >type vector unsigned char, to emphasize that elements are
> > potentially
> >misaligned by this built-in function.  This patch corrects
> > the
> >misimplementation.
> > 
> >2022-02-04  Bill Schmidt  
> > 
> >gcc/
> >PR target/100808
> >* doc/extend.texi (Basic PowerPC Built-in Functions
> > Available on ISA
> >3.1): Provide consistent type names.  Remove
> > unnecessary semicolons.
> >Fix bad line breaks.
> > 
> 
> Wrong referred commit, should be
> ed3fea09b18f67e757b5768b42cb6e816626f1db.
> The above commit used the wrong commit log.

Fixed the commit reference as noted.

> 
> > This patch fixes the arguments in the definitions and updates the
> > testcases accordingly.  Additionally, a few minor spacing issues
> > are
> > fixed.
> > 
> > The patch has been tested on Power 10 with no regressions.  Please
> > let
> > me know if the patch is acceptable for mainline.  Thanks.
> > 
> >  Carl 
> > 
> > --
> > rs6000, fix vec_replace_unaligned builtin arguments
> > 
> > The first argument of the vec_replace_unaligned builtin should
> > always be
> > unsinged char, as specified in gcc/doc/extend.texi.
> 
> s/unsinged/unsigned/

Fixed.

> 
> > This patch fixes the buitin definitions and updates the testcases
> > to use
> 
> s/buitin/builtin/

Fixed.

> 
> > the correct arguments.
> > 
> > gcc/ChangeLog:
> > * config/rs6000/rs6000-overload.def (__builtin_vec_replace_un):
> > Fix first argument type.
> > 
> > gcc/testsuite/ChangeLog:
> > * gcc.target/powerpc/ver-replace-word-runnable.c
> > (vec_replace_unaligned) Fix first argument type.
> > (vresult_uchar): Fix expected   results.
> 
> Nit: unexpected tab.

Fixed.

> 
> > (vec_replace_unaligned): Update for loop to check uchar
> > results.
> > Remove extra spaces in if statements.
> > Insert missing spaces in for statements.
> > (dg-final): Update expected instruction counts.
> > ---
> >  gcc/config/rs6000/rs6000-overload.def |  12 +-
> >  .../powerpc/vec-replace-word-runnable.c   | 157 ++--
> > --
> >  2 files changed, 92 insertions(+), 77 deletions(-)
> > 
> > diff --git a/gcc/config/rs6000/rs6000-overload.def
> > b/gcc/config/rs6000/rs6000-overload.def
> > index c582490c084..26dc662b8fb 100644
> > --- a/gcc/config/rs6000/rs6000-overload.def
> > +++ b/gcc/config/rs6000/rs6000-overload.def
> > @@ -3059,17 +3059,17 @@
> >  VREPLACE_ELT_V2DF
> >  
> >  [VEC_REPLACE_UN, vec_replace_unaligned, __builtin_vec_replace_un]
> > -  vuc __builtin_vec_replace_un (vui, unsigned int, const int);
> > +  vuc __builtin_vec_replace_un (vuc, unsigned int, const int);
> >  VREPLACE_UN_UV4SI
> > -  vuc __builtin_vec_replace_un (vsi, signed int, const int);
> > +  vuc __builtin_vec_replace_un (vuc, signed int, const int);
> >  VREPLACE_UN_V4SI
> > -  vuc __builtin_vec_replace_un (vull, unsigned long long, const
> > int);
> > +  vuc __builtin_vec_replace_un (vuc, unsigned long long, const
> > int);
> >  VREPLACE_UN_UV2DI
> > -  vuc __builtin_vec_replace_un (vsll, signed long long, const
> > int);
> > +  vuc __builtin_vec_replace_un (vuc, signed long long, const int);
> >  VREPLACE_UN_V2DI
> > -  vuc __builtin_vec_replace_un (vf, float, const int);
> > +  vuc __builtin_vec_replace_un (vuc, float, const int);
> >  VREPLACE_UN_V4SF
> > -  vuc __builtin_vec_replace_un (vd, double, const int);
> > +  vuc __builtin_vec_replace_un (vuc, double, const int);
> >  VREPLACE_UN_V2DF
> 
> Looks good.  Since the given element can be replaced at an unaligned
> position, the given vector type doesn't need to match the given element
> type, with the potential implication that it can be misaligned.
> 
> >  
> >  [VEC_REVB, vec_revb, __builtin_vec_revb]
> > diff --git a/gcc/testsuite/gcc.target/powerpc/vec-replace-word-
> > runnable.c b/gcc/testsuite/gcc.target/powerpc/vec-replace-word-
> > runnable.c
> > index 27318822871..66b0ef58996 100644
> > --- a/gcc/testsuite/gcc.target/powerpc/vec-replace-word-runnable.c
> > +++

[PATCH ver 2] rs6000, fix vec_replace_unaligned builtin arguments

2023-06-15 Thread Carl Love via Gcc-patches

GCC maintainers:

Version 2, fixed various typos.  Updated the change log body to say the
instruction counts were updated.  The instruction counts changed as a
result of changing the first argument of the vec_replace_unaligned
builtin call from vector unsigned long long (vull) to vector unsigned
char (vuc).  When the first argument was vull the builtin call
generated the vinsd instruction for the two test cases.  The updated
call with vuc as the first argument generates two vinsw instructions
instead.  Patch was retested on Power 10 with no regressions.

The following patch fixes the first argument in the builtin definition
and the corresponding test cases.  Initially, the builtin specification
was wrong due to a cut and paste error.  The documentation was fixed in:

   commit ed3fea09b18f67e757b5768b42cb6e816626f1db
   Author: Bill Schmidt 
   Date:   Fri Feb 4 13:07:17 2022 -0600

   rs6000: Correct function prototypes for vec_replace_unaligned

   Due to a pasto error in the documentation, vec_replace_unaligned was
   implemented with the same function prototypes as vec_replace_elt.  It was
   intended that vec_replace_unaligned always specify output vectors as 
having
   type vector unsigned char, to emphasize that elements are potentially
   misaligned by this built-in function.  This patch corrects the
   misimplementation.


This patch fixes the arguments in the definitions and updates the
testcases accordingly.  Additionally, a few minor spacing issues are
fixed.
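
As an illustration of why the first argument is typed as vector unsigned
char: the replaced element may start at any byte offset, so it need not
line up with the element boundaries of a wider vector type.  Below is a
minimal model in portable C, not the builtin itself.  It assumes host byte
order and ignores the endian-dependent byte indexing of the real
instruction; vec16_model and replace_word_unaligned_model are hypothetical
names used only for this sketch.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical stand-in for a 16-byte vector; not a GCC vector type.  */
typedef struct { unsigned char b[16]; } vec16_model;

/* Overwrite four bytes of SRC starting at BYTE_INDEX with the bytes of
   VALUE in host byte order.  BYTE_INDEX may be any value from 0 to 12,
   so the written word can straddle 4-byte element boundaries.  */
static vec16_model
replace_word_unaligned_model (vec16_model src, uint32_t value,
			      unsigned int byte_index)
{
  /* The four bytes must fit inside the 16-byte vector.  */
  assert (byte_index <= sizeof src.b - sizeof value);
  memcpy (&src.b[byte_index], &value, sizeof value);
  return src;
}
```

With byte_index set to 3, for example, the model writes bytes 3 through 6,
which span two 32-bit elements; that is the situation the unsigned char
result type is meant to emphasize.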

The patch has been tested on Power 10 with no regressions.  Please let
me know if the patch is acceptable for mainline.  Thanks.

 Carl 

--
rs6000, fix vec_replace_unaligned builtin arguments

The first argument of the vec_replace_unaligned builtin should always be
unsigned char, as specified in gcc/doc/extend.texi.

This patch fixes the builtin definitions and updates the testcases to use
the correct arguments.  The expected instruction counts for the testcase
are updated.

gcc/ChangeLog:
* config/rs6000/rs6000-overload.def (__builtin_vec_replace_un):
Fix first argument type.

gcc/testsuite/ChangeLog:
* gcc.target/powerpc/vec-replace-word-runnable.c
(vec_replace_unaligned): Fix first argument type.
(vresult_uchar): Fix expected results.
(vec_replace_unaligned): Update for loop to check uchar results.
Remove extra spaces in if statements.
Insert missing spaces in for statements.
(dg-final): Update expected instruction counts.
---
 gcc/config/rs6000/rs6000-overload.def |  12 +-
 .../powerpc/vec-replace-word-runnable.c   | 157 ++
 2 files changed, 92 insertions(+), 77 deletions(-)

diff --git a/gcc/config/rs6000/rs6000-overload.def 
b/gcc/config/rs6000/rs6000-overload.def
index c582490c084..26dc662b8fb 100644
--- a/gcc/config/rs6000/rs6000-overload.def
+++ b/gcc/config/rs6000/rs6000-overload.def
@@ -3059,17 +3059,17 @@
 VREPLACE_ELT_V2DF
 
 [VEC_REPLACE_UN, vec_replace_unaligned, __builtin_vec_replace_un]
-  vuc __builtin_vec_replace_un (vui, unsigned int, const int);
+  vuc __builtin_vec_replace_un (vuc, unsigned int, const int);
 VREPLACE_UN_UV4SI
-  vuc __builtin_vec_replace_un (vsi, signed int, const int);
+  vuc __builtin_vec_replace_un (vuc, signed int, const int);
 VREPLACE_UN_V4SI
-  vuc __builtin_vec_replace_un (vull, unsigned long long, const int);
+  vuc __builtin_vec_replace_un (vuc, unsigned long long, const int);
 VREPLACE_UN_UV2DI
-  vuc __builtin_vec_replace_un (vsll, signed long long, const int);
+  vuc __builtin_vec_replace_un (vuc, signed long long, const int);
 VREPLACE_UN_V2DI
-  vuc __builtin_vec_replace_un (vf, float, const int);
+  vuc __builtin_vec_replace_un (vuc, float, const int);
 VREPLACE_UN_V4SF
-  vuc __builtin_vec_replace_un (vd, double, const int);
+  vuc __builtin_vec_replace_un (vuc, double, const int);
 VREPLACE_UN_V2DF
 
 [VEC_REVB, vec_revb, __builtin_vec_revb]
diff --git a/gcc/testsuite/gcc.target/powerpc/vec-replace-word-runnable.c 
b/gcc/testsuite/gcc.target/powerpc/vec-replace-word-runnable.c
index 27318822871..66b0ef58996 100644
--- a/gcc/testsuite/gcc.target/powerpc/vec-replace-word-runnable.c
+++ b/gcc/testsuite/gcc.target/powerpc/vec-replace-word-runnable.c
@@ -20,6 +20,9 @@ main (int argc, char *argv [])
   unsigned char ch;
   unsigned int index;
 
+  vector unsigned char src_va_uchar;
+  vector unsigned char expected_vresult_uchar;
+
   vector unsigned int vresult_uint;
   vector unsigned int expected_vresult_uint;
   vector unsigned int src_va_uint;
@@ -64,10 +67,10 @@ main (int argc, char *argv [])
 
   vresult_uint = vec_replace_elt (src_va_uint, src_a_uint, 2);
 
-  if (!vec_all_eq (vresult_uint,  expected_vresult_uint)) {
+  if (!vec_all_eq (vresult_uint, expected_vresult_uint)) {
 #if DEBUG
 printf("ERROR,

[PATCH ver 4] rs6000: Add builtins for IEEE 128-bit floating point values

2023-06-14 Thread Carl Love via Gcc-patches

Kewen, GCC maintainers:

Version 4, added missing cases for new xxexpqp, xsxexpdp and xsxsigqp
cases to rs6000_expand_builtin.  Merged the new define_insn definitions
with the existing definitions.  Renamed the builtins by removing the
__builtin_ prefix from the names.  Fixed the documentation for the
builtins.  Updated the test files to check the desired instructions
were generated.  Retested patch on Power 10 with no regressions.

Version 3, was able to get the overloaded version of scalar_insert_exp
to work and the change to xsxexpqp_f128_ define instruction to
work with the suggestions from Kewen.  

Version 2, I have addressed the various comments from Kewen.  I had
issues with adding an additional overloaded version of
scalar_insert_exp with vector arguments.  The overload infrastructure
didn't work with a mix of scalar and vector arguments.  I did rename
the __builtin_insertf128_exp to __builtin_vsx_scalar_insert_exp_qp to make
it similar to the existing builtin.  I also wasn't able to get the
suggested merge of xsxexpqp_f128_ with xsxexpqp_ to work so
I left the two simpler definitions.

The patch adds three new builtins to extract the significand and
exponent of an IEEE float 128-bit value where the builtin argument is a
vector.  Additionally, a builtin to insert the exponent into an IEEE
float 128-bit vector argument is added.  These builtins were requested
since there is no clean and optimal way to transfer between a vector
and a scalar IEEE 128 bit value.
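
For readers less familiar with the format, here is a minimal sketch of the
fields the extract builtins deal with, written as portable C over the raw
IEEE binary128 bit pattern (1 sign bit, 15 exponent bits, 112 fraction
bits).  This is only a model of the layout: f128_bits and the helper names
are hypothetical, and the hardware instructions' handling of the implicit
leading significand bit is deliberately not modeled here.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical model: a binary128 bit pattern split into two 64-bit
   halves, HI holding the sign, exponent and top 48 fraction bits.  */
typedef struct { uint64_t hi; uint64_t lo; } f128_bits;

/* Return the 15-bit biased exponent field.  */
static uint64_t
f128_biased_exp (f128_bits x)
{
  return (x.hi >> 48) & 0x7FFF;
}

/* Return the 112-bit fraction field with sign and exponent cleared.  */
static f128_bits
f128_fraction (f128_bits x)
{
  f128_bits r = { x.hi & 0x0000FFFFFFFFFFFFULL, x.lo };
  return r;
}

/* Usage example: 1.0 has biased exponent 16383 and a zero fraction.  */
void
f128_extract_selfcheck (void)
{
  f128_bits one = { 0x3FFF000000000000ULL, 0 };
  assert (f128_biased_exp (one) == 16383);
  assert (f128_fraction (one).hi == 0 && f128_fraction (one).lo == 0);
}
```

The new builtins return these fields in vector types rather than scalar
ones; the bit positions themselves come from the IEEE format.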

The patch has been tested on Power 10 with no regressions.  Please let
me know if the patch is acceptable or not.  Thanks.

   Carl



rs6000: Add builtins for IEEE 128-bit floating point values

Add support for the following builtins:

 __vector unsigned long long int scalar_extract_exp_to_vec (__ieee128);
 __vector unsigned __int128 scalar_extract_sig_to_vec (__ieee128);
 __ieee128 scalar_insert_exp (__vector unsigned __int128,
  __vector unsigned long long);

These builtins were requesed since there is no clean and performant way to
transfer a value from a vector type and scalar type, despite the fact
that they both reside in vector registers.

gcc/
* config/rs6000/rs6000-builtin.cc (rs6000_expand_builtin):
Rename CODE_FOR_xsxexpqp_tf to CODE_FOR_xsxexpqp_tf_di.
Rename CODE_FOR_xsxexpqp_kf to CODE_FOR_xsxexpqp_kf_di.
(CODE_FOR_xsxexpqp_kf_v2di, CODE_FOR_xsxsigqp_kf_v1ti,
CODE_FOR_xsiexpqp_kf_v2di   ): Add case statements.
* config/rs6000/rs6000-builtins.def (__builtin_extractf128_exp,
 __builtin_extractf128_sig, __builtin_insertf128_exp): Add new
builtin definitions.
Rename xsxexpqp_kf, xsxsigqp_kf, xxsiexpqp_kf to xsexpqp_kf_di,
xsxsigqp_kf_ti, xsiexpqp_kf_di respectively.
* config/rs6000/rs6000-c.cc (altivec_resolve_overloaded_builtin):
Add else if for MODE_VECTOR_INT. Update comments.
* config/rs6000/rs6000-overload.def
(__builtin_vec_scalar_insert_exp): Add new overload definition with
vector arguments.
(scalar_extract_exp_to_vec, scalar_extract_sig_to_vec): New
odverloaded definitions.
* config/vsx.md (VSEEQP_DI, VSESQP_TI): New mode iterators.
(VSEEQP_DI_base): New mode attribute definition.
Rename xsxexpqp_ to sxexpqp__.
Rename xsxsigqp_ to xsxsigqp__.
Rename xsiexpqp_ to xsiexpqp__.
(xsxsigqp_f128_, xsiexpqpf_f128_): Add define_insn for
new builtins.
* doc/extend.texi (__builtin_extractf128_exp,
__builtin_extractf128_sig): Add documentation for new builtins.
(scalar_insert_exp): Add new overloaded builtin definition.

gcc/testsuite/
* gcc.target/powerpc/bfp/extract-exp-1.c: New test case.
* gcc.target/powerpc/bfp/extract-sig-1.c: New test case.
* gcc.target/powerpc/bfp/insert-exp-1.c: New test case.
---
 gcc/config/rs6000/rs6000-builtin.cc   | 21 +++--
 gcc/config/rs6000/rs6000-builtins.def | 15 ++-
 gcc/config/rs6000/rs6000-c.cc | 10 +-
 gcc/config/rs6000/rs6000-overload.def | 10 ++
 gcc/config/rs6000/vsx.md  | 26 +++--
 gcc/doc/extend.texi   | 21 -
 .../gcc.target/powerpc/bfp/extract-exp-1.c| 53 +++
 .../gcc.target/powerpc/bfp/extract-sig-1.c| 60 
 .../gcc.target/powerpc/bfp/insert-exp-1.c | 94 +++
 9 files changed, 284 insertions(+), 26 deletions(-)
 create mode 100644 gcc/testsuite/gcc.target/powerpc/bfp/extract-exp-1.c
 create mode 100644 gcc/testsuite/gcc.target/powerpc/bfp/extract-sig-1.c
 create mode 100644 gcc/testsuite/gcc.target/powerpc/bfp/insert-exp-1.c

diff --git a/gcc/config/rs6000/rs6000-builtin.cc 
b/gcc/config/rs6000/rs6000-builtin.cc
index 534698e7d3e..a8f291c6a72 100644
--- a/gcc/config/rs6000/rs6000-builtin.cc
+++ b/gcc/config/rs6000/rs6000-builtin.cc
@@

Re: [PATCH ver 3] rs6000: Add builtins for IEEE 128-bit floating point values

2023-06-14 Thread Carl Love via Gcc-patches

On Tue, 2023-06-13 at 11:10 +0800, Kewen.Lin wrote:
> Hi Carl,
> 
> on 2023/6/8 23:21, Carl Love wrote:
> > Kewen, GCC maintainers:
> > 
> > Version 3, was able to get the overloaded version of
> > scalar_insert_exp
> > to work and the change to xsxexpqp_f128_ define instruction
> > to
> > work with the suggestions from Kewen.  
> > 
> > Version 2, I have addressed the various comments from Kewen.  I had
> > issues with adding an additional overloaded version of
> > scalar_insert_exp with vector arguments.  The overload
> > infrastructure
> > didn't work with a mix of scalar and vector arguments.  I did
> > rename
> > the __builtin_insertf128_exp to __builtin_vsx_scalar_insert_exp_qp
> > to make
> > it similar to the existing builtin.  I also wasn't able to get the
> > suggested merge of xsxexpqp_f128_ with xsxexpqp_ to
> > work so
> > I left the two simpler definitions.
> > 
> > The patch adds three new builtins to extract the significand and
> > exponent of an IEEE float 128-bit value where the builtin argument
> > is a
> > vector.  Additionally, a builtin to insert the exponent into an
> > IEEE
> > float 128-bit vector argument is added.  These builtins were
> > requested
> > since there is no clean and optimal way to transfer between a
> > vector
> > and a scalar IEEE 128 bit value.
> > 
> > The patch has been tested on Power 10 with no regressions.  Please
> > let
> > me know if the patch is acceptable or not.  Thanks.
> > 
> >Carl
> > 
> > ---
> > rs6000: Add builtins for IEEE 128-bit floating point values
> > 
> > Add support for the following builtins:
> > 
> >  __vector unsigned long long int
> >  __builtin_scalar_extract_exp_to_vec (__ieee128);
> >  __vector unsigned __int128
> >  __builtin_scalar_extract_sig_to_vec (__ieee128);
> >  __ieee128 scalar_insert_exp (__vector unsigned __int128,
> >   __vector unsigned long long);

Fixed commit log, removed __builtin_ from the names per comments from
Kewen below.
> > 
> > These builtins were requesed since there is no clean and performant
> > way to
> > transfer a value from a vector type and scalar type, despite the
> > fact
> > that they both reside in vector registers.
> > 
> > gcc/
> > * config/rs6000/rs6000-builtin.cc (rs6000_expand_builtin):
> > Rename CCDE_FOR_xsxexpqp_tf to CODE_FOR_xsxexpqp_tf_di.
> > Rename CODE_FOR_xsxexpqp_kf to CODE_FOR_xsxexpqp_kf_di.
> > * config/rs6000/rs6000-buildin.def (__builtin_extractf128_exp,
> >  __builtin_extractf128_sig, __builtin_insertf128_exp): Add new
> > builtin definitions.
> > Rename xsxexpqp_kf to xsxexpqp_kf_di.
> > * config/rs6000/rs6000-c.cc
> > (altivec_resolve_overloaded_builtin):
> > Add else if for MODE_VECTOR_INT. Update comments.
> > * config/rs6000/rs6000-overload.def
> > (__builtin_vec_scalar_insert_exp): Add new overload definition
> > with
> > vector arguments.
> > * config/vsx.md (VSEEQP_DI): New mode iterator.
> > Rename define_insn xsxexpqp_ to
> > sxexpqp__.
> > (xsxsigqp_f128_, xsiexpqpf_f128_): Add define_insn
> > for
> > new builtins.
> > * doc/extend.texi (__builtin_extractf128_exp,
> > __builtin_extractf128_sig): Add documentation for new builtins.
> > (scalar_insert_exp): Add new overloaded builtin definition.
> > 
> > gcc/testsuite/
> > * gcc.target/powerpc/bfp/extract-exp-ieee128.c: New test case.
> > * gcc.target/powerpc/bfp/extract-sig-ieee128.c: New test case.
> > * gcc.target/powerpc/bfp/insert-exp-ieee128.c: New test case.
> > ---
> >  gcc/config/rs6000/rs6000-builtin.cc   |  4 +-
> >  gcc/config/rs6000/rs6000-builtins.def | 11 ++-
> >  gcc/config/rs6000/rs6000-c.cc | 10 +-
> >  gcc/config/rs6000/rs6000-overload.def |  2 +
> >  gcc/config/rs6000/vsx.md  | 28 +-
> >  gcc/doc/extend.texi   |  9 ++
> >  .../powerpc/bfp/extract-exp-ieee128.c | 50 ++
> >  .../powerpc/bfp/extract-sig-ieee128.c | 57 
> >  .../powerpc/bfp/insert-exp-ieee128.c  | 91
> > +++
> >  9 files changed, 253 insertions(+), 9 deletions(-)
> >  create mode 100644 gcc/testsuite/gcc.target/powerpc/bfp/extract-
> > exp-ieee128.c
> >  create mode 100644 gcc/testsuite/gcc.target/powerpc/bfp/extract-
> > sig-ieee128.c
> >  create mode 100644 gcc/testsuite/gcc.target/powerpc/bfp/insert-
> > exp-ieee128.c
> > 
> > diff --git a/gcc/config/rs6000/rs6000-builtin.cc
> > b/gcc/config/rs6000/rs6000-builtin.cc
> > index 534698e7d3e..d99f0ae5dda 100644
> > --- a/gcc/config/rs6000/rs6000-builtin.cc
> > +++ b/gcc/config/rs6000/rs6000-builtin.cc
> > @@ -3326,8 +3326,8 @@ rs6000_expand_builtin (tree exp, rtx target,
> > rtx /* subtarget */,
> >case CODE_FOR_fmakf4_odd:
> > icode = CODE_FOR_fmatf4_odd;
> > break;
> > -  case CODE_FOR_xsxexpqp_kf:
> > -   icode = CODE_FOR_xsxexpqp_tf;
> > +

Re: [PATCH] rs6000: Add builtins for IEEE 128-bit floating point values

2023-06-08 Thread Carl Love via Gcc-patches



Kewen:
On Wed, 2023-06-07 at 17:36 +0800, Kewen.Lin wrote:
> Hi,
> 
> on 2023/6/7 03:54, Carl Love wrote:
> > On Mon, 2023-06-05 at 16:45 +0800, Kewen.Lin wrote:
> > > Hi Carl,
> > > 
> > > on 2023/5/2 23:52, Carl Love via Gcc-patches wrote:
> > > > GCC maintainers:
> > > > 
> > > > The following patch adds three builtins for inserting and
> > > > extracting
> > > > the
> > > > exponent and significand of IEEE 128-bit floating point
> > > > values.
> > > > The builtins are valid for Power 9 and Power 10.  
> > > 
> > > We already have:
> > > 
> > > unsigned long long int scalar_extract_exp (__ieee128 source);
> > > unsigned __int128 scalar_extract_sig (__ieee128 source);
> > > ieee_128 scalar_insert_exp (unsigned __int128 significand,
> > > unsigned long long int exponent);
> > > ieee_128 scalar_insert_exp (ieee_128 significand, unsigned long
> > > long
> > > int exponent);
> > > 
> > > you need to say something about the requirements or the
> > > justification
> > > for
> > > adding more, for this patch itself, some comments are inline
> > > below,
> > > thanks!
> > 
> > I implemented the patch based on a request for the builtins.  It
> > didn't
> > include any justification so I reached out to Steve Monroe who
> > requested the builtins to understand why he wanted them.  Here is
> > his
> > reply:
> > 
> >Basically there is no clean and performant way to transfer
> > between a
> >vector type and the ieee128 scalar, despite the fact that both
> >reside in vector registers. Also a union transfer does not work
> >correctly on most GCC versions (and will likely break again in
> > the
> >next release). I offer the long sad history of the IBM long
> > double
> >float runtime.
> 
> Thanks for clarifying this.  As the proposed changes, I think he
> meant
> to say "Basically there is no clean and performant way to transfer
> between
> a vector type and the scalar **types**". :) Because the proposed
> changes
> are:
>   scalar_extract_exp: unsigned long long => vector unsigned long long
>   scalar_extract_sig: unsigned __int128  => vector unsigned __int128
>   scalar_insert_exp: unsigned __int128 => vector unsigned __int128
>  unsigned long long => vector unsigned long long.
> 
> >Also there are __ieee128 operations that are provided by
> > builtins
> >for POWER9 but are not provided by libgcc (for POWER8).
> > 
> >Finally I can prove that a softfloat __ieee128 implementation
> > using
> >VMX integer operations, out-performs the current libgcc
> >implementation using DW GPRs.
> > 
> >The details are in the PVECLIB documentation
> >pveclib/vec__f128__ppc.h
> > 
> > 
> > > > The patch has been tested on both Power 9 and Power 10.
> > > > 
> > > > Please let me know if this patch is acceptable for
> > > > mainline.  Thanks.
> > > > 
> > > > Carl 
> > > > 
> > > > 
> > > > --
> > > > From a20cc81f98cce1140fc95775a7c25b55d1ca7cba Mon Sep 17
> > > > 00:00:00
> > > > 2001
> > > > From: Carl Love 
> > > > Date: Wed, 12 Apr 2023 17:46:37 -0400
> > > > Subject: [PATCH] rs6000: Add builtins for IEEE 128-bit floating
> > > > point values
> > > > 
> > > > Add support for the following builtins:
> > > > 
> > > >  __vector unsigned long long int __builtin_extractf128_exp
> > > > (__ieee128);
> > > 
> > > Could you make the name similar to the existing one?  The
> > > existing
> > > one
> > >   
> > >   unsigned long long int scalar_extract_exp (__ieee128 source);
> > > 
> > > has nothing like f128 on its name, this variant is just to change
> > > the
> > > return type to vector type, how about scalar_extract_exp_to_vec?
> > 
> > I changed the name  __builtin_extractf128_exp  to
> > __builtin_scalar_extract_exp_to_vec.
> > 
> > > >  __vector unsigned __int128 __builtin_extractf128_sig
> > > > (__ieee128);
> > > 
> > > Ditto.
> > 
> > I changed the name  __builtin_extractf128_sig to
> > __builtin_scalar_extract_sig_to_vec.
> > 

[PATCH ver 3] rs6000: Add builtins for IEEE 128-bit floating point values

2023-06-08 Thread Carl Love via Gcc-patches

Kewen, GCC maintainers:

Version 3, was able to get the overloaded version of scalar_insert_exp
to work and the change to xsxexpqp_f128_ define instruction to
work with the suggestions from Kewen.  

Version 2, I have addressed the various comments from Kewen.  I had
issues with adding an additional overloaded version of
scalar_insert_exp with vector arguments.  The overload infrastructure
didn't work with a mix of scalar and vector arguments.  I did rename
the __builtin_insertf128_exp to __builtin_vsx_scalar_insert_exp_qp to make
it similar to the existing builtin.  I also wasn't able to get the
suggested merge of xsxexpqp_f128_ with xsxexpqp_ to work so
I left the two simpler definitions.

The patch adds three new builtins to extract the significand and
exponent of an IEEE float 128-bit value where the builtin argument is a
vector.  Additionally, a builtin to insert the exponent into an IEEE
float 128-bit vector argument is added.  These builtins were requested
since there is no clean and optimal way to transfer between a vector
and a scalar IEEE 128 bit value.
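
To make the insert operation concrete, here is a minimal sketch in portable
C of combining a significand pattern with a new exponent at the bit level.
It assumes the result keeps the sign and 112 fraction bits of the first
operand and takes the low 15 bits of the second operand as the biased
exponent; f128_bits and f128_insert_exp_model are hypothetical names, and
the sketch models only the IEEE binary128 layout, not the builtin's vector
argument types.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical model: a binary128 bit pattern split into two 64-bit
   halves, HI holding the sign, exponent and top 48 fraction bits.  */
typedef struct { uint64_t hi; uint64_t lo; } f128_bits;

/* Keep the sign and fraction bits of SIG and replace the 15-bit
   exponent field with the low 15 bits of EXP.  */
static f128_bits
f128_insert_exp_model (f128_bits sig, uint64_t exp)
{
  f128_bits r;
  r.hi = (sig.hi & 0x8000FFFFFFFFFFFFULL) | ((exp & 0x7FFF) << 48);
  r.lo = sig.lo;
  return r;
}

/* Usage example: a zero fraction with biased exponent 16383 is the
   bit pattern of 1.0.  */
void
f128_insert_selfcheck (void)
{
  f128_bits zero = { 0, 0 };
  f128_bits one = f128_insert_exp_model (zero, 16383);
  assert (one.hi == 0x3FFF000000000000ULL && one.lo == 0);
}
```

Any exponent bits already present in the first operand are discarded, so
the same helper works whether that operand is a raw significand or an
existing 128-bit floating point pattern.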

The patch has been tested on Power 10 with no regressions.  Please let
me know if the patch is acceptable or not.  Thanks.

   Carl

---
rs6000: Add builtins for IEEE 128-bit floating point values

Add support for the following builtins:

 __vector unsigned long long int
 __builtin_scalar_extract_exp_to_vec (__ieee128);
 __vector unsigned __int128
 __builtin_scalar_extract_sig_to_vec (__ieee128);
 __ieee128 scalar_insert_exp (__vector unsigned __int128,
  __vector unsigned long long);

These builtins were requesed since there is no clean and performant way to
transfer a value from a vector type and scalar type, despite the fact
that they both reside in vector registers.

gcc/
* config/rs6000/rs6000-builtin.cc (rs6000_expand_builtin):
Rename CODE_FOR_xsxexpqp_tf to CODE_FOR_xsxexpqp_tf_di.
Rename CODE_FOR_xsxexpqp_kf to CODE_FOR_xsxexpqp_kf_di.
* config/rs6000/rs6000-buildin.def (__builtin_extractf128_exp,
 __builtin_extractf128_sig, __builtin_insertf128_exp): Add new
builtin definitions.
Rename xsxexpqp_kf to xsxexpqp_kf_di.
* config/rs6000/rs6000-c.cc (altivec_resolve_overloaded_builtin):
Add else if for MODE_VECTOR_INT. Update comments.
* config/rs6000/rs6000-overload.def
(__builtin_vec_scalar_insert_exp): Add new overload definition with
vector arguments.
* config/vsx.md (VSEEQP_DI): New mode iterator.
Rename define_insn xsxexpqp_ to
sxexpqp__.
(xsxsigqp_f128_, xsiexpqpf_f128_): Add define_insn for
new builtins.
* doc/extend.texi (__builtin_extractf128_exp,
__builtin_extractf128_sig): Add documentation for new builtins.
(scalar_insert_exp): Add new overloaded builtin definition.

gcc/testsuite/
* gcc.target/powerpc/bfp/extract-exp-ieee128.c: New test case.
* gcc.target/powerpc/bfp/extract-sig-ieee128.c: New test case.
* gcc.target/powerpc/bfp/insert-exp-ieee128.c: New test case.
---
 gcc/config/rs6000/rs6000-builtin.cc   |  4 +-
 gcc/config/rs6000/rs6000-builtins.def | 11 ++-
 gcc/config/rs6000/rs6000-c.cc | 10 +-
 gcc/config/rs6000/rs6000-overload.def |  2 +
 gcc/config/rs6000/vsx.md  | 28 +-
 gcc/doc/extend.texi   |  9 ++
 .../powerpc/bfp/extract-exp-ieee128.c | 50 ++
 .../powerpc/bfp/extract-sig-ieee128.c | 57 
 .../powerpc/bfp/insert-exp-ieee128.c  | 91 +++
 9 files changed, 253 insertions(+), 9 deletions(-)
 create mode 100644 gcc/testsuite/gcc.target/powerpc/bfp/extract-exp-ieee128.c
 create mode 100644 gcc/testsuite/gcc.target/powerpc/bfp/extract-sig-ieee128.c
 create mode 100644 gcc/testsuite/gcc.target/powerpc/bfp/insert-exp-ieee128.c

diff --git a/gcc/config/rs6000/rs6000-builtin.cc 
b/gcc/config/rs6000/rs6000-builtin.cc
index 534698e7d3e..d99f0ae5dda 100644
--- a/gcc/config/rs6000/rs6000-builtin.cc
+++ b/gcc/config/rs6000/rs6000-builtin.cc
@@ -3326,8 +3326,8 @@ rs6000_expand_builtin (tree exp, rtx target, rtx /* 
subtarget */,
   case CODE_FOR_fmakf4_odd:
icode = CODE_FOR_fmatf4_odd;
break;
-  case CODE_FOR_xsxexpqp_kf:
-   icode = CODE_FOR_xsxexpqp_tf;
+  case CODE_FOR_xsxexpqp_kf_di:
+   icode = CODE_FOR_xsxexpqp_tf_di;
break;
   case CODE_FOR_xsxsigqp_kf:
icode = CODE_FOR_xsxsigqp_tf;
diff --git a/gcc/config/rs6000/rs6000-builtins.def 
b/gcc/config/rs6000/rs6000-builtins.def
index 638d0bc72ca..dcd4a393906 100644
--- a/gcc/config/rs6000/rs6000-builtins.def
+++ b/gcc/config/rs6000/rs6000-builtins.def
@@ -2901,8 +2901,14 @@
   fpmath double __builtin_truncf128_round_to_odd (_Float128);
 TRUNCF128_ODD trunckfdf2_odd {}
 
+  vull __builtin_scalar_extract_exp_to_vec

Re: [PATCH] rs6000: Add builtins for IEEE 128-bit floating point values

2023-06-06 Thread Carl Love via Gcc-patches

On Mon, 2023-06-05 at 16:45 +0800, Kewen.Lin wrote:
> Hi Carl,
> 
> on 2023/5/2 23:52, Carl Love via Gcc-patches wrote:
> > GCC maintainers:
> > 
> > The following patch adds three builtins for inserting and extracting
> > the exponent and significand of an IEEE 128-bit floating point value.
> > The builtins are valid for Power 9 and Power 10.
> 
> We already have:
> 
> unsigned long long int scalar_extract_exp (__ieee128 source);
> unsigned __int128 scalar_extract_sig (__ieee128 source);
> ieee_128 scalar_insert_exp (unsigned __int128 significand,
> unsigned long long int exponent);
> ieee_128 scalar_insert_exp (ieee_128 significand, unsigned long long
> int exponent);
> 
> You need to say something about the requirements or the justification
> for adding more.  For this patch itself, some comments are inline
> below, thanks!

I implemented the patch based on a request for the builtins.  It didn't
include any justification so I reached out to Steve Monroe who
requested the builtins to understand why he wanted them.  Here is his
reply:

   Basically there is no clean and performant way to transfer between a
   vector type and the ieee128 scalar, despite the fact that both
   reside in vector registers. Also a union transfer does not work
   correctly on most GCC versions (and will likely break again in the
   next release). I offer the long sad history of the IBM long double
   float runtime.

   Also there are __ieee128 operations that are provided by builtins
   for POWER9 but are not provided by libgcc (for POWER8).

   Finally I can prove that a softfloat __ieee128 implementation using
   VMX integer operations, out-performs the current libgcc
   implementation using DW GPRs.

   The details are in the PVECLIB documentation
   pveclib/vec__f128__ppc.h

> 
> > The patch has been tested on both Power 9 and Power 10.
> > 
> > Please let me know if this patch is acceptable for
> > mainline.  Thanks.
> > 
> > Carl 
> > 
> > 
> > --
> > From a20cc81f98cce1140fc95775a7c25b55d1ca7cba Mon Sep 17 00:00:00
> > 2001
> > From: Carl Love 
> > Date: Wed, 12 Apr 2023 17:46:37 -0400
> > Subject: [PATCH] rs6000: Add builtins for IEEE 128-bit floating
> > point values
> > 
> > Add support for the following builtins:
> > 
> >  __vector unsigned long long int __builtin_extractf128_exp
> > (__ieee128);
> 
> Could you make the name similar to the existing one?  The existing
> one
>   
>   unsigned long long int scalar_extract_exp (__ieee128 source);
> 
> has nothing like f128 on its name, this variant is just to change the
> return type to vector type, how about scalar_extract_exp_to_vec?

I changed the name  __builtin_extractf128_exp  to
__builtin_scalar_extract_exp_to_vec.

> 
> >  __vector unsigned __int128 __builtin_extractf128_sig (__ieee128);
> 
> Ditto.

I changed the name  __builtin_extractf128_sig to
__builtin_scalar_extract_sig_to_vec.

> 
> >  __ieee128 __builtin_insertf128_exp (__vector unsigned __int128,
> >  __vector unsigned long long);
> 
> This one can just overload the existing scalar_insert_exp?

I tried making this one an overloaded version of
scalar_insert_exp.  However, the overload with the vector arguments
isn't recognized when I put the overload definition at the end of the
list of overloads.  When I tried putting the vector version as the
first overloaded definition, I got an internal error
on __builtin_vsx_scalar_insert_exp_q, which has the same argument
types but as scalars, not vectors.  As best I can tell, there is an issue
with mixing scalar and vector arguments in an overloaded builtin.

I renamed __builtin_insertf128_exp to
__builtin_vsx_scalar_insert_exp_vqp, which is just the vector version of
the existing __builtin_vsx_scalar_insert_exp_qp builtin.
> 
> gcc/
> > * config/rs6000/rs6000-builtins.def (__builtin_extractf128_exp,
> >  __builtin_extractf128_sig, __builtin_insertf128_exp): Add new
> > builtin definitions.
> > * config/rs6000.md (extractf128_exp_,
> > insertf128_exp_,
> > extractf128_sig_): Add define_expand for new builtins.
> > (xsxexpqp_f128_, xsxsigqp_f128_,
> > siexpqpf_f128_):
> > Add define_insn for new builtins.
> > * doc/extend.texi (__builtin_extractf128_exp,
> > __builtin_extractf128_sig,
> > __builtin_insertf128_exp): Add documentation for new builtins.
> > 
> > gcc/testsuite/
> > * gcc.target/powerpc/bfp/extract-exp-ieee128.c: New test case.
> > * gcc.target/powerpc/bfp/extract-sig-ieee128.c: New test case.
> > * gcc.target/powe

[PATCH ver 2] rs6000: Add builtins for IEEE 128-bit floating point values

2023-06-06 Thread Carl Love via Gcc-patches

Kewen, GCC maintainers:

Version 2, I have addressed the various comments from Kewen.  I had
issues with adding an additional overloaded version of
scalar_insert_exp with vector arguments.  The overload infrastructure
didn't work with a mix of scalar and vector arguments.  I did rename
__builtin_insertf128_exp to __builtin_vsx_scalar_insert_exp_vqp to make
it similar to the existing builtin.  I also wasn't able to get the
suggested merge of xsxexpqp_f128_ with xsxexpqp_ to work, so
I left the two simpler definitions.

The patch adds three new builtins.  Two of them extract the significand
and the exponent of an IEEE 128-bit float value and return the result as
a vector.  The third inserts an exponent into an IEEE 128-bit float
value, taking both of its arguments as vectors.  These builtins were
requested since there is no clean and optimal way to transfer between a
vector and a scalar IEEE 128-bit value.
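
As a rough illustration of the field layout these builtins operate on, here
is a portable sketch that models an IEEE binary128 value as two 64-bit
halves.  The type f128_bits and the model_* helpers are hypothetical names
made up for this note; they are not part of the patch and do not use the
Power instructions.  The sketch assumes the standard binary128 layout (1
sign bit, 15 exponent bits, 112 fraction bits) and assumes that the
significand extract sets the implicit leading bit for normal values.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical portable model of an IEEE binary128 bit pattern: hi holds
   the sign bit, the 15-bit biased exponent and the top 48 fraction bits;
   lo holds the low 64 fraction bits.  */
typedef struct { uint64_t hi; uint64_t lo; } f128_bits;

#define F128_EXP_MASK UINT64_C (0x7FFF)
#define F128_FRAC_HI  UINT64_C (0x0000FFFFFFFFFFFF)

/* Return the biased exponent field.  */
static uint64_t
model_extract_exp (f128_bits x)
{
  return (x.hi >> 48) & F128_EXP_MASK;
}

/* Return the fraction bits; for normal values also set the implicit
   leading bit (an assumption of this model).  */
static f128_bits
model_extract_sig (f128_bits x)
{
  uint64_t exp = model_extract_exp (x);
  f128_bits sig = { x.hi & F128_FRAC_HI, x.lo };

  if (exp != 0 && exp != F128_EXP_MASK)
    sig.hi |= UINT64_C (1) << 48;
  return sig;
}

/* Combine the sign and fraction bits of SIG with the exponent EXP.  */
static f128_bits
model_insert_exp (f128_bits sig, uint64_t exp)
{
  f128_bits r;

  r.hi = (sig.hi & ~(F128_EXP_MASK << 48)) | ((exp & F128_EXP_MASK) << 48);
  r.lo = sig.lo;
  return r;
}
```

In this model the bit pattern of 1.0 is hi = 0x3FFF000000000000, lo = 0.
Extracting its significand and inserting the exponent 0x4000 yields the
bit pattern of 2.0.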

The patch has been tested on Power 10 with no regressions.  Please let
me know if the patch is acceptable or not.  Thanks.

   Carl

---
rs6000: Add builtins for IEEE 128-bit floating point values

Add support for the following builtins:

 __vector unsigned long long int
 __builtin_scalar_extract_exp_to_vec (__ieee128);
 __vector unsigned __int128
 __builtin_scalar_extract_sig_to_vec (__ieee128);
 __ieee128 __builtin_vsx_scalar_insert_exp_vqp (__vector unsigned __int128,
 __vector unsigned long long);

These builtins were requested since there is no clean and performant way to
transfer between a vector type and the ieee128 scalar, despite the fact
that both reside in vector registers. Also a union transfer does not work
correctly on most GCC versions.

gcc/
* config/rs6000/rs6000-builtins.def (__builtin_extractf128_exp,
 __builtin_extractf128_sig, __builtin_insertf128_exp): Add new
builtin definitions.
* config/rs6000.md (extractf128_exp_, insertf128_exp_,
extractf128_sig_): Add define_expand for new builtins.
(xsxexpqp_f128_, xsxsigqp_f128_, siexpqpf_f128_):
Add define_insn for new builtins.
* doc/extend.texi (__builtin_extractf128_exp, __builtin_extractf128_sig,
__builtin_insertf128_exp): Add documentation for new builtins.

gcc/testsuite/
* gcc.target/powerpc/bfp/extract-exp-ieee128.c: New test case.
* gcc.target/powerpc/bfp/extract-sig-ieee128.c: New test case.
* gcc.target/powerpc/bfp/insert-exp-ieee128.c: New test case.
---
 gcc/config/rs6000/rs6000-builtins.def |  9 +++
 gcc/config/rs6000/rs6000-overload.def |  2 +
 gcc/config/rs6000/vsx.md  | 31 +-
 gcc/doc/extend.texi   | 10 
 .../powerpc/bfp/extract-exp-ieee128.c | 50 
 .../powerpc/bfp/extract-sig-ieee128.c | 57 ++
 .../powerpc/bfp/insert-exp-ieee128.c  | 58 +++
 7 files changed, 216 insertions(+), 1 deletion(-)
 create mode 100644 gcc/testsuite/gcc.target/powerpc/bfp/extract-exp-ieee128.c
 create mode 100644 gcc/testsuite/gcc.target/powerpc/bfp/extract-sig-ieee128.c
 create mode 100644 gcc/testsuite/gcc.target/powerpc/bfp/insert-exp-ieee128.c

diff --git a/gcc/config/rs6000/rs6000-builtins.def 
b/gcc/config/rs6000/rs6000-builtins.def
index 638d0bc72ca..92f22481687 100644
--- a/gcc/config/rs6000/rs6000-builtins.def
+++ b/gcc/config/rs6000/rs6000-builtins.def
@@ -2901,6 +2901,12 @@
   fpmath double __builtin_truncf128_round_to_odd (_Float128);
 TRUNCF128_ODD trunckfdf2_odd {}
 
+  vull __builtin_scalar_extract_exp_to_vec (_Float128);
+EEXPKF xsxexpqp_f128_kf {}
+
+  vuq __builtin_scalar_extract_sig_to_vec (_Float128);
+ESIGKF xsxsigqp_f128_kf {}
+
   const signed long long __builtin_vsx_scalar_extract_expq (_Float128);
 VSEEQP xsxexpqp_kf {}
 
@@ -2915,6 +2921,9 @@
   unsigned long long);
 VSIEQPF xsiexpqpf_kf {}
 
+  const _Float128 __builtin_vsx_scalar_insert_exp_vqp (vuq, vull);
+VSIEDP_VULL xsiexpqpf_f128_kf {}
+
   const signed int __builtin_vsx_scalar_test_data_class_qp (_Float128, \
 const int<7>);
 VSTDCQP xststdcqp_kf {}
diff --git a/gcc/config/rs6000/rs6000-overload.def 
b/gcc/config/rs6000/rs6000-overload.def
index c582490c084..102ead9f80b 100644
--- a/gcc/config/rs6000/rs6000-overload.def
+++ b/gcc/config/rs6000/rs6000-overload.def
@@ -4515,6 +4515,8 @@
 VSIEQP
   _Float128 __builtin_vec_scalar_insert_exp (_Float128, unsigned long long);
 VSIEQPF
+  _Float128 __builtin_vsx_scalar_insert_exp_vqp (vuq, vull);
+VSIEDP_VULL
 
 [VEC_VSTDC, scalar_test_data_class, __builtin_vec_scalar_test_data_class]
   unsigned int __builtin_vec_scalar_test_data_class (float, const int);
diff --git a/gcc/config/rs6000/vsx.md b/gcc/config/rs6000/vsx.md
index 7d845df5c2d..0f6df4bbcf5 100644

rs6000: Fix expected counts powerpc/p9-vec-length-full

2023-06-01 Thread Carl Love via Gcc-patches



GCC maintainers:

The following patch updates the expected instruction counts in four
tests.  The counts in all of the tests changed with commit
f574e2dfae79055f16d0c63cc12df24815d8ead6.  

The updated counts have been verified on both Power 9 and Power 10.

Please let me know if this patch is acceptable for mainline.  Thanks.

  Carl 


rs6000: Fix expected counts powerpc/p9-vec-length-full tests

The counts for instructions lxvl and stxvl in tests:

  p9-vec-length-full-1.c
  p9-vec-length-full-2.c
  p9-vec-length-full-6.c
  p9-vec-length-full-7.c

changed with commit:

   commit f574e2dfae79055f16d0c63cc12df24815d8ead6
   Author: Ju-Zhe Zhong 
   Date:   Thu May 25 22:42:35 2023 +0800

 VECT: Add decrement IV iteration loop control by variable amount support

 This patch is supporting decrement IV by following the flow designed by
 Richard:
   ...

The expected counts for lxvl changed from 20 to 40 and the counts for stxvl
changed from 10 to 20 in the first two tests.  In p9-vec-length-full-6.c,
the counts for both lxvl and stxvl changed from 10 to 20.  The number of
stxvl instructions changed from 12 to 20 in p9-vec-length-full-7.c.  This
patch updates the number of expected instructions in the four tests.

The counts have been verified on Power 9 and Power 10.
---
 gcc/testsuite/gcc.target/powerpc/p9-vec-length-full-1.c | 4 ++--
 gcc/testsuite/gcc.target/powerpc/p9-vec-length-full-2.c | 4 ++--
 gcc/testsuite/gcc.target/powerpc/p9-vec-length-full-6.c | 4 ++--
 gcc/testsuite/gcc.target/powerpc/p9-vec-length-full-7.c | 2 +-
 4 files changed, 7 insertions(+), 7 deletions(-)

diff --git a/gcc/testsuite/gcc.target/powerpc/p9-vec-length-full-1.c 
b/gcc/testsuite/gcc.target/powerpc/p9-vec-length-full-1.c
index f01f1c54fa5..5e4f34421d3 100644
--- a/gcc/testsuite/gcc.target/powerpc/p9-vec-length-full-1.c
+++ b/gcc/testsuite/gcc.target/powerpc/p9-vec-length-full-1.c
@@ -12,5 +12,5 @@
 /* { dg-final { scan-assembler-not   {\mstxv\M} } } */
 /* { dg-final { scan-assembler-not   {\mlxvx\M} } } */
 /* { dg-final { scan-assembler-not   {\mstxvx\M} } } */
-/* { dg-final { scan-assembler-times {\mlxvl\M} 20 } } */
-/* { dg-final { scan-assembler-times {\mstxvl\M} 10 } } */
+/* { dg-final { scan-assembler-times {\mlxvl\M} 40 } } */
+/* { dg-final { scan-assembler-times {\mstxvl\M} 20 } } */
diff --git a/gcc/testsuite/gcc.target/powerpc/p9-vec-length-full-2.c 
b/gcc/testsuite/gcc.target/powerpc/p9-vec-length-full-2.c
index f546e97fa7d..c7d927382c3 100644
--- a/gcc/testsuite/gcc.target/powerpc/p9-vec-length-full-2.c
+++ b/gcc/testsuite/gcc.target/powerpc/p9-vec-length-full-2.c
@@ -12,5 +12,5 @@
 /* { dg-final { scan-assembler-not   {\mstxv\M} } } */
 /* { dg-final { scan-assembler-not   {\mlxvx\M} } } */
 /* { dg-final { scan-assembler-not   {\mstxvx\M} } } */
-/* { dg-final { scan-assembler-times {\mlxvl\M} 20 } } */
-/* { dg-final { scan-assembler-times {\mstxvl\M} 10 } } */
+/* { dg-final { scan-assembler-times {\mlxvl\M} 40 } } */
+/* { dg-final { scan-assembler-times {\mstxvl\M} 20 } } */
diff --git a/gcc/testsuite/gcc.target/powerpc/p9-vec-length-full-6.c 
b/gcc/testsuite/gcc.target/powerpc/p9-vec-length-full-6.c
index 65ddf2b098a..f3be3842c62 100644
--- a/gcc/testsuite/gcc.target/powerpc/p9-vec-length-full-6.c
+++ b/gcc/testsuite/gcc.target/powerpc/p9-vec-length-full-6.c
@@ -11,5 +11,5 @@
 /* It can use normal vector load for constant vector load.  */
 /* { dg-final { scan-assembler-times {\mstxvx?\M} 6 } } */
 /* 64bit/32bit pairs won't use partial vectors.  */
-/* { dg-final { scan-assembler-times {\mlxvl\M} 10 } } */
-/* { dg-final { scan-assembler-times {\mstxvl\M} 10 } } */
+/* { dg-final { scan-assembler-times {\mlxvl\M} 20 } } */
+/* { dg-final { scan-assembler-times {\mstxvl\M} 20 } } */
diff --git a/gcc/testsuite/gcc.target/powerpc/p9-vec-length-full-7.c 
b/gcc/testsuite/gcc.target/powerpc/p9-vec-length-full-7.c
index e0e51d9a972..da086f1826a 100644
--- a/gcc/testsuite/gcc.target/powerpc/p9-vec-length-full-7.c
+++ b/gcc/testsuite/gcc.target/powerpc/p9-vec-length-full-7.c
@@ -12,4 +12,4 @@
 
 /* Each type has one stxvl excepting for int8 and uint8, that have two due to
rtl pass bbro duplicating the block which has one stxvl.  */
-/* { dg-final { scan-assembler-times {\mstxvl\M} 12 } } */
+/* { dg-final { scan-assembler-times {\mstxvl\M} 20 } } */
-- 
2.37.2

[PATCH] rs6000: Fix arguments for __builtin_altivec_tr_stxvrwx, __builtin_altivec_tr_stxvrhx

2023-06-01 Thread Carl Love via Gcc-patches

Kewen, Segher, Peter:

The following patch is a redo of the previous "rs6000: Fix
__builtin_vec_xst_trunc definition" patch.  

This patch fixes the third argument in the two builtin definitions
__builtin_altivec_tr_stxvrwx and __builtin_altivec_tr_stxvrhx.  It also
adds a testcase to validate the related builtins, whose third argument
is of type char *, short *, int * or long long *.

I have tested the patch on Power 10 with no regressions.

Please let me know if this patch is acceptable for mainline.

  Carl 


rs6000: Fix arguments for __builtin_altivec_tr_stxvrwx, 
__builtin_altivec_tr_stxvrhx

The third argument for __builtin_altivec_tr_stxvrhx should be short *
not int *.  Similarly, the third argument for __builtin_altivec_tr_stxvrwx
should be int * not short *.  This patch fixes the arguments in the two
builtins.

A runnable test case is added to test the __builtin_altivec_tr_stxvrbx,
__builtin_altivec_tr_stxvrhx, __builtin_altivec_tr_stxvrwx and
__builtin_altivec_tr_stxvrdx builtins.
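
For readers unfamiliar with the truncating stores, the following portable
sketch models what the test checks on a little-endian target: the low
1, 2, 4 or 8 bytes of the 128-bit source are written at the byte address
pointer + offset.  The helper name model_tr_store_le and its parameters
are hypothetical and exist only for this illustration; the real builtins
map to the stxvr*x instructions.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical model: store the low WIDTH bytes (WIDTH <= 8) of the low
   doubleword of a 128-bit value at BASE + OFFSET, least significant byte
   first, as a little-endian target would.  */
static void
model_tr_store_le (uint64_t low_dword, size_t offset, void *base,
                   size_t width)
{
  unsigned char *p = (unsigned char *) base + offset;

  for (size_t i = 0; i < width; i++)
    p[i] = (unsigned char) (low_dword >> (8 * i));
}
```

With the low doubleword 0xfedcba9876543217 from the test, a two-byte store
at offset 1 into a buffer whose first byte is 0x52 leaves 0x52, 0x17, 0x32
in memory, which matches the 0x1752 the test expects to read back from the
short.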

gcc/
* config/rs6000/rs6000-builtins.def (__builtin_altivec_tr_stxvrhx,
__builtin_altivec_tr_stxvrwx): Fix type of third argument.

gcc/testsuite/
* gcc.target/powerpc/builtin_altivec_tr_stxvr_runnable.c: New test
for __builtin_altivec_tr_stxvrbx, __builtin_altivec_tr_stxvrhx,
__builtin_altivec_tr_stxvrwx, __builtin_altivec_tr_stxvrdx.
---
 gcc/config/rs6000/rs6000-builtins.def |   4 +-
 .../builtin_altivec_tr_stxvr_runnable.c   | 107 ++
 2 files changed, 109 insertions(+), 2 deletions(-)
 create mode 100644 
gcc/testsuite/gcc.target/powerpc/builtin_altivec_tr_stxvr_runnable.c

diff --git a/gcc/config/rs6000/rs6000-builtins.def 
b/gcc/config/rs6000/rs6000-builtins.def
index 638d0bc72ca..d7839f2e06b 100644
--- a/gcc/config/rs6000/rs6000-builtins.def
+++ b/gcc/config/rs6000/rs6000-builtins.def
@@ -3161,10 +3161,10 @@
   void __builtin_altivec_tr_stxvrbx (vsq, signed long, signed char *);
 TR_STXVRBX vsx_stxvrbx {stvec}
 
-  void __builtin_altivec_tr_stxvrhx (vsq, signed long, signed int *);
+  void __builtin_altivec_tr_stxvrhx (vsq, signed long, signed short *);
 TR_STXVRHX vsx_stxvrhx {stvec}
 
-  void __builtin_altivec_tr_stxvrwx (vsq, signed long, signed short *);
+  void __builtin_altivec_tr_stxvrwx (vsq, signed long, signed int *);
 TR_STXVRWX vsx_stxvrwx {stvec}
 
   void __builtin_altivec_tr_stxvrdx (vsq, signed long, signed long long *);
diff --git 
a/gcc/testsuite/gcc.target/powerpc/builtin_altivec_tr_stxvr_runnable.c 
b/gcc/testsuite/gcc.target/powerpc/builtin_altivec_tr_stxvr_runnable.c
new file mode 100644
index 000..46014d83535
--- /dev/null
+++ b/gcc/testsuite/gcc.target/powerpc/builtin_altivec_tr_stxvr_runnable.c
@@ -0,0 +1,107 @@
+/* Test of __builtin_vec_xst_trunc  */
+
+/* { dg-do run { target power10_hw } } */
+/* { dg-require-effective-target int128 } */
+/* { dg-options "-mdejagnu-cpu=power10 -save-temps" } */
+
+#include 
+#include 
+#include 
+#include 
+#include 
+
+#define DEBUG 0
+
+vector signed __int128 store_data =
+  {  (__int128) 0x8ACE << 64 | (__int128) 0xfedcba9876543217ULL};
+
+union conv_t {
+  vector signed __int128 vsi128;
+  unsigned long long ull[2];
+} conv;
+
+void abort (void);
+
+
+int
+main () {
+  int i;
+  signed long sl;
+  signed char sc, expected_sc;
+  signed short ss, expected_ss;
+  signed int si, expected_si;
+  signed long long int sll, expected_sll;
+  signed char *psc;
+  signed short *pss;
+  signed int *psi;
+  signed long long int *psll;
+  
+#if DEBUG
+  val.vsi128 = store_data;
+   printf("Data to store [%d] = 0x%llx %llx\n", i, val.ull[1], val.ull[0]);
+#endif
+
+  psc = 
+  pss = 
+  psi = 
+  psll = 
+
+  sl = 1;
+  sc =0xA1;
+  expected_sc = 0xA1;
+  __builtin_altivec_tr_stxvrbx (store_data, sl, psc);
+
+  if (expected_sc != sc & 0xFF)
+#if DEBUG
+printf(" ERROR: Signed char = 0x%x doesn't match expected value 0x%x\n",
+  sc & 0xFF, expected_sc);
+#else
+abort();
+#endif
+
+  sl = 1;
+  ss = 0x52;
+  expected_ss = 0x1752;
+  __builtin_altivec_tr_stxvrhx (store_data, sl, pss);
+
+  if (expected_ss != ss & 0x)
+#if DEBUG
+printf(" ERROR: Signed short = 0x%x doesn't match expected value 0x%x\n",
+  ss, expected_ss) & 0x;
+#else
+abort();
+#endif
+
+  sl = 1;
+  si = 0x21;
+  expected_si = 0x54321721;
+   __builtin_altivec_tr_stxvrwx (store_data, sl, psi);
+
+   if (expected_si != si)
+#if DEBUG
+printf(" ERROR: Signed int = 0x%x doesn't match expected value 0x%x\n",
+  si, expected_si);
+#else
+abort();
+#endif
+
+  sl = 1;
+  sll = 0x12FFULL;
+   expected_sll = 0xdcba9876543217FF;
+   __builtin_altivec_tr_stxvrdx (store_data, sl, psll);
+
+   if (expected_sll != sll)
+#if DEBUG
+printf(" ERROR: Signed long long int = 0x%llx doesn't match expected value 
0x%llx\n",
+  sll, expected_sll);
+#else
+abort();

Re: [PATCH] rs6000: Fix __builtin_vec_xst_trunc definition

2023-06-01 Thread Carl Love via Gcc-patches

On Wed, 2023-05-31 at 12:59 -0500, Peter Bergner wrote:
> On 5/22/23 4:04 AM, Kewen.Lin wrote:
> > on 2023/5/11 02:06, Carl Love via Gcc-patches wrote:
> > > @@ -3161,12 +3161,15 @@
> > >void __builtin_altivec_tr_stxvrbx (vsq, signed long, signed
> > > char *);
> > >  TR_STXVRBX vsx_stxvrbx {stvec}
> > >  
> > > -  void __builtin_altivec_tr_stxvrhx (vsq, signed long, signed
> > > int *);
> > > +  void __builtin_altivec_tr_stxvrhx (vsq, signed long, signed
> > > short *);
> > >  TR_STXVRHX vsx_stxvrhx {stvec}
> > >  
> > > -  void __builtin_altivec_tr_stxvrwx (vsq, signed long, signed
> > > short *);
> > > +  void __builtin_altivec_tr_stxvrwx (vsq, signed long, signed
> > > int *);
> > >  TR_STXVRWX vsx_stxvrwx {stvec}
> > 
> > Good catching!
> 
> This hunk should be its own patch and commit, as it is independent of
> the other change.  Especially since other built-ins also don't have
> {,un}signed long * as arguments, not just
> __builtin_altivec_tr_stxvr*x.

Yes, I was thinking the patch needs to be split into a bug fix and a
patch for the long * arguments.

I redid the patch to create the bug fix only.  The patch includes a
testcase that tests the __builtin_altivec_tr_stxvr* builtins.  I will
post the new patch.

The updated patch is now called:  " rs6000: Fix arguments for
__builtin_altivec_tr_stxvrwx, __builtin_altivec_tr_stxvrhx"

> 
> 
> 
> > > +  void __builtin_altivec_tr_stxvrlx (vsq, signed long, signed
> > > long *);
> > > +TR_STXVRLX vsx_stxvrdx {stvec}
> > > +
> > 
> > This is mapped to the one used for type long long, it's a hard
> > mapping,
> > IMHO it's wrong and not consistent with what the users expect,
> > since on Power
> > the size of type long int is 4 bytes at -m32 while 8 bytes at -m64,
> > this
> > implementation binding to 8 bytes can cause trouble in 32-bit.  I
> > wonder if
> > it's a good idea to add one overloaded version for type long int,
> > for now
> > openxl also emits error message for long int type pointer (see its
> > doc [1]),
> > users can use casting to make it to the acceptable pointer types
> > (long long
> > or int as its size).
> 
> I'm the person who noticed that we don't accept signed/unsigned long
> * as
> an argument type and asked Carl to investigate.  I find it hard to
> believe
> we accept all integer pointer types, except long *.  I agree that it
> shouldn't
> always map to long long *, since as you say, that's wrong for -m32.
> My hope was that we could somehow automagically handle the long *
> types
> in the built-in machinery, mapping them to either the int * built-in
> or
> the long long * built-in depending on -m32 or -m64.  Again, this
> limitation is not limited to __builtin_altivec_tr_stx* built-ins, but
> applies to others as well, so I was kind of hoping for a general
> solution that would fix them all.  I'm not sure if that's possible
> though.

Per Peter's request, I added an overloaded version of the
__builtin_vec_xst_trunc builtin with the long * argument, which Kewen
pushed back on.  So, that approach is not acceptable.  I am not sure
how to get the builtin infrastructure to automatically map long * to
int * or long long *.  If someone has an idea of how to do that, I
will gladly pursue it.  I will study the builtin support some more to
see if I can come up with any ideas as well.
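
To make the size issue concrete, here is a small user-level sketch of the
mapping being discussed: pick the int-sized path when long is 4 bytes
(-m32) and the long long-sized path when it is 8 bytes (-m64).  The helper
model_long_store_width is a hypothetical name for this note and says
nothing about how the builtin machinery would implement such a mapping.

```c
#include <assert.h>
#include <limits.h>
#include <stddef.h>

/* Hypothetical helper: width in bytes of the store that a long *
   argument would have to map to on the current target.  */
static size_t
model_long_store_width (void)
{
#if LONG_MAX == INT_MAX
  /* long has the same range as int, e.g. Power with -m32.  */
  return sizeof (int);
#else
  /* long is wider than int, e.g. Power with -m64.  */
  return sizeof (long long);
#endif
}
```

A hard mapping to the long long variant, as in the rejected hunk, would
correspond to always taking the second branch.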

 Carl

Re: [PATCH v2] rs6000: Add buildin for mffscrn instructions

2023-05-31 Thread Carl Love via Gcc-patches

Kewen:

On Wed, 2023-05-31 at 17:11 +0800, Kewen.Lin wrote:
> > So, there is no need for the builtin to have to determine if the
> > user
> > is storing the result of the __builtin_set_fpscr_rn.  The RN bits
> > will
> > always be updated by the __builtin_set_fpscr_rn builtin and the
> > existing fields of the FPSCR will always be returned by the
> > builtin.
> 
> Yeah, I agree, even with pre-P9 code when the returned value is
> unused,
> I'd expect DCE can eliminate the part for the FPSCR bits reading and
> masking, it's just like before (only setting RN bits).
> 
> The only concern I mentioned before is the built-in name doesn't
> clearly
> match what it does (with extending, it returns something instead)
> since
> it's only saying "set" and setting RN bits, the return value is
> easily
> misunderstood as returning old RN bits, the documentation has to
> explain
> and note it well.
> 
> Looking forward to Segher's opinion on this.

I have the patch to extend the __builtin_set_fpscr_rn builtin working.
I agree the documentation on the instructions in the ISA is not really
clear about that.  The builtin description needs to state much more
explicitly that the current RN field is returned and that the field is
then updated with the new RN bits from the argument.
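
The return-then-update behaviour can be sketched with a plain integer
standing in for the FPSCR.  This is only a model: model_set_fpscr_rn and
MODEL_RN_MASK are hypothetical names, the model returns the whole modelled
word rather than the specific fields the builtin returns, and it assumes
RN occupies the two least significant bits.

```c
#include <assert.h>
#include <stdint.h>

/* Assumption of this model: RN is the two least significant bits.  */
#define MODEL_RN_MASK UINT64_C (0x3)

/* Hypothetical model: return the previous contents of *FPSCR, then
   replace only the RN bits with the low two bits of RN.  */
static uint64_t
model_set_fpscr_rn (uint64_t *fpscr, unsigned rn)
{
  uint64_t old = *fpscr;

  *fpscr = (old & ~MODEL_RN_MASK) | (rn & MODEL_RN_MASK);
  return old;
}
```

The value returned is the state before the update, so a caller that wants
the old rounding mode reads it from the return value, not from the
register afterwards.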

I sent the patch, with the updated builtin description and testcases to
the GLibC team to see what they thought of it.  The goal was for the
builtin to be effectively a "drop in replacement" for the inline asm
that they have.  I was planning on posting the new version if the GLibC
team says it works for them.  Hopefully I will hear from them soon.

Carl

[PATCH] rs6000: Update the vsx-vector-6.* tests.

2023-05-30 Thread Carl Love via Gcc-patches

GCC maintainers:

The following patch takes the tests in vsx-vector-6.p7.c,
vsx-vector-6.p8.c and vsx-vector-6.p9.c and reorganizes them into a
series of smaller test files by functionality rather than processor
version.

The patch has been tested on Power 10 with no regressions.

Please let me know if this patch is acceptable for mainline.  Thanks.

   Carl

--
rs6000: Update the vsx-vector-6.* tests.

The vsx-vector-6.h file is included into the processor specific test files
vsx-vector-6.p7.c, vsx-vector-6.p8.c, and vsx-vector-6.p9.c.  The .h file
contains a large number of vsx vector builtin tests.  The processor
specific files contain the number of instructions that the tests are
expected to generate for that processor.  The tests are compile only.

The tests are broken up into a series of files for related tests.  The
new tests are runnable tests to verify the builtin argument types and the
functional correctness of each test rather than verifying the type and
number of instructions generated.

gcc/testsuite/
* gcc.target/powerpc/vsx-vector-6-func-1op.c: New test file.
* gcc.target/powerpc/vsx-vector-6-func-2lop.c: New test file.
* gcc.target/powerpc/vsx-vector-6-func-2op.c: New test file.
* gcc.target/powerpc/vsx-vector-6-func-3op.c: New test file.
* gcc.target/powerpc/vsx-vector-6-func-cmp-all.c: New test file.
* gcc.target/powerpc/vsx-vector-6-func-cmp.c: New test file.
* gcc.target/powerpc/vsx-vector-6.h: Remove test file.
* gcc.target/powerpc/vsx-vector-6.p7.c: Remove test file.
* gcc.target/powerpc/vsx-vector-6.p8.c: Remove test file.
* gcc.target/powerpc/vsx-vector-6.p9.c: Remove test file.
---
 .../powerpc/vsx-vector-6-func-1op.c   | 319 +
 .../powerpc/vsx-vector-6-func-2lop.c  | 305 +
 .../powerpc/vsx-vector-6-func-2op.c   | 278 
 .../powerpc/vsx-vector-6-func-3op.c   | 229 ++
 .../powerpc/vsx-vector-6-func-cmp-all.c   | 429 ++
 .../powerpc/vsx-vector-6-func-cmp.c   | 237 ++
 .../gcc.target/powerpc/vsx-vector-6.h | 154 ---
 .../gcc.target/powerpc/vsx-vector-6.p7.c  |  43 --
 .../gcc.target/powerpc/vsx-vector-6.p8.c  |  43 --
 .../gcc.target/powerpc/vsx-vector-6.p9.c  |  42 --
 10 files changed, 1797 insertions(+), 282 deletions(-)
 create mode 100644 gcc/testsuite/gcc.target/powerpc/vsx-vector-6-func-1op.c
 create mode 100644 gcc/testsuite/gcc.target/powerpc/vsx-vector-6-func-2lop.c
 create mode 100644 gcc/testsuite/gcc.target/powerpc/vsx-vector-6-func-2op.c
 create mode 100644 gcc/testsuite/gcc.target/powerpc/vsx-vector-6-func-3op.c
 create mode 100644 gcc/testsuite/gcc.target/powerpc/vsx-vector-6-func-cmp-all.c
 create mode 100644 gcc/testsuite/gcc.target/powerpc/vsx-vector-6-func-cmp.c
 delete mode 100644 gcc/testsuite/gcc.target/powerpc/vsx-vector-6.h
 delete mode 100644 gcc/testsuite/gcc.target/powerpc/vsx-vector-6.p7.c
 delete mode 100644 gcc/testsuite/gcc.target/powerpc/vsx-vector-6.p8.c
 delete mode 100644 gcc/testsuite/gcc.target/powerpc/vsx-vector-6.p9.c

diff --git a/gcc/testsuite/gcc.target/powerpc/vsx-vector-6-func-1op.c 
b/gcc/testsuite/gcc.target/powerpc/vsx-vector-6-func-1op.c
new file mode 100644
index 000..90a360ea158
--- /dev/null
+++ b/gcc/testsuite/gcc.target/powerpc/vsx-vector-6-func-1op.c
@@ -0,0 +1,319 @@
+/* { dg-do compile { target lp64 } } */
+/* { dg-skip-if "" { powerpc*-*-darwin* } } */
+/* { dg-options "-O2 -mdejagnu-cpu=power7" } */
+
+/* Functional test of the one operand vector builtins.  */
+
+#include 
+#include 
+#include 
+
+#define DEBUG 0
+
+void abort (void);
+
+int
+main () {
+  int i;
+  vector float f_src = { 125.44, 23.04, -338.56, 17.64};
+  vector float f_result;
+  vector float f_abs_expected = { 125.44, 23.04, 338.56, 17.64};
+  vector float f_ceil_expected = { 126.0, 24.0, -338, 18.0};
+  vector float f_floor_expected = { 125.0, 23.0, -339, 17.0};
+  vector float f_nearbyint_expected = { 125.0, 23.0, -339, 18.0};
+  vector float f_rint_expected = { 125.0, 23.0, -339, 18.0};
+  vector float f_sqrt_expected = { 11.2, 4.8, 18.4, 4.2};
+  vector float f_trunc_expected = { 125.0, 23.0, -338, 17};
+
+  vector double d_src = { 125.44, -338.56};
+  vector double d_result;
+  vector double d_abs_expected = { 125.44, 338.56};
+  vector double d_ceil_expected = { 126.0, -338.0};
+  vector double d_floor_expected = { 125.0, -339.0};
+  vector double d_nearbyint_expected = { 125.0, -339.0};
+  vector double d_rint_expected = { 125.0, -339.0};
+  vector double d_sqrt_expected = { 11.2, 18.4};
+  vector double d_trunc_expected = { 125.0, -338.0};
+
+  /* Abs, float */
+  f_result = vec_abs (f_src);
+
+  if ((f_result[0] != f_abs_expected[0])
+  || (f_result[1] != f_abs_expected[1])
+  || (f_result[2] != f_abs_expected[2])
+  || (f_result[3] != f_abs_expected[3]))
+#if DEBUG
+{
+

[PATCH] rs6000, fix vec_replace_unaligned builtin arguments

2023-05-30 Thread Carl Love via Gcc-patches

GCC maintainers:

The following patch fixes the first argument in the builtin definition
and the corresponding test cases.  Initially, the builtin specification
was wrong due to a cut and paste error.  The documentation was fixed in:


   commit 8cb748a31cd8c7ac9c88b6abc38ce077dd462a7a
   Author: Bill Schmidt 
   Date:   Fri Feb 4 13:26:44 2022 -0600

   rs6000: Clean up ISA 3.1 documentation [PR100808]

   Due to a pasto error in the documentation, vec_replace_unaligned was
   implemented with the same function prototypes as vec_replace_elt.  It was
   intended that vec_replace_unaligned always specify output vectors as
   having type vector unsigned char, to emphasize that elements are
   potentially misaligned by this built-in function.  This patch corrects
   the misimplementation.

   2022-02-04  Bill Schmidt  

   gcc/
   PR target/100808
   * doc/extend.texi (Basic PowerPC Built-in Functions Available on
   ISA 3.1): Provide consistent type names.  Remove unnecessary
   semicolons.  Fix bad line breaks.

This patch fixes the arguments in the definitions and updates the
testcases accordingly.  Additionally, a few minor spacing issues are
fixed.

The patch has been tested on Power 10 with no regressions.  Please let
me know if the patch is acceptable for mainline.  Thanks.

 Carl 

--
rs6000, fix vec_replace_unaligned builtin arguments

The first argument of the vec_replace_unaligned builtin should always be
of type vector unsigned char, as specified in gcc/doc/extend.texi.

This patch fixes the builtin definitions and updates the testcases to use
the correct arguments.
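
The reason for the unsigned char vector type is that the replacement can
land at any byte offset.  The portable sketch below models that on a
16-byte array; model_replace_unaligned is a hypothetical helper written
for this note, and it numbers bytes in memory order, whereas the byte
numbering of the real builtin depends on the target's endianness.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical model: overwrite ELEM_SIZE bytes of the 16-byte vector
   VEC starting at BYTE_INDEX with the bytes of ELEM.  Requests that do
   not fit inside the vector are ignored in this model.  */
static void
model_replace_unaligned (unsigned char vec[16], const void *elem,
                         size_t elem_size, size_t byte_index)
{
  if (elem_size <= 16 && byte_index <= 16 - elem_size)
    memcpy (vec + byte_index, elem, elem_size);
}
```

A 4-byte element written at byte index 3 straddles what would be two
32-bit elements, which is why the vector is typed as vector unsigned char.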

gcc/ChangeLog:
* config/rs6000/rs6000-overload.def (__builtin_vec_replace_un):
Fix first argument type.

gcc/testsuite/ChangeLog:
* gcc.target/powerpc/vec-replace-word-runnable.c
(vec_replace_unaligned): Fix first argument type.
(vresult_uchar): Fix expected results.
(vec_replace_unaligned): Update for loop to check uchar results.
Remove extra spaces in if statements.
Insert missing spaces in for statements.
(dg-final): Update expected instruction counts.
---
 gcc/config/rs6000/rs6000-overload.def |  12 +-
 .../powerpc/vec-replace-word-runnable.c   | 157 ++
 2 files changed, 92 insertions(+), 77 deletions(-)

diff --git a/gcc/config/rs6000/rs6000-overload.def 
b/gcc/config/rs6000/rs6000-overload.def
index c582490c084..26dc662b8fb 100644
--- a/gcc/config/rs6000/rs6000-overload.def
+++ b/gcc/config/rs6000/rs6000-overload.def
@@ -3059,17 +3059,17 @@
 VREPLACE_ELT_V2DF
 
 [VEC_REPLACE_UN, vec_replace_unaligned, __builtin_vec_replace_un]
-  vuc __builtin_vec_replace_un (vui, unsigned int, const int);
+  vuc __builtin_vec_replace_un (vuc, unsigned int, const int);
 VREPLACE_UN_UV4SI
-  vuc __builtin_vec_replace_un (vsi, signed int, const int);
+  vuc __builtin_vec_replace_un (vuc, signed int, const int);
 VREPLACE_UN_V4SI
-  vuc __builtin_vec_replace_un (vull, unsigned long long, const int);
+  vuc __builtin_vec_replace_un (vuc, unsigned long long, const int);
 VREPLACE_UN_UV2DI
-  vuc __builtin_vec_replace_un (vsll, signed long long, const int);
+  vuc __builtin_vec_replace_un (vuc, signed long long, const int);
 VREPLACE_UN_V2DI
-  vuc __builtin_vec_replace_un (vf, float, const int);
+  vuc __builtin_vec_replace_un (vuc, float, const int);
 VREPLACE_UN_V4SF
-  vuc __builtin_vec_replace_un (vd, double, const int);
+  vuc __builtin_vec_replace_un (vuc, double, const int);
 VREPLACE_UN_V2DF
 
 [VEC_REVB, vec_revb, __builtin_vec_revb]
diff --git a/gcc/testsuite/gcc.target/powerpc/vec-replace-word-runnable.c 
b/gcc/testsuite/gcc.target/powerpc/vec-replace-word-runnable.c
index 27318822871..66b0ef58996 100644
--- a/gcc/testsuite/gcc.target/powerpc/vec-replace-word-runnable.c
+++ b/gcc/testsuite/gcc.target/powerpc/vec-replace-word-runnable.c
@@ -20,6 +20,9 @@ main (int argc, char *argv [])
   unsigned char ch;
   unsigned int index;
 
+  vector unsigned char src_va_uchar;
+  vector unsigned char expected_vresult_uchar;
+
   vector unsigned int vresult_uint;
   vector unsigned int expected_vresult_uint;
   vector unsigned int src_va_uint;
@@ -64,10 +67,10 @@ main (int argc, char *argv [])
 
   vresult_uint = vec_replace_elt (src_va_uint, src_a_uint, 2);
 
-  if (!vec_all_eq (vresult_uint,  expected_vresult_uint)) {
+  if (!vec_all_eq (vresult_uint, expected_vresult_uint)) {
 #if DEBUG
 printf("ERROR, vec_replace_elt (src_vb_uint, src_va_uint, index)\n");
-for(i = 0; i < 4; i++)
+for (i = 0; i < 4; i++)
   printf(" vresult_uint[%d] = %d, expected_vresult_uint[%d] = %d\n",
 i, vresult_uint[i], i, expected_vresult_uint[i]);
 #else
@@ -82,10 +85,10 @@ main (int argc, char *argv [])

Re: [PATCH v2] rs6000: Add buildin for mffscrn instructions

2023-05-25 Thread Carl Love via Gcc-patches

Peter, Kewen:

On Thu, 2023-05-25 at 13:28 +0800, Kewen.Lin wrote:
> on 2023/5/24 23:20, Carl Love wrote:
> > On Wed, 2023-05-24 at 13:32 +0800, Kewen.Lin wrote:
> > > on 2023/5/24 06:30, Peter Bergner wrote:
> > > > On 5/23/23 12:24 AM, Kewen.Lin wrote:
> > > > > on 2023/5/23 01:31, Carl Love wrote:
> > > > > > The builtins were requested for use in GLibC.  As of
> > > > > > version
> > > > > > 2.31 they
> > > > > > were added as inline asm.  They requested a builtin so the
> > > > > > asm
> > > > > > could be
> > > > > > removed.
> > > > > 
> > > > > So IMHO we also want the similar support for mffscrn, that is
> > > > > to
> > > > > make
> > > > > use of mffscrn and mffscrni on Power9 and later, but falls
> > > > > back
> > > > > to 
> > > > > __builtin_set_fpscr_rn + mffs similar on older platforms.
> > > > 
> > > > So __builtin_set_fpscr_rn does everything we want (sets the RN bits)
> > > > and
> > > > uses mffscrn/mffscrni on P9 and later and uses older insns on
> > > > pre-
> > > > P9.
> > > > The only problem is we don't return the current FPSCR bits, as
> > > > the
> > > > bif
> > > > is defined to return void.
> > > 
> > > Yes.
> > > 
> > > > Crazy idea, but could we extend the built-in
> > > > with an overload that returns the FPSCR bits?  
> > > 
> > > So you agree that we should make this proposed new bif handle
> > > pre-P9
> > > just
> > > like some other existing bifs. :)  I think extending it is good
> > > and
> > > doable,
> > > but the only concern here is the bif name
> > > "__builtin_set_fpscr_rn",
> > > which
> > > matches the existing behavior (only set rounding) but doesn't
> > > match
> > > the
> > > proposed extending behavior (set rounding and get some env bits
> > > back).
> > > Maybe it's not a big deal if the documentation clarifies it well.
> > 
> > Extending the builtin to pre Power 9 is straight forward and I
> > agree
> > would make good sense to do.
> > 
> > I am a bit concerned on how to extend __builtin_set_fpscr_rn to add
> > the
> > new functionality.  Peter suggests overloading the builtin to
> > either
> > return void or returns FPSCR bits.  It is my understanding that the
> > return value for a given builtin had to be the same, i.e. you can't
> > overload the return value. Maybe you can with Bill's new
> > infrastructure?  I recall having problems trying to overload the
> > return
> > value in the past and Bill said you couldn't do it.  I play with
> > this
> > and see if I can overload the return value.
> 
> Your understanding that we cannot overload this on return type alone
> is correct.  But previously I interpreted the proposal as extending
> 
>   void __builtin_set_fpscr_rn (int);
> 
> to 
> 
>   void __builtin_set_fpscr_rn (int, double*);
> 
> The related address taken and store here can be optimized out
> normally.

I don't think that is correct.  The current definition of the builtin
is:

 void __builtin_set_fpscr_rn (int);

The proposal by Peter was to change the return type to double, i.e.

 double __builtin_set_fpscr_rn (int);

Peter also said the following:

   The built-in machinery can see that the usage is expecting a return
   value or not and for the pre-P9 code, can skip generating the ending
   mffs if we don't want the return value.

Which I don't think we want.  The mffscrn and mffscrni instructions
return the contents of the control bits in the FPSCR, that is, bits
29:31 (DRN) and bits 56:63 (VE, OE, UE, ZE, XE, NI, RN), are placed
into the corresponding bits in register FRT. All other bits in register
FRT are set to 0.  

The instructions also update the current RN field of the FPSCR with
the new RN supplied by the second argument of the instruction.  So, the
instructions update the RN field just like __builtin_set_fpscr_rn does.
So, we can use the existing __builtin_set_fpscr_rn to update the RN for
all ISAs, we just need to have __builtin_set_fpscr_rn always return a
double with the desired fields from the FPSCR (the current RN).  This
will then emulate the behavior of the mffscrn and mffscrni
instructions.  The current uses of __builtin_set_fpscr_rn will just
ignore the return value, which is not a problem.  The return value can
be stored in the places where the user is currently using the inline asm
for the mffscrn and mffscrni instructions.

The __builtin_set_fpscr_rn builtin is currently using the mffscrn and
mffscrni on Power 9 and throwing away the result from the instruction. 
We just need to change __builtin_set_fpscr_rn to return the value
instead.  For the pre Power 9 code, the builtin will need to read the
full FPSCR, mask off everything except the desired fields and return
those fields.

So, there is no need for the builtin to have to determine if the user
is storing the result of the __builtin_set_fpscr_rn.  The RN bits will
always be updated by the __builtin_set_fpscr_rn builtin and the
existing fields of the FPSCR will always be returned by the builtin.
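
To make the intended semantics concrete, here is a small software
model of the behavior described above.  It is only a sketch: the FPSCR
is modelled as a plain 64-bit integer, the function name
model_set_fpscr_rn is hypothetical, and nothing here is the actual GCC
expansion.  The masks follow the bit numbering quoted above, where bit
0 is the most significant of 64 bits.

```c
#include <stdint.h>

/* FPSCR control bits, IBM numbering (bit 0 is the most significant):
   bits 29:31 (DRN) and bits 56:63 (VE, OE, UE, ZE, XE, NI, RN).  */
#define FPSCR_DRN_MASK     (UINT64_C (0x7) << 32)
#define FPSCR_CONTROL_MASK (FPSCR_DRN_MASK | UINT64_C (0xff))
#define FPSCR_RN_MASK      UINT64_C (0x3)

/* Hypothetical software model, not GCC code: update the RN field in
   *fpscr and return the control bits as they were before the update.
   All other bits of the returned value are zero.  */
static uint64_t
model_set_fpscr_rn (uint64_t *fpscr, unsigned int new_rn)
{
  uint64_t old_control = *fpscr & FPSCR_CONTROL_MASK;

  /* Only the low two bits of the argument are used for RN.  */
  *fpscr = (*fpscr & ~FPSCR_RN_MASK) | (new_rn & FPSCR_RN_MASK);
  return old_control;
}
```

In this model the caller is free to ignore the returned value, which
matches the point that existing users of the void form would not need
to change.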

Please let me know if you agree.  I think I have this sorted out

Re: [PATCH v2] rs6000: Add buildin for mffscrn instructions

2023-05-24 Thread Carl Love via Gcc-patches

On Wed, 2023-05-24 at 13:32 +0800, Kewen.Lin wrote:
> on 2023/5/24 06:30, Peter Bergner wrote:
> > On 5/23/23 12:24 AM, Kewen.Lin wrote:
> > > on 2023/5/23 01:31, Carl Love wrote:
> > > > The builtins were requested for use in GLibC.  As of version
> > > > 2.31 they
> > > > were added as inline asm.  They requested a builtin so the asm
> > > > could be
> > > > removed.
> > > 
> > > So IMHO we also want the similar support for mffscrn, that is to
> > > make
> > > use of mffscrn and mffscrni on Power9 and later, but falls back
> > > to 
> > > __builtin_set_fpscr_rn + mffs similar on older platforms.
> > 
> > So __builtin_set_fpscr_rn does everything we want (sets the RN bits) and
> > uses mffscrn/mffscrni on P9 and later and uses older insns on pre-
> > P9.
> > The only problem is we don't return the current FPSCR bits, as the
> > bif
> > is defined to return void.
> 
> Yes.
> 
> > Crazy idea, but could we extend the built-in
> > with an overload that returns the FPSCR bits?  
> 
> So you agree that we should make this proposed new bif handle pre-P9
> just
> like some other existing bifs. :)  I think extending it is good and
> doable,
> but the only concern here is the bif name "__builtin_set_fpscr_rn",
> which
> matches the existing behavior (only set rounding) but doesn't match
> the
> proposed extending behavior (set rounding and get some env bits
> back).
> Maybe it's not a big deal if the documentation clarifies it well.

Extending the builtin to pre Power 9 is straightforward and I agree it
would make good sense to do.

I am a bit concerned about how to extend __builtin_set_fpscr_rn to add
the new functionality.  Peter suggests overloading the builtin to either
return void or return the FPSCR bits.  It is my understanding that the
return type for a given builtin has to be the same, i.e. you can't
overload on the return value.  Maybe you can with Bill's new
infrastructure?  I recall having problems trying to overload the return
value in the past and Bill said you couldn't do it.  I will play with
this and see if I can overload the return value.
> 
> 
> > To be honest, I like
> > the __builtin_set_fpscr_rn name better than __builtin_mffscrn[i].
> 
> +1
> 
> BR,
> Kewen
> 
> > The built-in machinery can see that the usage is expecting a return
> > value
> > or not and for the pre-P9 code, can skip generating the ending mffs
> > if
> > we don't want the return value.
> > 
> > Peter
> > 
> >

[PATCH ver 2] rs6000: Fix __builtin_vec_xst_trunc definition

2023-05-22 Thread Carl Love via Gcc-patches

Kewen, GCC maintainers:

Version 2, addressed comments from Kewen. Added an additional
overloaded builtin:  
   void __builtin_vec_xst_trunc (vuq, signed long long, long *);


The following patch fixes errors in the arguments in the
__builtin_altivec_tr_stxvrhx and __builtin_altivec_tr_stxvrwx builtin
definitions.  Note, these builtins are used by the overloaded
__builtin_vec_xst_trunc builtin.

The patch adds new overloaded builtin definitions for
__builtin_vec_xst_trunc whose third argument is a pointer to signed or
unsigned long int.

A new testcase is added for the various overloaded versions of
__builtin_vec_xst_trunc.

The patch has been tested on Power 10 with no new regressions.

Please let me know if the patch is acceptable for mainline.  Thanks.

Carl

-
rs6000: Fix __builtin_vec_xst_trunc definition

Built-in __builtin_vec_xst_trunc calls __builtin_altivec_tr_stxvrhx
and __builtin_altivec_tr_stxvrwx to handle the short and word cases.  The
pointer argument types of these two builtins are swapped: the halfword
store is declared with signed int * and the word store with signed
short *.  This patch corrects the argument types.
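
The following is a minimal, portable sketch of what a truncating store
does, to show why the pointer type has to match the store width.  It
is a model only: the function names are hypothetical, `low` stands in
for the low-order 64 bits of the 128-bit source vector, the second
argument is assumed to be a byte offset, and unsigned short and
unsigned int are assumed to be 16 and 32 bits wide.

```c
#include <stdint.h>
#include <string.h>

/* Hypothetical models, not GCC code.  */

/* Halfword case (stxvrhx): only the low 16 bits are stored, so the
   destination is a short pointer.  */
static void
model_xst_trunc_half (uint64_t low, long long offset, unsigned short *base)
{
  uint16_t v = (uint16_t) low;
  memcpy ((unsigned char *) base + offset, &v, sizeof v);
}

/* Word case (stxvrwx): the low 32 bits are stored, so the destination
   is an int pointer.  */
static void
model_xst_trunc_word (uint64_t low, long long offset, unsigned int *base)
{
  uint32_t v = (uint32_t) low;
  memcpy ((unsigned char *) base + offset, &v, sizeof v);
}
```

With the pointer types swapped, a caller passing a short * to the
halfword form would not match the declared prototype, even though the
store itself only writes two bytes.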

Additionally, the patch adds a new __builtin_vec_xst_trunc overloaded
version for the destination being signed or unsigned long int.

A runnable test case is added to test each of the overloaded definitions
of __builtin_vec_xst_tru

gcc/
* config/rs6000/rs6000-builtins.def (__builtin_altivec_tr_stxvrhx,
__builtin_altivec_tr_stxvrwx): Fix type of third argument.
Add definition for third argument to be signed long.
* config/rs6000/rs6000-overload.def (__builtin_vec_xst_trunc):
Add definition with third argument signed and unsigned long.
* doc/extend.texi (__builtin_vec_xst_trunc): Add documentation for
new unsigned long and signed long versions.

gcc/testsuite/
* gcc.target/powerpc/vsx-builtin-vec_xst_trunc.c: New test case
for __builtin_vec_xst_trunc builtin.
---
 gcc/config/rs6000/rs6000-builtins.def |   7 +-
 gcc/config/rs6000/rs6000-overload.def |   6 +
 gcc/doc/extend.texi   |   2 +
 .../powerpc/vsx-builtin-vec_xst_trunc.c   | 241 ++
 4 files changed, 254 insertions(+), 2 deletions(-)
 create mode 100644 gcc/testsuite/gcc.target/powerpc/vsx-builtin-vec_xst_trunc.c

diff --git a/gcc/config/rs6000/rs6000-builtins.def 
b/gcc/config/rs6000/rs6000-builtins.def
index 638d0bc72ca..a378491b358 100644
--- a/gcc/config/rs6000/rs6000-builtins.def
+++ b/gcc/config/rs6000/rs6000-builtins.def
@@ -3161,12 +3161,15 @@
   void __builtin_altivec_tr_stxvrbx (vsq, signed long, signed char *);
 TR_STXVRBX vsx_stxvrbx {stvec}
 
-  void __builtin_altivec_tr_stxvrhx (vsq, signed long, signed int *);
+  void __builtin_altivec_tr_stxvrhx (vsq, signed long, signed short *);
 TR_STXVRHX vsx_stxvrhx {stvec}
 
-  void __builtin_altivec_tr_stxvrwx (vsq, signed long, signed short *);
+  void __builtin_altivec_tr_stxvrwx (vsq, signed long, signed int *);
 TR_STXVRWX vsx_stxvrwx {stvec}
 
+  void __builtin_altivec_tr_stxvrlx (vsq, signed long, signed long *);
+TR_STXVRLX vsx_stxvrdx {stvec}
+
   void __builtin_altivec_tr_stxvrdx (vsq, signed long, signed long long *);
 TR_STXVRDX vsx_stxvrdx {stvec}
 
diff --git a/gcc/config/rs6000/rs6000-overload.def 
b/gcc/config/rs6000/rs6000-overload.def
index c582490c084..fd47f5b24e8 100644
--- a/gcc/config/rs6000/rs6000-overload.def
+++ b/gcc/config/rs6000/rs6000-overload.def
@@ -4872,6 +4872,12 @@
 TR_STXVRWX  TR_STXVRWX_S
   void __builtin_vec_xst_trunc (vuq, signed long long, unsigned int *);
 TR_STXVRWX  TR_STXVRWX_U
+  void __builtin_vec_xst_trunc (vsq, signed long long, signed long *);
+TR_STXVRLX  TR_STXVRLX_S
+  void __builtin_vec_xst_trunc (vuq, signed long long, unsigned long *);
+TR_STXVRLX  TR_STXVRLX_U
+  void __builtin_vec_xst_trunc (vuq, signed long long, long *);
+TR_STXVRLX  TR_STXVRLX_I
   void __builtin_vec_xst_trunc (vsq, signed long long, signed long long *);
 TR_STXVRDX  TR_STXVRDX_S
   void __builtin_vec_xst_trunc (vuq, signed long long, unsigned long long *);
diff --git a/gcc/doc/extend.texi b/gcc/doc/extend.texi
index e426a2eb7d8..7e2ae790ab3 100644
--- a/gcc/doc/extend.texi
+++ b/gcc/doc/extend.texi
@@ -18570,10 +18570,12 @@ instructions.
 @defbuiltin{{void} vec_xst_trunc (vector signed __int128, signed long long, 
signed char *)}
 @defbuiltinx{{void} vec_xst_trunc (vector signed __int128, signed long long, 
signed short *)}
 @defbuiltinx{{void} vec_xst_trunc (vector signed __int128, signed long long, 
signed int *)}
+@defbuiltinx{{void} vec_xst_trunc (vector signed __int128, signed long long, 
signed long *)}
 @defbuiltinx{{void} vec_xst_trunc (vector signed __int128, signed long long, 
signed long long *)}
 @defbuiltinx{{void} vec_xst_trunc (vector unsigned __int128, signed long long, 
unsigned char *)}
 @defbuiltinx{{void} vec_xst_trunc (vector unsigned __int128, signed long long,

Re: [PATCH] rs6000: Fix __builtin_vec_xst_trunc definition

2023-05-22 Thread Carl Love via Gcc-patches

On Mon, 2023-05-22 at 17:04 +0800, Kewen.Lin wrote:
> Hi Carl,
> 
> on 2023/5/11 02:06, Carl Love via Gcc-patches wrote:
> > GCC maintainers:
> > 
> > The following patch fixes errors in the arguments in the
> > __builtin_altivec_tr_stxvrhx,   __builtin_altivec_tr_stxvrwx
> > builtin
> > definitions.  Note, these builtins are used by the overloaded
> > __builtin_vec_xst_trunc builtin.
> > 
> > The patch adds a new overloaded builtin definition for
> > __builtin_vec_xst_trunc for the third argument to be unsigned and
> > signed long int.
> > 
> > A new testcase is added for the various overloaded versions of
> > __builtin_vec_xst_trunc.
> > 
> > The patch has been tested on Power 10 with no new regressions.
> > 
> > Please let me know if the patch is acceptable for
> > mainline.  Thanks.
> > 
> > Carl
> > 
> > ---
> > rs6000: Fix __builtin_vec_xst_trunc definition
> > 
> > Built-in __builtin_vec_xst_trunc calls __builtin_altivec_tr_stxvrhx
> > and __builtin_altivec_tr_stxvrwx to handle the short and word
> > cases.  The
> > arguments for these two builtins are wrong.  This patch fixes the
> > wrong
> > arguments for the builtins.
> > 
> > Additionally, the patch adds a new __builtin_vec_xst_trunc
> > overloaded
> > version for the destination being signed or unsigned long int.
> > 
> > A runnable test case is added to test each of the overloaded
> > definitions
> > of __builtin_vec_xst_tru
> > 
> > gcc/
> > * config/rs6000/rs6000-builtins.def (__builtin_altivec_tr_stxvrhx,
> > __builtin_altivec_tr_stxvrwx): Fix type of third argument.
> > Add definition for third argument to be signed long.
> > * config/rs6000/rs6000-overload.def (__builtin_vec_xst_trunc):
> > Add definition with third argument signed and unsigned long.
> > * doc/extend.texi (__builtin_vec_xst_trunc): Add documentation
> > for
> > new unsigned long and signed long versions.
> > 
> > gcc/testsuite/
> > * gcc.target/powerpc/vsx-builtin-vec_xst_trunc.c: New test case
> > for __builtin_vec_xst_trunc builtin.
> > ---
> >  gcc/config/rs6000/rs6000-builtins.def |   7 +-
> >  gcc/config/rs6000/rs6000-overload.def |   4 +
> >  gcc/doc/extend.texi   |   2 +
> >  .../powerpc/vsx-builtin-vec_xst_trunc.c   | 217
> > ++
> >  4 files changed, 228 insertions(+), 2 deletions(-)
> >  create mode 100644 gcc/testsuite/gcc.target/powerpc/vsx-builtin-
> > vec_xst_trunc.c
> > 
> > diff --git a/gcc/config/rs6000/rs6000-builtins.def
> > b/gcc/config/rs6000/rs6000-builtins.def
> > index 638d0bc72ca..a378491b358 100644
> > --- a/gcc/config/rs6000/rs6000-builtins.def
> > +++ b/gcc/config/rs6000/rs6000-builtins.def
> > @@ -3161,12 +3161,15 @@
> >void __builtin_altivec_tr_stxvrbx (vsq, signed long, signed char
> > *);
> >  TR_STXVRBX vsx_stxvrbx {stvec}
> >  
> > -  void __builtin_altivec_tr_stxvrhx (vsq, signed long, signed int
> > *);
> > +  void __builtin_altivec_tr_stxvrhx (vsq, signed long, signed
> > short *);
> >  TR_STXVRHX vsx_stxvrhx {stvec}
> >  
> > -  void __builtin_altivec_tr_stxvrwx (vsq, signed long, signed
> > short *);
> > +  void __builtin_altivec_tr_stxvrwx (vsq, signed long, signed int
> > *);
> >  TR_STXVRWX vsx_stxvrwx {stvec}
> 
> Good catch!
> 
> >  
> > +  void __builtin_altivec_tr_stxvrlx (vsq, signed long, signed long
> > *);
> > +TR_STXVRLX vsx_stxvrdx {stvec}
> > +
> 
> This is hard-mapped to the pattern used for type long long.  IMHO
> that is wrong and not consistent with what users expect: on Power
> the size of type long int is 4 bytes at -m32 but 8 bytes at -m64, so
> binding this implementation to 8 bytes can cause trouble in 32-bit.
> I wonder whether it's a good idea to add an overloaded version for
> type long int.  For now openxl also emits an error message for a
> long int pointer (see its doc [1]); users can cast it to an
> acceptable pointer type (long long or int, matching its size).
> 
> [1] 
> https://www.ibm.com/docs/en/openxl-c-and-cpp-lop/17.1.1?topic=functions-vec-xst-trunc
> 
> 
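
The size concern raised above can be checked with a few lines of
portable C.  This is only a sketch: the helper names are hypothetical,
and the 4-byte and 8-byte figures in the comments are the -m32 and
-m64 values stated in the review, not something the code assumes.

```c
#include <stddef.h>

/* Hypothetical helpers, not GCC code.  */

/* Width in bytes of a store through a `long *` destination on the
   current ABI: 4 at -m32 and 8 at -m64 on Power, per the review.  */
static size_t
long_store_width (void)
{
  return sizeof (long);
}

/* Nonzero when a `long *` destination could be cast to `long long *`
   without changing the number of bytes stored.  */
static int
long_matches_long_long (void)
{
  return sizeof (long) == sizeof (long long);
}

/* Nonzero when a `long *` destination could be cast to `int *`
   without changing the number of bytes stored.  */
static int
long_matches_int (void)
{
  return sizeof (long) == sizeof (int);
}
```

A definition that always stores a doubleword is only safe where
long_matches_long_long () holds, which is the point about 32-bit
being affected.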

If I understand this correctly, the "signed long" is mapped to type
"long long int"?  Just curious, where is the mapping done?

So I believe you would like to have an additional overloaded
definition:

[PATCH v3] rs6000: Add buildin for mffscrn instructions

2023-05-22 Thread Carl Love via Gcc-patches

Kewen, Segher, GCC maintainers:

Version 3, fixed various issues noted by Kewen.  Retested on Power 10. 
No regression issues.

Version 2,  Fixed an issue with the test case.  The dg-options line was
missing.

The following patch adds an overloaded builtin.  There are two possible
arguments for the builtin.  The builtin definitions are:

  double __builtin_mffscrn (unsigned long int);
  double __builtin_mffscrn (double);

The patch has been tested on Power 10 with no regressions.  

Please let me know if the patch is acceptable for mainline.  Thanks.

Carl 
---

rs6000: Add builtin for mffscrn instructions

This patch adds an overloaded __builtin_mffscrn for the Move From FPSCR
Control & Set RN instruction with an immediate argument.  It also adds the
builtin with a floating point register argument.  A new runnable test is
added for the new builtin.
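
Since the builtin returns the FPSCR image in a double, callers have to
reinterpret the bits to look at the RN field.  The testcase in the
patch does this with a union.  The sketch below uses memcpy instead,
which is a swapped-in but equivalent way to read the bits.  The helper
names are hypothetical, and the code does not call the builtin, so it
runs on any target.

```c
#include <stdint.h>
#include <string.h>

/* Hypothetical helpers, not part of the patch.  */

/* Reinterpret the bits of a double as a 64-bit integer.  */
static uint64_t
double_to_bits (double d)
{
  uint64_t u;
  memcpy (&u, &d, sizeof u);
  return u;
}

/* Reinterpret a 64-bit integer as the bits of a double.  */
static double
bits_to_double (uint64_t u)
{
  double d;
  memcpy (&d, &u, sizeof d);
  return d;
}

/* Extract the 2-bit RN field (FPSCR[62:63], the two least significant
   bits) from an FPSCR image held in a double.  */
static unsigned int
rn_from_fpscr_image (double image)
{
  return (unsigned int) (double_to_bits (image) & 0x3);
}
```

On a Power 9 target the argument to rn_from_fpscr_image would be the
value returned by __builtin_mffscrn.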

gcc/

* config/rs6000/rs6000-builtins.def (__builtin_mffscrni,
__builtin_mffscrnd): Add builtin definitions.
* config/rs6000/rs6000-overload.def (__builtin_mffscrn): Add
overloaded definition.
* doc/extend.texi: Add documentation for __builtin_mffscrn.

gcc/testsuite/

* gcc.target/powerpc/builtin-mffscrn.c: Add testcase for new
builtin.

---
 gcc/config/rs6000/rs6000-builtins.def |   9 +-
 gcc/config/rs6000/rs6000-overload.def |   5 +
 gcc/doc/extend.texi   |  10 ++
 .../gcc.target/powerpc/builtin-mffscrn.c  | 106 ++
 4 files changed, 129 insertions(+), 1 deletion(-)
 create mode 100644 gcc/testsuite/gcc.target/powerpc/builtin-mffscrn.c

diff --git a/gcc/config/rs6000/rs6000-builtins.def 
b/gcc/config/rs6000/rs6000-builtins.def
index 92d9b46e1b9..ae08d2fbff7 100644
--- a/gcc/config/rs6000/rs6000-builtins.def
+++ b/gcc/config/rs6000/rs6000-builtins.def
@@ -2849,6 +2849,14 @@
   const signed int  __builtin_vsx_scalar_extract_exp (double);
 VSEEDP xsxexpdp_si {}
 
+; Immediate instruction only uses the least significant two bits of the
+; const int.
+  double __builtin_mffscrni (const int<2>);
+MFFSCRNI rs6000_mffscrni {nosoft}
+
+  double __builtin_mffscrnd (double);
+MFFSCRNF rs6000_mffscrn {nosoft}
+
 [power9-64]
   void __builtin_altivec_xst_len_r (vsc, void *, long);
 XST_LEN_R xst_len_r {}
@@ -2875,7 +2883,6 @@
   pure vsc __builtin_vsx_xl_len_r (void *, signed long);
 XL_LEN_R xl_len_r {}
 
-
 ; Builtins requiring hardware support for IEEE-128 floating-point.
 [ieee128-hw]
   fpmath _Float128 __builtin_addf128_round_to_odd (_Float128, _Float128);
diff --git a/gcc/config/rs6000/rs6000-overload.def 
b/gcc/config/rs6000/rs6000-overload.def
index 26dc662b8fb..39423bcec2b 100644
--- a/gcc/config/rs6000/rs6000-overload.def
+++ b/gcc/config/rs6000/rs6000-overload.def
@@ -78,6 +78,11 @@
 ; like after a required newline, but nowhere else.  Lines beginning with
 ; a semicolon are also treated as blank lines.
 
+[MFFSCR, __builtin_mffscrn, __builtin_mffscrn]
+  double __builtin_mffscrn (const int<2>);
+MFFSCRNI
+  double __builtin_mffscrn (double);
+MFFSCRNF
 
 [BCDADD, __builtin_bcdadd, __builtin_vec_bcdadd]
   vsq __builtin_vec_bcdadd (vsq, vsq, const int);
diff --git a/gcc/doc/extend.texi b/gcc/doc/extend.texi
index ed8b9c8a87b..82f9932666a 100644
--- a/gcc/doc/extend.texi
+++ b/gcc/doc/extend.texi
@@ -18274,6 +18274,16 @@ The @code{__builtin_recipdiv}, and 
@code{__builtin_recipdivf}
 functions generate multiple instructions to implement division using
 the reciprocal estimate instructions.
 
+double __builtin_mffscrn (const int);
+double __builtin_mffscrn (double);
+
+The @code{__builtin_mffscrn} returns the contents of the control bits DRN, VE,
+OE, UE, ZE, XE, NI, RN in the FPSCR are returned with RN updated appropriately.
+In the case of the const int variant of the builtin, RN is set to the 2-bit
+value specified in the builtin.  In the case of the double builtin variant, the
+2-bit value in the double argument that corresponds to the RN location in the
+FPSCR is updated.
+
 The following functions require @option{-mhard-float} and
 @option{-mmultiple} options.
 
diff --git a/gcc/testsuite/gcc.target/powerpc/builtin-mffscrn.c 
b/gcc/testsuite/gcc.target/powerpc/builtin-mffscrn.c
new file mode 100644
index 000..69a7a17cfc7
--- /dev/null
+++ b/gcc/testsuite/gcc.target/powerpc/builtin-mffscrn.c
@@ -0,0 +1,106 @@
+/* { dg-do run } */
+/* { dg-require-effective-target p9modulo_hw } */
+/* { dg-options "-mdejagnu-cpu=power9" } */
+
+#include 
+
+#ifdef DEBUG
+#include 
+#endif
+
+#define MASK 0x3
+#define EXPECTED1 0x1
+#define EXPECTED2 0x2
+
+void abort (void);
+
+int
+main()
+{
+  unsigned long mask, result, expected;
+  double double_arg;
+  
+  union convert_t {
+double d;
+unsigned long ul;
+  } val;
+
+  /* Test immediate version of __builtin_mffscrn. */
+  /* Read FPSCR and set RN bits in FPSCR[62:63]. */
+  val.d = __builtin_mffscrn

Re: [PATCH v2] rs6000: Add buildin for mffscrn instructions

2023-05-22 Thread Carl Love via Gcc-patches

On Mon, 2023-05-22 at 14:36 +0800, Kewen.Lin wrote:
> Hi Carl,
> 
> on 2023/5/19 05:12, Carl Love via Gcc-patches wrote:
> > GCC maintainers:
> > 
> > version 2.  Fixed an issue with the test case.  The dg-options line
> > was
> > missing.
> > 
> > The following patch adds an overloaded builtin.  There are two
> > possible
> > arguments for the builtin.  The builtin definitions are:
> > 
> >   double __builtin_mffscrn (unsigned long int);
> >   double __builtin_mffscrn (double);
> > 
> 
> We already have one bif, __builtin_set_fpscr_rn, for RN setting;
> apparently these two are mainly for direct mapping to mffscr[ni] and
> want the FPSCR bits.  I'm curious: what requirement prompted the
> request for these two built-in functions?

The builtins were requested for use in GLibC.  As of version 2.31 they
were added as inline asm.  They requested a builtin so the asm could be
removed.

> 
> > The patch has been tested on Power 10 with no regressions.  
> > 
> > Please let me know if the patch is acceptable for
> > mainline.  Thanks.
> > 
> > Carl
> > 
> > 
> > rs6000: Add buildin for mffscrn instructions
> > 
> 
> s/buildin/built-in/

fixed
> 
> > This patch adds overloaded __builtin_mffscrn for the move From
> > FPSCR
> > Control & Set R instruction with an immediate argument.  It also
> > adds the
> > builtin with a floating point register argument.  A new runnable
> > test is
> > added for the new builtin.
> 
> s/Set R/Set RN/

fixed

> > gcc/
> > 
> > * config/rs6000/rs6000-builtins.def (__builtin_mffscrni,
> > __builtin_mffscrnd): Add builtin definitions.
> > * config/rs6000/rs6000-overload.def (__builtin_mffscrn): Add
> > overloaded definition.
> > * doc/extend.texi: Add documentation for __builtin_mffscrn.
> > 
> > gcc/testsuite/
> > 
> > * gcc.target/powerpc/builtin-mffscrn.c: Add testcase for new
> > builtin.
> > ---
> >  gcc/config/rs6000/rs6000-builtins.def |   7 ++
> >  gcc/config/rs6000/rs6000-overload.def |   5 +
> >  gcc/doc/extend.texi   |   8 ++
> >  .../gcc.target/powerpc/builtin-mffscrn.c  | 106
> > ++
> >  4 files changed, 126 insertions(+)
> >  create mode 100644 gcc/testsuite/gcc.target/powerpc/builtin-
> > mffscrn.c
> > 
> > diff --git a/gcc/config/rs6000/rs6000-builtins.def
> > b/gcc/config/rs6000/rs6000-builtins.def
> > index 92d9b46e1b9..67125473684 100644
> > --- a/gcc/config/rs6000/rs6000-builtins.def
> > +++ b/gcc/config/rs6000/rs6000-builtins.def
> > @@ -2875,6 +2875,13 @@
> >pure vsc __builtin_vsx_xl_len_r (void *, signed long);
> >  XL_LEN_R xl_len_r {}
> >  
> > +; Immediate instruction only uses the least significant two bits
> > of the
> > +; const int.
> > +  double __builtin_mffscrni (const int<2>);
> > +MFFSCRNI rs6000_mffscrni {}
> > +
> > +  double __builtin_mffscrnd (double);
> > +MFFSCRNF rs6000_mffscrn {}
> >  
> 
> Why are they put in [power9-64] rather than [power9]?  IMHO [power9]
> is the
> correct stanza for them.

Moved them to the [power9] stanza.

>   Besides, {nosoft} attribute is required.

OK, added that.  I was trying to figure out why nosoft is needed.  The
instructions manipulate bits in a physical register that controls the
hardware floating point instructions, which looks to me like the
reason: with -msoft-float the floating point HW registers are disabled
and the floating point operations are done in software.  Did I figure
this out correctly?

 
> 
> >  ; Builtins requiring hardware support for IEEE-128 floating-point.
> >  [ieee128-hw]
> > diff --git a/gcc/config/rs6000/rs6000-overload.def
> > b/gcc/config/rs6000/rs6000-overload.def
> > index c582490c084..adda2df69ea 100644
> > --- a/gcc/config/rs6000/rs6000-overload.def
> > +++ b/gcc/config/rs6000/rs6000-overload.def
> > @@ -78,6 +78,11 @@
> >  ; like after a required newline, but nowhere else.  Lines
> > beginning with
> >  ; a semicolon are also treated as blank lines.
> >  
> > +[MFFSCR, __builtin_mffscrn, __builtin_mffscrn]
> > +  double __builtin_mffscrn (const int<2>);
> > +MFFSCRNI
> > +  double __builtin_mffscrn (double);
> > +MFFSCRNF
> >  
> >  [BCDADD, __builtin_bcdadd, __builtin_vec_bcdadd]
> >vsq __builtin_vec_bcdadd (vsq, vsq, const int);
> > diff --git

Re: [PATCH] rs6000: Fix __builtin_vec_xst_trunc definition

2023-05-18 Thread Carl Love via Gcc-patches

Peter:

On Thu, 2023-05-18 at 16:28 -0500, Peter Bergner wrote:
> 



> 
> > +  void __builtin_vec_xst_trunc (vsq, signed long long, signed long
> > *);
> > +TR_STXVRLX  TR_STXVRLX_S
> > +  void __builtin_vec_xst_trunc (vuq, signed long long, unsigned
> > long *);
> > +TR_STXVRLX  TR_STXVRLX_U
> 
> Not a comment on these two changes, and not a request to expand this
> specific patch, but I believe I saw other built-ins that were missing
> signed long */unsigned long * versions where they could/should accept
> them.  Can you double-check whether there are other built-ins that
> need similar changes and if so, please post a separate patch to fix
> those as well?  Thanks.

OK, I will put that on my to do list to go look for that in other
builtins.  

 Carl

[PATCH v2] rs6000: Add buildin for mffscrn instructions

2023-05-18 Thread Carl Love via Gcc-patches

GCC maintainers:

version 2.  Fixed an issue with the test case.  The dg-options line was
missing.

The following patch adds an overloaded builtin.  There are two possible
arguments for the builtin.  The builtin definitions are:

  double __builtin_mffscrn (unsigned long int);
  double __builtin_mffscrn (double);

The patch has been tested on Power 10 with no regressions.  

Please let me know if the patch is acceptable for mainline.  Thanks.

Carl


rs6000: Add buildin for mffscrn instructions

This patch adds overloaded __builtin_mffscrn for the move From FPSCR
Control & Set R instruction with an immediate argument.  It also adds the
builtin with a floating point register argument.  A new runnable test is
added for the new builtin.

gcc/

* config/rs6000/rs6000-builtins.def (__builtin_mffscrni,
__builtin_mffscrnd): Add builtin definitions.
* config/rs6000/rs6000-overload.def (__builtin_mffscrn): Add
overloaded definition.
* doc/extend.texi: Add documentation for __builtin_mffscrn.

gcc/testsuite/

* gcc.target/powerpc/builtin-mffscrn.c: Add testcase for new
builtin.
---
 gcc/config/rs6000/rs6000-builtins.def |   7 ++
 gcc/config/rs6000/rs6000-overload.def |   5 +
 gcc/doc/extend.texi   |   8 ++
 .../gcc.target/powerpc/builtin-mffscrn.c  | 106 ++
 4 files changed, 126 insertions(+)
 create mode 100644 gcc/testsuite/gcc.target/powerpc/builtin-mffscrn.c

diff --git a/gcc/config/rs6000/rs6000-builtins.def 
b/gcc/config/rs6000/rs6000-builtins.def
index 92d9b46e1b9..67125473684 100644
--- a/gcc/config/rs6000/rs6000-builtins.def
+++ b/gcc/config/rs6000/rs6000-builtins.def
@@ -2875,6 +2875,13 @@
   pure vsc __builtin_vsx_xl_len_r (void *, signed long);
 XL_LEN_R xl_len_r {}
 
+; Immediate instruction only uses the least significant two bits of the
+; const int.
+  double __builtin_mffscrni (const int<2>);
+MFFSCRNI rs6000_mffscrni {}
+
+  double __builtin_mffscrnd (double);
+MFFSCRNF rs6000_mffscrn {}
 
 ; Builtins requiring hardware support for IEEE-128 floating-point.
 [ieee128-hw]
diff --git a/gcc/config/rs6000/rs6000-overload.def 
b/gcc/config/rs6000/rs6000-overload.def
index c582490c084..adda2df69ea 100644
--- a/gcc/config/rs6000/rs6000-overload.def
+++ b/gcc/config/rs6000/rs6000-overload.def
@@ -78,6 +78,11 @@
 ; like after a required newline, but nowhere else.  Lines beginning with
 ; a semicolon are also treated as blank lines.
 
+[MFFSCR, __builtin_mffscrn, __builtin_mffscrn]
+  double __builtin_mffscrn (const int<2>);
+MFFSCRNI
+  double __builtin_mffscrn (double);
+MFFSCRNF
 
 [BCDADD, __builtin_bcdadd, __builtin_vec_bcdadd]
   vsq __builtin_vec_bcdadd (vsq, vsq, const int);
diff --git a/gcc/doc/extend.texi b/gcc/doc/extend.texi
index ed8b9c8a87b..f16c046051a 100644
--- a/gcc/doc/extend.texi
+++ b/gcc/doc/extend.texi
@@ -18455,6 +18455,9 @@ int __builtin_dfp_dtstsfi_ov_td (unsigned int 
comparison, _Decimal128 value);
 
 double __builtin_mffsl(void);
 
+double __builtin_mffscrn (unsigned long int);
+double __builtin_mffscrn (double);
+
 @end smallexample
 The @code{__builtin_byte_in_set} function requires a
 64-bit environment supporting ISA 3.0 or later.  This function returns
@@ -18511,6 +18514,11 @@ the FPSCR.  The instruction is a lower latency version 
of the @code{mffs}
 instruction.  If the @code{mffsl} instruction is not available, then the
 builtin uses the older @code{mffs} instruction to read the FPSCR.
 
+The @code{__builtin_mffscrn} returns the contents of the control bits in the
+FPSCR, bits 29:31 (DRN) and bits 56:63 (VE, OE, UE, ZE, XE, NI, RN).  The
+contents of bits [62:63] of the unsigned long int or double argument are placed
+into bits [62:63] of the FPSCR (RN).
+
 @node Basic PowerPC Built-in Functions Available on ISA 3.1
 @subsubsection Basic PowerPC Built-in Functions Available on ISA 3.1
 
diff --git a/gcc/testsuite/gcc.target/powerpc/builtin-mffscrn.c 
b/gcc/testsuite/gcc.target/powerpc/builtin-mffscrn.c
new file mode 100644
index 000..26c666a4091
--- /dev/null
+++ b/gcc/testsuite/gcc.target/powerpc/builtin-mffscrn.c
@@ -0,0 +1,106 @@
+/* { dg-do run } */
+/* { dg-require-effective-target p9vector_hw } */
+/* { dg-options "-mpower9-vector -mdejagnu-cpu=power9" } */
+
+#include 
+
+#ifdef DEBUG
+#include 
+#endif
+
+#define MASK 0x3
+#define EXPECTED1 0x1
+#define EXPECTED2 0x2
+
+void abort (void);
+
+int
+main()
+{
+  unsigned long mask, result, expected;
+  double double_arg;
+  
+  union convert_t {
+double d;
+unsigned long ul;
+  } val;
+
+  /* Test immediate version of __builtin_mffscrn. */
+  /* Read FPSCR and set RN bits in FPSCR[62:63]. */
+  val.d = __builtin_mffscrn (EXPECTED2);
+
+  /* Read FPSCR, bits [62:63] should have been set to 0x2 by previous builtin
+ call.  */
+  val.d = __builtin_mffscrn (EXPECTED1);
+  /* The expected result is the

[PATCH] rs6000: Fix __builtin_vec_xst_trunc definition

2023-05-10 Thread Carl Love via Gcc-patches

GCC maintainers:

The following patch fixes errors in the arguments of the
__builtin_altivec_tr_stxvrhx and __builtin_altivec_tr_stxvrwx builtin
definitions.  Note, these builtins are used by the overloaded
__builtin_vec_xst_trunc builtin.

The patch also adds new overloaded definitions of
__builtin_vec_xst_trunc whose third argument is a pointer to signed or
unsigned long int.

A new testcase is added for the various overloaded versions of
__builtin_vec_xst_trunc.

The patch has been tested on Power 10 with no new regressions.

Please let me know if the patch is acceptable for mainline.  Thanks.

Carl

---
rs6000: Fix __builtin_vec_xst_trunc definition

Built-in __builtin_vec_xst_trunc calls __builtin_altivec_tr_stxvrhx
and __builtin_altivec_tr_stxvrwx to handle the short and word cases.  The
arguments for these two builtins are wrong.  This patch fixes the wrong
arguments for the builtins.

Additionally, the patch adds a new __builtin_vec_xst_trunc overloaded
version for the destination being signed or unsigned long int.
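
For readers unfamiliar with the builtin, here is a rough host-side model of
what "truncate and store" means.  It is only a sketch: model_xst_trunc is a
hypothetical name, the 128-bit source is reduced to its low 64 bits, and the
offset is assumed to be a byte offset added to the base pointer.

```c
#include <stdint.h>
#include <string.h>

/* Hypothetical model, not the GCC implementation: keep only the low
   `width` bytes of the source value and store them at base + offset.
   The offset is assumed to be in bytes.  */
static void
model_xst_trunc (uint64_t low64, long long offset, void *base, size_t width)
{
  unsigned char *dst = (unsigned char *) base + offset;

  switch (width)
    {
    case 1: { uint8_t t = (uint8_t) low64; memcpy (dst, &t, sizeof t); break; }
    case 2: { uint16_t t = (uint16_t) low64; memcpy (dst, &t, sizeof t); break; }
    case 4: { uint32_t t = (uint32_t) low64; memcpy (dst, &t, sizeof t); break; }
    case 8: { uint64_t t = low64; memcpy (dst, &t, sizeof t); break; }
    default: break;  /* Other widths are not modelled.  */
    }
}
```

The typed temporaries keep the model independent of host endianness: the
destination always receives the truncated value as a native integer of the
requested width.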

A runnable test case is added to test each of the overloaded definitions
of __builtin_vec_xst_tru

gcc/
* config/rs6000/builtins.def (__builtin_altivec_tr_stxvrhx,
__builtin_altivec_tr_stxvrwx): Fix type of second argument.
Add definition for second argument to be signed long.
* config/rs6000/rs6000-overload.def (__builtin_vec_xst_trunc):
Add definition with third argument signed and unsigned long.
* doc/extend.texi (__builtin_vec_xst_trunc): Add documentation for
new unsigned long and signed long versions.

gcc/testsuite/
* gcc.target/powerpc/vsx-builtin-vec_xst_trunc.c: New test case
for __builtin_vec_xst_trunc builtin.
---
 gcc/config/rs6000/rs6000-builtins.def |   7 +-
 gcc/config/rs6000/rs6000-overload.def |   4 +
 gcc/doc/extend.texi   |   2 +
 .../powerpc/vsx-builtin-vec_xst_trunc.c   | 217 ++
 4 files changed, 228 insertions(+), 2 deletions(-)
 create mode 100644 gcc/testsuite/gcc.target/powerpc/vsx-builtin-vec_xst_trunc.c

diff --git a/gcc/config/rs6000/rs6000-builtins.def 
b/gcc/config/rs6000/rs6000-builtins.def
index 638d0bc72ca..a378491b358 100644
--- a/gcc/config/rs6000/rs6000-builtins.def
+++ b/gcc/config/rs6000/rs6000-builtins.def
@@ -3161,12 +3161,15 @@
   void __builtin_altivec_tr_stxvrbx (vsq, signed long, signed char *);
 TR_STXVRBX vsx_stxvrbx {stvec}
 
-  void __builtin_altivec_tr_stxvrhx (vsq, signed long, signed int *);
+  void __builtin_altivec_tr_stxvrhx (vsq, signed long, signed short *);
 TR_STXVRHX vsx_stxvrhx {stvec}
 
-  void __builtin_altivec_tr_stxvrwx (vsq, signed long, signed short *);
+  void __builtin_altivec_tr_stxvrwx (vsq, signed long, signed int *);
 TR_STXVRWX vsx_stxvrwx {stvec}
 
+  void __builtin_altivec_tr_stxvrlx (vsq, signed long, signed long *);
+TR_STXVRLX vsx_stxvrdx {stvec}
+
   void __builtin_altivec_tr_stxvrdx (vsq, signed long, signed long long *);
 TR_STXVRDX vsx_stxvrdx {stvec}
 
diff --git a/gcc/config/rs6000/rs6000-overload.def 
b/gcc/config/rs6000/rs6000-overload.def
index c582490c084..54b7ae5e51b 100644
--- a/gcc/config/rs6000/rs6000-overload.def
+++ b/gcc/config/rs6000/rs6000-overload.def
@@ -4872,6 +4872,10 @@
 TR_STXVRWX  TR_STXVRWX_S
   void __builtin_vec_xst_trunc (vuq, signed long long, unsigned int *);
 TR_STXVRWX  TR_STXVRWX_U
+  void __builtin_vec_xst_trunc (vsq, signed long long, signed long *);
+TR_STXVRLX  TR_STXVRLX_S
+  void __builtin_vec_xst_trunc (vuq, signed long long, unsigned long *);
+TR_STXVRLX  TR_STXVRLX_U
   void __builtin_vec_xst_trunc (vsq, signed long long, signed long long *);
 TR_STXVRDX  TR_STXVRDX_S
   void __builtin_vec_xst_trunc (vuq, signed long long, unsigned long long *);
diff --git a/gcc/doc/extend.texi b/gcc/doc/extend.texi
index e426a2eb7d8..7e2ae790ab3 100644
--- a/gcc/doc/extend.texi
+++ b/gcc/doc/extend.texi
@@ -18570,10 +18570,12 @@ instructions.
 @defbuiltin{{void} vec_xst_trunc (vector signed __int128, signed long long, 
signed char *)}
 @defbuiltinx{{void} vec_xst_trunc (vector signed __int128, signed long long, 
signed short *)}
 @defbuiltinx{{void} vec_xst_trunc (vector signed __int128, signed long long, 
signed int *)}
+@defbuiltinx{{void} vec_xst_trunc (vector signed __int128, signed long long, 
signed long *)}
 @defbuiltinx{{void} vec_xst_trunc (vector signed __int128, signed long long, 
signed long long *)}
 @defbuiltinx{{void} vec_xst_trunc (vector unsigned __int128, signed long long, 
unsigned char *)}
 @defbuiltinx{{void} vec_xst_trunc (vector unsigned __int128, signed long long, 
unsigned short *)}
 @defbuiltinx{{void} vec_xst_trunc (vector unsigned __int128, signed long long, 
unsigned int *)}
+@defbuiltinx{{void} vec_xst_trunc (vector unsigned __int128, signed long long, 
unsigned long *)}
 @defbuiltinx{{void} vec_xst_trunc (vector unsigned

[PATCH] rs6000: vec_cmpne confusing implementation

2023-05-03 Thread Carl Love via Gcc-patches

GCC maintainers:

The following patch cleans up the definition for the
__builtin_altivec_vcmpnet.  The current implementation implies that the
builtin is only supported on Power 9 since it is defined under the
Power 9 stanza.  However the builtin has no ISA restrictions as stated
in the Power Vector Intrinsic Programming Reference document. The
current built-in works because the builtin gets replaced during GIMPLE
folding by a simple not-equal operator so it doesn't get expanded and
checked for Power 9 code generation.

This patch moves the definition to the Altivec stanza in the builtin
definition file.  The builtin then generates code for Power 8 and
earlier processors or Power 9 and later processors.

The patch has been tested on Power 8 and Power 9 with no regressions.

Please let me know if the patch is acceptable for mainline.  Thanks.

  Carl 

-
rs6000: vec_cmpne confusing implementation

__builtin_altivec_vcmpnet does not have any ISA restrictions.  The current
built-in definitions for vcmpneb, vcmpneh, vcmpnew, vcmpnet are defined
under the Power 9 section.  This implies they are only supported on Power
9 and above when in fact they are defined and work on Power 8.

This patch moves the definitions to the Altivec stanza and maps the
builtin dispatches to different code generation for Power 8 and earlier
or for Power 9 and later.
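
The RTL in the new define_expand is a (not (eq ...)) of the two operands.  A
lane-wise host model of that relationship, as a sketch only (the model_*
names are hypothetical and this is plain C, not the vector builtins):

```c
#include <stdint.h>

#define LANES 16

/* Lane-wise equality in the style of a vector compare-equal:
   all ones where the lanes match, all zeros otherwise.  */
static void
model_vcmpequb (const uint8_t *a, const uint8_t *b, uint8_t *out)
{
  for (int i = 0; i < LANES; i++)
    out[i] = (a[i] == b[i]) ? 0xFF : 0x00;
}

/* Not-equal expressed as the complement of the equality mask, which is
   the shape of the (not (eq ...)) pattern.  */
static void
model_vcmpneb (const uint8_t *a, const uint8_t *b, uint8_t *out)
{
  model_vcmpequb (a, b, out);
  for (int i = 0; i < LANES; i++)
    out[i] = (uint8_t) ~out[i];
}
```

On Power 9 and later a single compare-not-equal instruction can produce the
same mask directly, which is why the expander can dispatch on
TARGET_P9_VECTOR without changing the result.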

gcc/ChangeLog:

* config/rs6000/rs6000-builtins.def (vcmpneb, vcmpneh, vcmpnew,
vcmpnet): Move definitions to Altivec stanza.
* config/rs6000/vsx.md (vcmpneb, vcmpneh, vcmpnew, vcmpnet): New
define_expand.
(vcmpneb, vcmpneh, vcmpnew, vcmpnet): Rename define_insn.

Patch has been tested on Power 8 and Power 9 with no regressions.
---
 gcc/config/rs6000/rs6000-builtins.def | 24 ++--
 gcc/config/rs6000/vsx.md  | 54 +--
 2 files changed, 63 insertions(+), 15 deletions(-)

diff --git a/gcc/config/rs6000/rs6000-builtins.def 
b/gcc/config/rs6000/rs6000-builtins.def
index 638d0bc72ca..adb4122be29 100644
--- a/gcc/config/rs6000/rs6000-builtins.def
+++ b/gcc/config/rs6000/rs6000-builtins.def
@@ -641,6 +641,18 @@
   const int __builtin_altivec_vcmpgtuw_p (int, vsi, vsi);
 VCMPGTUW_P vector_gtu_v4si_p {pred}
 
+  const vsc __builtin_altivec_vcmpneb (vsc, vsc);
+VCMPNEB vcmpneb {}
+
+  const vss __builtin_altivec_vcmpneh (vss, vss);
+VCMPNEH vcmpneh {}
+
+  const vsi __builtin_altivec_vcmpnew (vsi, vsi);
+VCMPNEW vcmpnew {}
+
+  const vbq __builtin_altivec_vcmpnet (vsq, vsq);
+VCMPNET vcmpnet {}
+
   const vsi __builtin_altivec_vctsxs (vf, const int<5>);
 VCTSXS altivec_vctsxs {}
 
@@ -2599,9 +2611,6 @@
   const signed int __builtin_altivec_vcmpaew_p (vsi, vsi);
 VCMPAEW_P vector_ae_v4si_p {}
 
-  const vsc __builtin_altivec_vcmpneb (vsc, vsc);
-VCMPNEB vcmpneb {}
-
   const signed int __builtin_altivec_vcmpneb_p (vsc, vsc);
 VCMPNEB_P vector_ne_v16qi_p {}
 
@@ -2614,15 +2623,9 @@
   const signed int __builtin_altivec_vcmpnefp_p (vf, vf);
 VCMPNEFP_P vector_ne_v4sf_p {}
 
-  const vss __builtin_altivec_vcmpneh (vss, vss);
-VCMPNEH vcmpneh {}
-
   const signed int __builtin_altivec_vcmpneh_p (vss, vss);
 VCMPNEH_P vector_ne_v8hi_p {}
 
-  const vsi __builtin_altivec_vcmpnew (vsi, vsi);
-VCMPNEW vcmpnew {}
-
   const signed int __builtin_altivec_vcmpnew_p (vsi, vsi);
 VCMPNEW_P vector_ne_v4si_p {}
 
@@ -3203,9 +3206,6 @@
   const signed int __builtin_altivec_vcmpgtut_p (signed int, vuq, vuq);
 VCMPGTUT_P vector_gtu_v1ti_p {pred}
 
-  const vbq __builtin_altivec_vcmpnet (vsq, vsq);
-VCMPNET vcmpnet {}
-
   const signed int __builtin_altivec_vcmpnet_p (vsq, vsq);
 VCMPNET_P vector_ne_v1ti_p {}
 
diff --git a/gcc/config/rs6000/vsx.md b/gcc/config/rs6000/vsx.md
index 7d845df5c2d..3f05e3e6d00 100644
--- a/gcc/config/rs6000/vsx.md
+++ b/gcc/config/rs6000/vsx.md
@@ -5652,8 +5652,56 @@
   DONE;
 })
 
+;; Expand for builtin vcmpneb
+(define_expand "vcmpneb"
+  [(set (match_operand:V16QI 0 "altivec_register_operand" "=v")
+(not:V16QI
+  (eq:V16QI (match_operand:V16QI 1 "altivec_register_operand" "v")
+(match_operand:V16QI 2 "altivec_register_operand" "v"))))]
+  ""
+  {
+  if (TARGET_P9_VECTOR)
+emit_insn (gen_vcmpneb_p9 (operands[0], operands[1], operands[2]));
+  else
+emit_insn (gen_altivec_vcmpequb_p (operands[0], operands[1],
+  operands[2]));
+  DONE;
+  })
+
+;; Expand for builtin vcmpneh
+(define_expand "vcmpneh"
+  [(set (match_operand:V8HI 0 "altivec_register_operand" "=v")
+(not:V8HI
+  (eq:V8HI (match_operand:V8HI 1 "altivec_register_operand" "v")
+   (match_operand:V8HI 2 "altivec_register_operand" "v"))))]
+  ""
+  {
+  if (TARGET_P9_VECTOR)
+emit_insn (gen_vcmpneh_p9 (operands[0], operands[1], operands[2]));
+  else
+emit_insn (gen_altivec_vcmpequh_p

[PATCH] rs6000: Add builtins for IEEE 128-bit floating point values

2023-05-02 Thread Carl Love via Gcc-patches

GCC maintainers:

The following patch adds three builtins for inserting and extracting the
exponent and significand of an IEEE 128-bit floating point value.
The builtins are valid for Power 9 and Power 10.

The patch has been tested on both Power 9 and Power 10.

Please let me know if this patch is acceptable for mainline.  Thanks.

Carl 


--
From a20cc81f98cce1140fc95775a7c25b55d1ca7cba Mon Sep 17 00:00:00 2001
From: Carl Love 
Date: Wed, 12 Apr 2023 17:46:37 -0400
Subject: [PATCH] rs6000: Add builtins for IEEE 128-bit floating point values

Add support for the following builtins:

 __vector unsigned long long int __builtin_extractf128_exp (__ieee128);
 __vector unsigned __int128 __builtin_extractf128_sig (__ieee128);
 __ieee128 __builtin_insertf128_exp (__vector unsigned __int128,
 __vector unsigned long long);

gcc/
* config/rs6000/rs6000-builtins.def (__builtin_extractf128_exp,
 __builtin_extractf128_sig, __builtin_insertf128_exp): Add new
builtin definitions.
* config/rs6000.md (extractf128_exp_, insertf128_exp_,
extractf128_sig_): Add define_expand for new builtins.
(xsxexpqp_f128_, xsxsigqp_f128_, siexpqpf_f128_):
Add define_insn for new builtins.
* doc/extend.texi (__builtin_extractf128_exp, __builtin_extractf128_sig,
__builtin_insertf128_exp): Add documentation for new builtins.

gcc/testsuite/
* gcc.target/powerpc/bfp/extract-exp-ieee128.c: New test case.
* gcc.target/powerpc/bfp/extract-sig-ieee128.c: New test case.
* gcc.target/powerpc/bfp/insert-exp-ieee128.c: New test case.
---
 gcc/config/rs6000/rs6000-builtins.def |  9 +++
 gcc/config/rs6000/vsx.md  | 66 ++-
 gcc/doc/extend.texi   | 28 
 .../powerpc/bfp/extract-exp-ieee128.c | 49 ++
 .../powerpc/bfp/extract-sig-ieee128.c | 56 
 .../powerpc/bfp/insert-exp-ieee128.c  | 58 
 6 files changed, 265 insertions(+), 1 deletion(-)
 create mode 100644 gcc/testsuite/gcc.target/powerpc/bfp/extract-exp-ieee128.c
 create mode 100644 gcc/testsuite/gcc.target/powerpc/bfp/extract-sig-ieee128.c
 create mode 100644 gcc/testsuite/gcc.target/powerpc/bfp/insert-exp-ieee128.c

diff --git a/gcc/config/rs6000/rs6000-builtins.def 
b/gcc/config/rs6000/rs6000-builtins.def
index 638d0bc72ca..3247a7f7673 100644
--- a/gcc/config/rs6000/rs6000-builtins.def
+++ b/gcc/config/rs6000/rs6000-builtins.def
@@ -2876,6 +2876,15 @@
   pure vsc __builtin_vsx_xl_len_r (void *, signed long);
 XL_LEN_R xl_len_r {}
 
+  vull __builtin_extractf128_exp (_Float128);
+EEXPKF extractf128_exp_kf {}
+
+  vuq __builtin_extractf128_sig (_Float128);
+ESIGKF extractf128_sig_kf {}
+
+  _Float128 __builtin_insertf128_exp (vuq, vull);
+IEXPKF_VULL insertf128_exp_kf {}
+
 
 ; Builtins requiring hardware support for IEEE-128 floating-point.
 [ieee128-hw]
diff --git a/gcc/config/rs6000/vsx.md b/gcc/config/rs6000/vsx.md
index 7d845df5c2d..2a9f875ba57 100644
--- a/gcc/config/rs6000/vsx.md
+++ b/gcc/config/rs6000/vsx.md
@@ -369,7 +369,10 @@
UNSPEC_XXSPLTI32DX
UNSPEC_XXBLEND
UNSPEC_XXPERMX
-  ])
+   UNSPEC_EXTRACTEXPIEEE
+   UNSPEC_EXTRACTSIGIEEE
+   UNSPEC_INSERTEXPIEEE
+])
 
 (define_int_iterator XVCVBF16  [UNSPEC_VSX_XVCVSPBF16
 UNSPEC_VSX_XVCVBF16SPN])
@@ -4155,6 +4158,38 @@
  "vinsrx %0,%1,%2"
  [(set_attr "type" "vecsimple")])
 
+(define_expand "extractf128_exp_"
+  [(set (match_operand:V2DI 0 "altivec_register_operand")
+  (unspec:IEEE128 [(match_operand:IEEE128 1 "altivec_register_operand")]
+ UNSPEC_EXTRACTEXPIEEE))]
+"TARGET_P9_VECTOR"
+{
+  emit_insn (gen_xsxexpqp_f128_ (operands[0], operands[1]));
+  DONE;
+})
+
+(define_expand "insertf128_exp_"
+  [(set (match_operand:IEEE128 0 "altivec_register_operand")
+  (unspec:IEEE128 [(match_operand:V1TI 1 "altivec_register_operand")
+  (match_operand:V2DI 2 "altivec_register_operand")]
+ UNSPEC_INSERTEXPIEEE))]
+"TARGET_P9_VECTOR"
+{
+  emit_insn (gen_xsiexpqpf_f128_ (operands[0], operands[1],
+   operands[2]));
+  DONE;
+})
+
+(define_expand "extractf128_sig_"
+  [(set (match_operand:V2DI 0 "altivec_register_operand")
+  (unspec:IEEE128 [(match_operand:IEEE128 1 "altivec_register_operand")]
+ UNSPEC_EXTRACTSIGIEEE))]
+"TARGET_P9_VECTOR"
+{
+  emit_insn (gen_xsxsigqp_f128_ (operands[0], operands[1]));
+  DONE;
+})
+
 (define_expand "vreplace_elt_"
   [(set (match_operand:REPLACE_ELT 0 "register_operand")
   (unspec:REPLACE_ELT [(match_operand:REPLACE_ELT 1 "register_operand")
@@ -5016,6 +5051,15 @@
   "xsxexpqp %0,%1"
   [(set_attr "type" "vecmove")])
 
+;; VSX Scalar to Vector Extract Exponent IEEE 128-bit floating point format
+(define_insn "xsxexpqp_f128_"
+  [(set (match_operand:V2DI 0

[PATCH] rs6000: Add buildin for mffscrn instructions

2023-04-13 Thread Carl Love via Gcc-patches



GCC maintainers:

The following patch adds an overloaded builtin.  There are two possible
arguments for the builtin.  The builtin definitions are:

  double __builtin_mffscrn (unsigned long int);
  double __builtin_mffscrn (double);

The patch has been tested on Power 10 with no regressions.  

Please let me know if the patch is acceptable for mainline.  Thanks.

Carl 

---
rs6000: Add buildin for mffscrn instructions

This patch adds the overloaded __builtin_mffscrn for the Move From FPSCR
Control & Set RN instruction with an immediate argument.  It also adds the
builtin with a floating point register argument.  A new runnable test is
added for the new builtin.
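
As an illustration of the bit positions involved (a sketch, not part of the
patch): in the IBM bit numbering used in the documentation, FPSCR bits
[62:63] are the two least significant bits of the 64-bit image returned in
the double.  The helper names below are hypothetical, and the FPSCR is
simulated as a plain 64-bit integer so the sketch runs on any host.

```c
#include <stdint.h>
#include <string.h>

/* Hypothetical helper: view the double returned by __builtin_mffscrn
   as a 64-bit FPSCR image.  memcpy avoids aliasing problems.  */
static uint64_t
fpscr_image (double d)
{
  uint64_t bits;
  memcpy (&bits, &d, sizeof bits);
  return bits;
}

/* RN occupies FPSCR[62:63] in IBM numbering, i.e. the low two bits.  */
static unsigned
fpscr_rn (double d)
{
  return (unsigned) (fpscr_image (d) & 0x3);
}

/* Simulated mffscrn: return the previous image as a double and install
   the new RN value.  Only the RN field is modelled here.  */
static double
sim_mffscrn (uint64_t *fpscr, unsigned new_rn)
{
  double old;
  memcpy (&old, fpscr, sizeof old);
  *fpscr = (*fpscr & ~(uint64_t) 0x3) | (new_rn & 0x3);
  return old;
}
```

This mirrors the structure of the new test: one call installs a rounding
mode, and the next call returns an image whose low two bits hold the mode
set by the previous call.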

gcc/

* config/rs6000/rs6000-builtins.def (__builtin_mffscrni,
__builtin_mffscrnd): Add builtin definitions.
* config/rs6000/rs6000-overload.def (__builtin_mffscrn): Add
overloaded definition.
* doc/extend.texi: Add documentation for __builtin_mffscrn.

gcc/testsuite/

* gcc.target/powerpc/builtin-mffscrn.c: Add testcase for new
builtin.
---
 gcc/config/rs6000/rs6000-builtins.def |   7 ++
 gcc/config/rs6000/rs6000-overload.def |   5 +
 gcc/doc/extend.texi   |   8 ++
 .../gcc.target/powerpc/builtin-mffscrn.c  | 105 ++
 4 files changed, 125 insertions(+)
 create mode 100644 gcc/testsuite/gcc.target/powerpc/builtin-mffscrn.c

diff --git a/gcc/config/rs6000/rs6000-builtins.def 
b/gcc/config/rs6000/rs6000-builtins.def
index 03fb194b151..6247cb6c0fe 100644
--- a/gcc/config/rs6000/rs6000-builtins.def
+++ b/gcc/config/rs6000/rs6000-builtins.def
@@ -2863,6 +2863,13 @@
   pure vsc __builtin_vsx_xl_len_r (void *, signed long);
 XL_LEN_R xl_len_r {}
 
+; Immediate instruction only uses the least significant two bits of the
+; const int.
+  double __builtin_mffscrni (const int<2>);
+MFFSCRNI rs6000_mffscrni {}
+
+  double __builtin_mffscrnd (double);
+MFFSCRNF rs6000_mffscrn {}
 
 ; Builtins requiring hardware support for IEEE-128 floating-point.
 [ieee128-hw]
diff --git a/gcc/config/rs6000/rs6000-overload.def 
b/gcc/config/rs6000/rs6000-overload.def
index c582490c084..adda2df69ea 100644
--- a/gcc/config/rs6000/rs6000-overload.def
+++ b/gcc/config/rs6000/rs6000-overload.def
@@ -78,6 +78,11 @@
 ; like after a required newline, but nowhere else.  Lines beginning with
 ; a semicolon are also treated as blank lines.
 
+[MFFSCR, __builtin_mffscrn, __builtin_mffscrn]
+  double __builtin_mffscrn (const int<2>);
+MFFSCRNI
+  double __builtin_mffscrn (double);
+MFFSCRNF
 
 [BCDADD, __builtin_bcdadd, __builtin_vec_bcdadd]
   vsq __builtin_vec_bcdadd (vsq, vsq, const int);
diff --git a/gcc/doc/extend.texi b/gcc/doc/extend.texi
index 3adb67aa47a..168d439c0e4 100644
--- a/gcc/doc/extend.texi
+++ b/gcc/doc/extend.texi
@@ -18317,6 +18317,9 @@ int __builtin_dfp_dtstsfi_ov_td (unsigned int 
comparison, _Decimal128 value);
 
 double __builtin_mffsl(void);
 
+double __builtin_mffscrn (unsigned long int);
+double __builtin_mffscrn (double);
+
 @end smallexample
 The @code{__builtin_byte_in_set} function requires a
 64-bit environment supporting ISA 3.0 or later.  This function returns
@@ -18373,6 +18376,11 @@ the FPSCR.  The instruction is a lower latency version 
of the @code{mffs}
 instruction.  If the @code{mffsl} instruction is not available, then the
 builtin uses the older @code{mffs} instruction to read the FPSCR.
 
+The @code{__builtin_mffscrn} returns the contents of the control bits in the
+FPSCR, bits 29:31 (DRN) and bits 56:63 (VE, OE, UE, ZE, XE, NI, RN).  The
+contents of bits [62:63] of the unsigned long int or double argument are placed
+into bits [62:63] of the FPSCR (RN).
+
 @node Basic PowerPC Built-in Functions Available on ISA 3.1
 @subsubsection Basic PowerPC Built-in Functions Available on ISA 3.1
 
diff --git a/gcc/testsuite/gcc.target/powerpc/builtin-mffscrn.c 
b/gcc/testsuite/gcc.target/powerpc/builtin-mffscrn.c
new file mode 100644
index 000..433a9081499
--- /dev/null
+++ b/gcc/testsuite/gcc.target/powerpc/builtin-mffscrn.c
@@ -0,0 +1,105 @@
+/* { dg-do run } */
+/* { dg-require-effective-target p9vector_hw } */
+
+#include 
+
+#ifdef DEBUG
+#include 
+#endif
+
+#define MASK 0x3
+#define EXPECTED1 0x1
+#define EXPECTED2 0x2
+
+void abort (void);
+
+int
+main()
+{
+  unsigned long mask, result, expected;
+  double double_arg;
+  
+  union convert_t {
+double d;
+unsigned long ul;
+  } val;
+
+  /* Test immediate version of __builtin_mffscrn. */
+  /* Read FPSCR and set RN bits in FPSCR[62:63]. */
+  val.d = __builtin_mffscrn (EXPECTED2);
+
+  /* Read FPSCR, bits [62:63] should have been set to 0x2 by previous builtin
+ call.  */
+  val.d = __builtin_mffscrn (EXPECTED1);
+  /* The expected result is the argument for the previous call to
+ __builtin_mffscrn.  */
+  expected = EXPECTED2;
+  result = MASK & val.ul;
+
+  if (EXPECTED2 != result)
+#ifdef

[PATCH] rs6000: Fix test gc.target/powerpc/rs600-fpint.c test options

2023-04-13 Thread Carl Love via Gcc-patches



GCC maintainers:

The following patch fixes the dg-options for the test
powerpc/rs6000-fpint.c.  The test now works correctly on Power 10.  The
patch has been tested on Power10 with no regressions.

Please let me know if the patch is acceptable for mainline.  Thanks.

Carl 

-
rs6000: Fix test gc.target/powerpc/rs600-fpint.c test options

The rs6000-*-* target selector in the test is outdated and no longer
supported.  The powerpc*-*-* target is the default, so it doesn't need to
be specified.  The dg-options needs to specify an older processor to get
the desired behavior on recent processors.

This patch updates the test specifications so the test will run properly on
Power10LE.  Tested on Power10 LE system with no regression test failures.

gcc/testsuite/:
* gcc.target/powerpc/rs6000-fpint.c: Update dg-options, drop dg-do
compile specifier.
---
 gcc/testsuite/gcc.target/powerpc/rs6000-fpint.c | 3 +--
 1 file changed, 1 insertion(+), 2 deletions(-)

diff --git a/gcc/testsuite/gcc.target/powerpc/rs6000-fpint.c 
b/gcc/testsuite/gcc.target/powerpc/rs6000-fpint.c
index 410f780de8b..fdb0a371929 100644
--- a/gcc/testsuite/gcc.target/powerpc/rs6000-fpint.c
+++ b/gcc/testsuite/gcc.target/powerpc/rs6000-fpint.c
@@ -1,5 +1,4 @@
-/* { dg-do compile { target powerpc*-*-* rs6000-*-* } } */
-/* { dg-options "-mno-powerpc-gfxopt" } */
+/* { dg-options "-mno-powerpc-gfxopt -mdejagnu-cpu=power6" } */
 /* { dg-final { scan-assembler-not "stfiwx" } } */
 
 /* A basic test of the old-style (not stfiwx) fp -> int conversion.  */
-- 
2.37.2

[PATCH] rs6000: Fix test int_128bit-runnable.c instruction counts

2023-04-13 Thread Carl Love via Gcc-patches

GCC maintainers:

The following fix updates the expected instruction counts for the
int_128bit-runnable.c test.  The counts changed as a result of a
commit to support 128-bit integer divide and modulus.  The change
resulted in two of the tests using vdivsq instructions rather than the
vextsd2q instruction.  This increased the count for vdivsq from 1 to 3
and decreased the count for vextsd2q from 6 to 4.

The patch has been tested on a Power10 system with no new regression
failures.

Please let me know if this patch is acceptable for mainline.  Thanks.

 Carl 



rs6000: Fix test int_128bit-runnable.c instruction counts

The test reports two failures on Power 10LE:

FAIL: .../int_128bit-runnable.c scan-assembler-times \mvdivsq\M 1
FAIL: .../int_128bit-runnable.c scan-assembler-times \mvextsd2q\M 6

The current counts are :

  vdivsq   3
  vextsd2q 4

The counts changed with commit:

  commit 852b11da11a181df517c0348df044354ff0656d6
  Author: Michael Meissner 
  Date:   Wed Jul 7 21:55:38 2021 -0400

  Generate 128-bit int divide/modulus on power10.

  This patch adds support for the VDIVSQ, VDIVUQ, VMODSQ, and VMODUQ
  instructions to do 128-bit arithmetic.

  2021-07-07  Michael Meissner  

The code generation changed significantly.  There are two places where
the vextsd2q is "replaced" by a vdivsq instruction, thus increasing the
vdivsq count from 1 to 3.  The first case changes from:

expected_result = vec_arg1[0]/4;
1af8:   60 01 df e8 ld  r6,352(r31)
1afc:   68 01 ff e8 ld  r7,360(r31)
1b00:   76 fe e9 7c sradi   r9,r7,63
1b04:   67 4b 00 7c mtvsrdd vs32,0,r9
1b08:   02 06 1b 10 vextsd2q v0,v0 <
1b0c:   03 00 40 39 li  r10,3
1b10:   00 00 60 39 li  r11,0
1b14:   67 00 09 7c mfvrd   r9,v0
1b18:   67 02 08 7c mfvsrld r8,vs32
1b1c:   38 50 08 7d and r8,r8,r10
1b20:   38 58 29 7d and r9,r9,r11
1b24:   78 4b 2b 7d mr  r11,r9
1b28:   78 43 0a 7d mr  r10,r8
1b2c:   14 30 4a 7f addc    r26,r10,r6
1b30:   14 39 6b 7f adde    r27,r11,r7
1b34:   46 f0 69 7b sldi    r9,r27,62
1b38:   82 f0 58 7b srdi    r24,r26,2
1b3c:   78 c3 38 7d or  r24,r9,r24
1b40:   74 16 79 7f sradi   r25,r27,2
1b44:   30 00 1f fb std r24,48(r31)
1b48:   38 00 3f fb std r25,56(r31)

To:

   expected_result = vec_arg1[0]/4;
1af8:   69 01 1f f4 lxv vs32,352(r31)
1afc:   04 00 20 39 li  r9,4
1b00:   00 00 40 39 li  r10,0
1b04:   67 4b 2a 7c mtvsrdd vs33,r10,r9
1b08:   0b 09 00 10 vdivsq  v0,v0,v1   <
1b0c:   3d 00 1f f4 stxv    vs32,48(r31)

The second case where a vextsd2q instruction is replaced with vdivsq:

From:

  expected_result = arg1/16;
1c24:   40 00 df e8 ld  r6,64(r31)
1c28:   48 00 ff e8 ld  r7,72(r31)
1c2c:   76 fe e9 7c sradi   r9,r7,63
1c30:   67 4b 00 7c mtvsrdd vs32,0,r9
1c34:   02 06 1b 10 vextsd2q v0,v0<---
1c38:   0f 00 40 39 li  r10,15
1c3c:   00 00 60 39 li  r11,0
1c40:   67 00 09 7c mfvrd   r9,v0
1c44:   67 02 08 7c mfvsrld r8,vs32
1c48:   38 50 08 7d and r8,r8,r10
1c4c:   38 58 29 7d and r9,r9,r11
1c50:   78 4b 2b 7d mr  r11,r9
1c54:   78 43 0a 7d mr  r10,r8
1c58:   14 30 ca 7e addc    r22,r10,r6
1c5c:   14 39 eb 7e adde    r23,r11,r7
1c60:   c6 e0 e9 7a sldi    r9,r23,60
1c64:   02 e1 d4 7a srdi    r20,r22,4
1c68:   78 a3 34 7d or  r20,r9,r20
1c6c:   74 26 f5 7e sradi   r21,r23,4
1c70:   30 00 9f fa std r20,48(r31)
1c74:   38 00 bf fa std r21,56(r31)

To:

  expected_result = arg1/16;
1be8:   49 00 1f f4 lxv vs32,64(r31)
1bec:   10 00 20 39 li  r9,16
1bf0:   00 00 40 39 li  r10,0
1bf4:   67 4b 2a 7c mtvsrdd vs33,r10,r9
1bf8:   0b 09 00 10 vdivsq  v0,v0,v1   <---
1bfc:   3d 00 1f f4 stxv    vs32,48(r31)

The patch has been tested on Power10LE with no regressions.
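
The older sequences above are the usual branch-free way to divide a signed
value by a power of two: sign-extend, mask to get a bias, add, then shift
right arithmetically.  A 64-bit sketch of that idea follows (the
disassembly does the same thing across a 128-bit register pair; div_pow2 is
a hypothetical name, and the code assumes that >> on a negative value is an
arithmetic shift, as it is with GCC):

```c
#include <stdint.h>

/* Signed division by 2^k, rounding toward zero, without a divide
   instruction.  Valid for 1 <= k <= 62.  Assumes an arithmetic right
   shift for negative operands (implementation-defined in ISO C).  */
static int64_t
div_pow2 (int64_t x, unsigned k)
{
  int64_t sign = x >> 63;                          /* 0 or -1, like sradi ...,63 */
  int64_t bias = sign & (((int64_t) 1 << k) - 1);  /* 2^k - 1 only when x < 0 */
  return (x + bias) >> k;                          /* add bias, then shift */
}
```

For x/4 the bias mask is 3 and the shift is 2, and for x/16 the mask is 15
and the shift is 4, matching the li r10,3 and li r10,15 constants in the two
listings.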

gcc/testsuite/
* gcc.target/powerpc/int_128bit-runnable.c: Update expected
instruction counts.
---
 gcc/testsuite/gcc.target/powerpc/int_128bit-runnable.c | 4 ++--
 1 file changed, 2 insertions(+), 2 deletions(-)

diff --git a/gcc/testsuite/gcc.target/powerpc/int_128bit-runnable.c 
b/gcc/testsuite/gcc.target/powerpc/int_128bit-runnable.c
index 1afb00262a1..b2e2da1e013 100644
--- a/gcc/testsuite/gcc.target/powerpc/int_128bit-runnable.c
+++

[PATCH] Fix for vcmpequt builtin

2021-06-14 Thread Carl Love via Gcc-patches

GCC Maintainers:

The following patch removes some unused BU_P10_OVERLOAD_2 entries. 
Also, it fixes the builtin definition to use the altivec_eqv1ti
instruction definition for the compare instruction.

The patch has been tested on powerpc64le-linux (Power 10 LE)

Please let me know if the patch is acceptable for mainline. 

Carl Love

---

The vcmpequt builtin definition uses eqvv1ti3, which is the instruction
definition for the eqv (logical equivalence) instruction.  It should
instead use the altivec_eqv1ti instruction definition, which generates
the vcmpequq instruction.

2021-06-14  Carl Love  

gcc/ChangeLog
PR target/101022
* config/rs6000/rs6000-builtin.def (VCMPEQUT): Fix the ICODE for the
enum definition.
(VRLQ, VSLQ, VSRQ, VSRAQ): Remove unused BU_P10_OVERLOAD_2 definitions.
---
 gcc/config/rs6000/rs6000-builtin.def | 6 +-
 1 file changed, 1 insertion(+), 5 deletions(-)

diff --git a/gcc/config/rs6000/rs6000-builtin.def 
b/gcc/config/rs6000/rs6000-builtin.def
index 231e7c9d420..615ec1ede55 100644
--- a/gcc/config/rs6000/rs6000-builtin.def
+++ b/gcc/config/rs6000/rs6000-builtin.def
@@ -2924,7 +2924,7 @@ BU_P10V_VSX_2 (XXGENPCVM_V4SI, "xxgenpcvm_v4si", CONST, 
xxgenpcvm_v4si)
 BU_P10V_VSX_2 (XXGENPCVM_V2DI, "xxgenpcvm_v2di", CONST, xxgenpcvm_v2di)
 BU_P10V_AV_2 (VCMPGTUT,"vcmpgtut", CONST,  vector_gtuv1ti)
 BU_P10V_AV_2 (VCMPGTST,"vcmpgtst", CONST,  vector_gtv1ti)
-BU_P10V_AV_2 (VCMPEQUT,"vcmpequt", CONST,  eqvv1ti3)
+BU_P10V_AV_2 (VCMPEQUT,"vcmpequt", CONST,  altivec_eqv1ti)
 BU_P10V_AV_2 (CMPNET,  "vcmpnet",  CONST,  vcmpnet)
 BU_P10V_AV_2 (CMPGE_1TI,   "cmpge_1ti",CONST,  vector_nltv1ti)
 BU_P10V_AV_2 (CMPGE_U1TI,  "cmpge_u1ti",   CONST,  vector_nltuv1ti)
@@ -3078,10 +3078,6 @@ BU_P10_OVERLOAD_2 (CLRR, "clrr")
 BU_P10_OVERLOAD_2 (GNB, "gnb")
 BU_P10_OVERLOAD_4 (XXEVAL, "xxeval")
 BU_P10_OVERLOAD_2 (XXGENPCVM, "xxgenpcvm")
-BU_P10_OVERLOAD_2 (VRLQ, "vrlq")
-BU_P10_OVERLOAD_2 (VSLQ, "vslq")
-BU_P10_OVERLOAD_2 (VSRQ, "vsrq")
-BU_P10_OVERLOAD_2 (VSRAQ, "vsraq")
 
 BU_P10_OVERLOAD_3 (EXTRACTL, "extractl")
 BU_P10_OVERLOAD_3 (EXTRACTH, "extracth")
-- 
2.27.0

[PATCH] Fix effective target for check-builtin-vec_rlnm-runnable.c test

2021-06-11 Thread Carl Love via Gcc-patches

GCC maintainers:

The gcc test suite compiles and attempts to run the
check-builtin-vec_rlnm-runnable.c test on Power 8 platforms.  The test
should only be run on Power 9 and newer platforms.  The attached patch
fixes the target for the executable test so it only runs on Power 9 and
newer platforms.

The patch was tested on powerpc64-linux (Power 8 BE).  The test
harness reports 1 unsupported test.

The patch was also tested on:

powerpc64le-linux (Power 9 LE)
powerpc64le-linux (Power 10 LE)

The test harness reports 3 expected passes and no failures.

Please let me know if the patch looks OK for mainline.  Thanks.

Carl 

-


The effective target for a Power 9 runnable test should be
p9vector_hw.

2021-06-11  Carl Love  

gcc/testsuite/ChangeLog

* gcc.target/powerpc/check-builtin-vec_rlnm-runnable.c 
(dg-require-effective-target):
Change target to p9vector_hw.
---
 .../gcc.target/powerpc/check-builtin-vec_rlnm-runnable.c| 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/gcc/testsuite/gcc.target/powerpc/check-builtin-vec_rlnm-runnable.c 
b/gcc/testsuite/gcc.target/powerpc/check-builtin-vec_rlnm-runnable.c
index cd67b06afbe..55935eaafd2 100644
--- a/gcc/testsuite/gcc.target/powerpc/check-builtin-vec_rlnm-runnable.c
+++ b/gcc/testsuite/gcc.target/powerpc/check-builtin-vec_rlnm-runnable.c
@@ -1,5 +1,5 @@
 /* { dg-do run } */
-/* { dg-require-effective-target powerpc_p9vector_ok } */
+/* { dg-require-effective-target p9vector_hw } */
 /* { dg-options "-O2 -mdejagnu-cpu=power9 -save-temps" } */
 
 /* Verify the vec_rlm and vec_rlmi builtins works correctly.  */
-- 
2.17.1

Re: [PATCH 1/5 ver4] RS6000: Add 128-bit Integer Operations

2021-04-28 Thread Carl Love via Gcc-patches

On Tue, 2021-04-27 at 18:46 -0500, will schmidt wrote:
> On Mon, 2021-04-26 at 09:35 -0700, Carl Love wrote:
> > Will, Segher:
> > 
> > This patch fixes the order of the argument in the vec_rlmi and
> > vec_rlnm builtins.  The patch also adds a new test cases to verify
> > the fix.
> > 
> > The patch has been tested on
> >  powerpc64-linux instead (Power 8 BE)
> >  powerpc64-linux instead (Power 9 LE)
> >  powerpc64-linux instead (Power 10 LE)
> > 
> > Please let me know if the patch is acceptable for mainline.
> > 
> > Carl Love
> 
> Hi,
> 
> Is there an existing PR for this one? 

Will:

Not that I am aware of.  I found the issue as part of doing the
patches.  

 Carl

[PATCH ] RS6000 Add 128-bit Binary Integer sign extend operations

2021-04-28 Thread Carl Love via Gcc-patches

Segher, Will:

The agreement for the sign extension builtin was to just make it
endian-aware rather than go with a more complex definition.  The prior
patch has been updated with this new functionality.

This patch adds support for the 128-bit extension instruction and
corresponding builtin support for the various sign extensions.
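
As a scalar illustration of the widest case, doubleword to quadword (the
vextsd2q instruction mentioned in the ChangeLog below): the upper 64 bits
of the result are all copies of the input's sign bit.  This is a sketch
only; struct i128_model and sign_extend_d2q are hypothetical names, and it
does not model which vector element the builtin selects on LE versus BE.

```c
#include <stdint.h>

/* Hypothetical 128-bit value represented as two 64-bit halves.  */
struct i128_model
{
  uint64_t lo;
  int64_t hi;
};

/* Sign-extend a 64-bit value to 128 bits: the low half is the input
   unchanged, the high half is 0 or -1 depending on the sign.  */
static struct i128_model
sign_extend_d2q (int64_t d)
{
  struct i128_model q;
  q.lo = (uint64_t) d;
  q.hi = d < 0 ? -1 : 0;
  return q;
}
```

The narrower extensions (byte or halfword to word, and byte, halfword or
word to doubleword) follow the same rule with smaller widths.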

This was originally part of the Add 128-bit Integer operations patch
series.  The patch logically goes with the earlier 5 patch series.

The LE support testing was done on a Power 10.  The regression testing
for LE passes with no regressions.  

The BE support testing was done by generating the BE code sequence and
then manually using gdb and visual inspection to make sure the elements
were correctly reversed and the expected elements were sign extended.

Please let me know if the patch is acceptable for mainline.

   Carl Love

---

gcc/ChangeLog

2021-04-28  Carl Love  
* config/rs6000/altivec.h (vec_signextll, vec_signexti, vec_signextq):
Add define for new builtins.
* config/rs6000/altivec.md(altivec_vreveti2): Add define_expand.
* config/rs6000/rs6000-builtin.def (VSIGNEXTI, VSIGNEXTLL):  Add
overloaded builtin definitions.
(VSIGNEXTSB2W, VSIGNEXTSH2W, VSIGNEXTSB2D, VSIGNEXTSH2D,VSIGNEXTSW2D,
VSIGNEXTSD2Q):  Add builtin expansions.
(SIGNEXT): Add P10 overload definition.
* config/rs6000/rs6000-call.c (P9V_BUILTIN_VEC_VSIGNEXTI,
P9V_BUILTIN_VEC_VSIGNEXTLL,
P10_BUILTIN_VEC_SIGNEXT): Add overloaded argument definitions.
* config/rs6000/vsx.md (vsx_sign_extend_v2di_v1ti): Add define_insn.
(vsignextend_v2di_v1ti, vsignextend_qi_, vsignextend_hi_,
vsignextend_si_v2di)[VIlong]: Add define_expand.
Make define_insn vsx_sign_extend_si_v2di visible.
* doc/extend.texi:  Add documentation for the vec_signexti,
vec_signextll builtins and vec_signextq.

gcc/testsuite/ChangeLog

2021-04-28  Carl Love  
* gcc.target/powerpc/int_128bit-runnable.c (extsd2q): Update expected
count.
Add tests for vec_signextq.
* gcc.target/powerpc/p9-sign_extend-runnable.c:  New test case.
---
 gcc/config/rs6000/altivec.h   |   3 +
 gcc/config/rs6000/altivec.md  |  24 
 gcc/config/rs6000/rs6000-builtin.def  |  12 ++
 gcc/config/rs6000/rs6000-call.c   |  16 +++
 gcc/config/rs6000/vsx.md  |  83 +++-
 gcc/doc/extend.texi   |  16 +++
 .../gcc.target/powerpc/int_128bit-runnable.c  |  41 +-
 .../powerpc/p9-sign_extend-runnable.c | 128 ++
 8 files changed, 321 insertions(+), 2 deletions(-)
 create mode 100644 gcc/testsuite/gcc.target/powerpc/p9-sign_extend-runnable.c

diff --git a/gcc/config/rs6000/altivec.h b/gcc/config/rs6000/altivec.h
index 314695a43ca..5b631c7ebaf 100644
--- a/gcc/config/rs6000/altivec.h
+++ b/gcc/config/rs6000/altivec.h
@@ -497,6 +497,8 @@
 
 #define vec_xlx __builtin_vec_vextulx
 #define vec_xrx __builtin_vec_vexturx
+#define vec_signexti  __builtin_vec_vsignexti
+#define vec_signextll __builtin_vec_vsignextll
 
 #endif
 
@@ -715,6 +717,7 @@ __altivec_scalar_pred(vec_any_nle,
 #define vec_step(x) __builtin_vec_step (* (__typeof__ (x) *) 0)
 
 #ifdef _ARCH_PWR10
+#define vec_signextq  __builtin_vec_vsignextq
 #define vec_dive __builtin_vec_dive
 #define vec_mod  __builtin_vec_mod
 
diff --git a/gcc/config/rs6000/altivec.md b/gcc/config/rs6000/altivec.md
index c7d2cd0aa88..61a0905789f 100644
--- a/gcc/config/rs6000/altivec.md
+++ b/gcc/config/rs6000/altivec.md
@@ -4291,6 +4291,30 @@
 })
 
 ;; Vector reverse elements
+(define_expand "altivec_vreveti2"
+  [(set (match_operand:TI 0 "register_operand" "=v")
+   (unspec:TI [(match_operand:TI 1 "register_operand" "v")]
+ UNSPEC_VREVEV))]
+  "TARGET_ALTIVEC"
+{
+  int i, j, size, num_elements;
+  rtvec v = rtvec_alloc (16);
+  rtx mask = gen_reg_rtx (V16QImode);
+
+  size = GET_MODE_UNIT_SIZE (TImode);
+  num_elements = GET_MODE_NUNITS (TImode);
+
+  for (j = 0; j < num_elements; j++)
+for (i = 0; i < size; i++)
+  RTVEC_ELT (v, i + j * size)
+   = GEN_INT (i + (num_elements - 1 - j) * size);
+
+  emit_insn (gen_vec_initv16qiqi (mask, gen_rtx_PARALLEL (V16QImode, v)));
+  emit_insn (gen_altivec_vperm_ti (operands[0], operands[1],
+operands[1], mask));
+  DONE;
+})
+
 (define_expand "altivec_vreve2"
   [(set (match_operand:VEC_A 0 "register_operand" "=v")
(unspec:VEC_A [(match_operand:VEC_A 1 "register_operand" "v")]
diff --git a/gcc/config/rs6000/rs6000-builtin.def 
b/gcc/config/rs6000/rs6000-builtin.def
index dba22825b79..d55095b01bb 100644
--- a/gcc/config/rs6000/rs6000-builtin.def
+++ b/gcc/config/rs6000/rs6000-builtin.def
@@ -2877,6 +2877,8 @@ BU_P9V_OVERLOAD_1 (VPRTYBD,   "vprtybd")
 BU_P9V_OVERLOAD_1 (VPRTYBQ,"vprtybq")
 BU_P9V_OVERLOAD_1

[PATCH 5/5 ver4] RS6000: Conversions between 128-bit integer and floating point values.

2021-04-26 Thread Carl Love via Gcc-patches

Will, Segher:

This patch adds support for converting to/from 128-bit integers and
128-bit decimal floating point formats using the new P10 instructions
dcffixqq and dctfixqq.  The new instructions are only used on P10 HW,
otherwise the conversions continue to use the existing SW routines.
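
The hardware-or-software selection described above can be pictured with a
small portable C sketch.  This is only an illustration of the dispatch
pattern, not the libgcc ifunc code: the names fix_sw, fix_hw and
resolve_fix are hypothetical, the capability flag is passed in by hand,
and a double to long long conversion stands in for the 128-bit case.

```c
#include <stdbool.h>

/* Type of a conversion routine (stand-in for the 128-bit conversions).  */
typedef long long (*fix_fn) (double);

/* Hypothetical stand-in for the existing software routine.  */
long long fix_sw (double x)
{
  return (long long) x;
}

/* Hypothetical stand-in for a routine built around the new hardware
   instruction; here it simply performs the same conversion.  */
long long fix_hw (double x)
{
  return (long long) x;
}

/* Hypothetical resolver: pick the routine once, based on whether the
   hardware support is present.  */
fix_fn resolve_fix (bool have_hw)
{
  return have_hw ? fix_hw : fix_sw;
}
```

A caller would obtain the routine once, for example
fix_fn f = resolve_fix (have_hw);, and then call f (x) without caring
which variant was chosen.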

The files fixkfti-sw.c and fixunskfti-sw.c are renamed versions of
fixkfti.c and fixunskfti.c respectively.  The function names in the
files were updated to match the rename, and some whitespace was fixed.

The patch has been tested on
powerpc64-linux instead (Power 8 BE)
powerpc64-linux instead (Power 9 LE)
powerpc64-linux instead (Power 10 LE)

Please let me know if the patch is acceptable for mainline.

   Carl Love


gcc/ChangeLog

2021-04-26  Carl Love  
* config/rs6000/rs6000.md (floatti2, floatunsti2,
fix_truncti2, fixuns_truncti2): Add
define_insn for mode IEEE 128.

gcc/testsuite/ChangeLog

2021-04-26  Carl Love  
* gcc.target/powerpc/fp128_conversions.c: New file.
* gcc.target/powerpc/int_128bit-runnable.c(vextsd2ppc_native_128bitq,
vcmpuq, vcmpsq, vcmpequq, vcmpequq., vcmpgtsq, vcmpgtsq.
vcmpgtuq, vcmpgtuq.): Update scan-assembler-times.
(ppc_native_128bit): Remove dg-require-effective-target.

libgcc/ChangeLog
2021-04-26  Carl Love  
* config.host: Add if test and set for
libgcc_cv_powerpc_3_1_float128_hw.
* libgcc/config/rs6000/fixkfti.c: Renamed to fixkfti-sw.c.
Change calls of __fixkfti to __fixkfti_sw.
* libgcc/config/rs6000/fixunskfti.c: Renamed to fixunskfti-sw.c.
Change calls of __fixunskfti to __fixunskfti_sw.
* libgcc/config/rs6000/float128-p10.c (__floattikf_hw,
__floatuntikf_hw, __fixkfti_hw, __fixunskfti_hw): New file.
* libgcc/config/rs6000/float128-ifunc.c (SW_OR_HW_ISA3_1): New macro.
(__floattikf_resolve, __floatuntikf_resolve, __fixkfti_resolve,
__fixunskfti_resolve): Add resolve functions.
(__floattikf, __floatuntikf, __fixkfti, __fixunskfti): New functions.
* libgcc/config/rs6000/float128-sed (floattitf, __floatuntitf,
__fixtfti, __fixunstfti): Add editor commands to change names.
* libgcc/config/rs6000/float128-sed-hw (__floattitf,
__floatuntitf, __fixtfti, __fixunstfti): Add editor commands to
change names.
* libgcc/config/rs6000/floattikf.c: Renamed to floattikf-sw.c.
* libgcc/config/rs6000/floatuntikf.c: Renamed to floatuntikf-sw.c.
* libgcc/config/rs6000/quad-float128.h (__floattikf_sw,
__floatuntikf_sw, __fixkfti_sw, __fixunskfti_sw, __floattikf_hw,
__floatuntikf_hw, __fixkfti_hw, __fixunskfti_hw, __floattikf,
__floatuntikf, __fixkfti, __fixunskfti): New extern declarations.
* libgcc/config/rs6000/t-float128 (floattikf, floatuntikf,
fixkfti, fixunskfti): Remove file names from fp128_ppc_funcs.
(floattikf-sw, floatuntikf-sw, fixkfti-sw, fixunskfti-sw): Add
file names to fp128_ppc_funcs.
* libgcc/config/rs6000/t-float128-hw (fp128_3_1_hw_funcs,
fp128_3_1_hw_src, fp128_3_1_hw_static_obj, fp128_3_1_hw_shared_obj,
fp128_3_1_hw_obj): Add variables for ISA 3.1 support.
* libgcc/config/rs6000/t-float128-p10-hw: New file.
* configure: Update script for isa 3.1 128-bit float support.
* configure.ac: Add check for 128-bit float hardware support.
---
 gcc/config/rs6000/rs6000.c|   8 +-
 gcc/config/rs6000/rs6000.md   |  36 +++
 .../gcc.target/powerpc/fp128_conversions.c| 294 ++
 .../gcc.target/powerpc/int_128bit-runnable.c  |  13 +-
 libgcc/config.host|   4 +
 .../config/rs6000/{fixkfti.c => fixkfti-sw.c} |   4 +-
 .../rs6000/{fixunskfti.c => fixunskfti-sw.c}  |   4 +-
 libgcc/config/rs6000/float128-ifunc.c |  44 ++-
 libgcc/config/rs6000/float128-p10.c   |  71 +
 libgcc/config/rs6000/float128-sed |   4 +
 libgcc/config/rs6000/float128-sed-hw  |   4 +
 .../rs6000/{floattikf.c => floattikf-sw.c}|   4 +-
 .../{floatuntikf.c => floatuntikf-sw.c}   |   4 +-
 libgcc/config/rs6000/quad-float128.h  |  17 +-
 libgcc/config/rs6000/t-float128   |  12 +-
 libgcc/config/rs6000/t-float128-hw|  16 +
 libgcc/config/rs6000/t-float128-p10-hw|  24 ++
 libgcc/configure  |  37 +++
 libgcc/configure.ac   |  25 ++
 19 files changed, 589 insertions(+), 36 deletions(-)
 create mode 100644 gcc/testsuite/gcc.target/powerpc/fp128_conversions.c
 rename libgcc/config/rs6000/{fixkfti.c => fixkfti-sw.c} (96%)
 rename libgcc/config/rs6000/{fixunskfti.c => fixunskfti-sw.c} (96%)
 create mode 100644 libgcc/config/rs6000/float128-p10.c
 rename libgcc/config/rs6000/{floattikf.c => floattikf-sw.c} (96%)
 rename

[PATCH 4/5 ver4] RS6000, Add test 128-bit shifts for just the int128 type.

2021-04-26 Thread Carl Love via Gcc-patches

Will, Segher:

The previous patch added the vector 128-bit integer shift instruction
support for the V1TI type.  This patch renames and moves the VSX_TI
iterator from vsx.md to VEC_TI in vector.md.  The uses of VEC_TI are
also updated.
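
As a reference point for what the shift tests check, here is a portable C
sketch of a 128-bit logical shift built from two 64-bit halves.  It is an
illustrative model only, not code from the patch: the type u128_model and
the functions shl128 and shr128 are hypothetical names, and the shift
amount is reduced to 7 bits on the assumption that only the low seven bits
of the amount are significant.

```c
#include <stdint.h>

/* Hypothetical 128-bit value held as two 64-bit halves.  */
typedef struct { uint64_t hi, lo; } u128_model;

/* Logical shift left by n bits (only the low 7 bits of n are used).  */
u128_model shl128 (u128_model v, unsigned n)
{
  u128_model r = v;
  n &= 127u;
  if (n == 0)
    return r;
  if (n < 64)
    {
      r.hi = (v.hi << n) | (v.lo >> (64 - n));
      r.lo = v.lo << n;
    }
  else
    {
      r.hi = v.lo << (n - 64);
      r.lo = 0;
    }
  return r;
}

/* Logical shift right by n bits (only the low 7 bits of n are used).  */
u128_model shr128 (u128_model v, unsigned n)
{
  u128_model r = v;
  n &= 127u;
  if (n == 0)
    return r;
  if (n < 64)
    {
      r.lo = (v.lo >> n) | (v.hi << (64 - n));
      r.hi = v.hi >> n;
    }
  else
    {
      r.lo = v.hi >> (n - 64);
      r.hi = 0;
    }
  return r;
}
```

Shifting the value 1 left by 64 moves the set bit from the low half into
the high half, which is the kind of cross-half case a runnable test should
cover.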

The patch has been tested on
powerpc64-linux instead (Power 8 BE)
powerpc64-linux instead (Power 9 LE)
powerpc64-linux instead (Power 10 LE)

Please let me know if the patch is acceptable for mainline.

   Carl Love


gcc/ChangeLog

2021-04-26  Carl Love  
* config/rs6000/altivec.md (altivec_vslq, altivec_vsrq):
Rename to altivec_vslq_, altivec_vsrq_, mode VEC_TI.
* config/rs6000/vector.md (VEC_TI): Was named VSX_TI in vsx.md.
(vashlv1ti3): Change to vashl3, mode VEC_TI.
(vlshrv1ti3): Change to vlshr3, mode VEC_TI.
* config/rs6000/vsx.md (VSX_TI): Remove define_mode_iterator. Update
uses of VSX_TI to VEC_TI.

gcc/testsuite/ChangeLog

2021-04-26  Carl Love  
* gcc.target/powerpc/int_128bit-runnable.c: Add shift_right, shift_left
tests.
---
 gcc/config/rs6000/altivec.md  | 16 -
 gcc/config/rs6000/vector.md   | 27 ---
 gcc/config/rs6000/vsx.md  | 33 +--
 .../gcc.target/powerpc/int_128bit-runnable.c  | 16 +++--
 4 files changed, 52 insertions(+), 40 deletions(-)

diff --git a/gcc/config/rs6000/altivec.md b/gcc/config/rs6000/altivec.md
index c4c82b33f8d..c7d2cd0aa88 100644
--- a/gcc/config/rs6000/altivec.md
+++ b/gcc/config/rs6000/altivec.md
@@ -2226,10 +2226,10 @@
   "vsl %0,%1,%2"
   [(set_attr "type" "vecsimple")])
 
-(define_insn "altivec_vslq"
-  [(set (match_operand:V1TI 0 "vsx_register_operand" "=v")
-   (ashift:V1TI (match_operand:V1TI 1 "vsx_register_operand" "v")
-(match_operand:V1TI 2 "vsx_register_operand" "v")))]
+(define_insn "altivec_vslq_"
+  [(set (match_operand:VEC_TI 0 "vsx_register_operand" "=v")
+   (ashift:VEC_TI (match_operand:VEC_TI 1 "vsx_register_operand" "v")
+(match_operand:VEC_TI 2 "vsx_register_operand" "v")))]
   "TARGET_POWER10"
   /* Shift amount in needs to be in bits[57:63] of 128-bit operand. */
   "vslq %0,%1,%2"
@@ -2243,10 +2243,10 @@
   "vsr %0,%1,%2"
   [(set_attr "type" "vecsimple")])
 
-(define_insn "altivec_vsrq"
-  [(set (match_operand:V1TI 0 "vsx_register_operand" "=v")
-   (lshiftrt:V1TI (match_operand:V1TI 1 "vsx_register_operand" "v")
-  (match_operand:V1TI 2 "vsx_register_operand" "v")))]
+(define_insn "altivec_vsrq_"
+  [(set (match_operand:VEC_TI 0 "vsx_register_operand" "=v")
+   (lshiftrt:VEC_TI (match_operand:VEC_TI 1 "vsx_register_operand" "v")
+  (match_operand:VEC_TI 2 "vsx_register_operand" 
"v")))]
   "TARGET_POWER10"
   /* Shift amount in needs to be in bits[57:63] of 128-bit operand. */
   "vsrq %0,%1,%2"
diff --git a/gcc/config/rs6000/vector.md b/gcc/config/rs6000/vector.md
index 55bbaa9c32f..5695154e316 100644
--- a/gcc/config/rs6000/vector.md
+++ b/gcc/config/rs6000/vector.md
@@ -26,6 +26,9 @@
 ;; Vector int modes
 (define_mode_iterator VEC_I [V16QI V8HI V4SI V2DI])
 
+;; 128-bit int modes
+(define_mode_iterator VEC_TI [V1TI TI])
+
 ;; Vector int modes for parity
 (define_mode_iterator VEC_IP [V8HI
  V4SI
@@ -1627,17 +1630,17 @@
   "")
 
 ;; No immediate version of this 128-bit instruction
-(define_expand "vashlv1ti3"
-  [(set (match_operand:V1TI 0 "vsx_register_operand" "=v")
-   (ashift:V1TI (match_operand:V1TI 1 "vsx_register_operand" "v")
-(match_operand:V1TI 2 "vsx_register_operand" "v")))]
+(define_expand "vashl3"
+  [(set (match_operand:VEC_TI 0 "vsx_register_operand" "=v")
+   (ashift:VEC_TI (match_operand:VEC_TI 1 "vsx_register_operand")
+(match_operand:VEC_TI 2 "vsx_register_operand")))]
   "TARGET_POWER10"
 {
   /* Shift amount in needs to be put in bits[57:63] of 128-bit operand2. */
-  rtx tmp = gen_reg_rtx (V1TImode);
+  rtx tmp = gen_reg_rtx (mode);
 
   emit_insn (gen_xxswapd_v1ti (tmp, operands[2]));
-  emit_insn (gen_altivec_vslq (operands[0], operands[1], tmp));
+  emit_insn(gen_altivec_vslq_ (operands[0], operands[1], tmp));
   DONE;
 })
 
@@ -1650,17 +1653,17 @@
   "")
 
 ;; No immediate version of this 128-bit instruction
-(define_expand "vlshrv1ti3"
-  [(set (match_operand:V1TI 0 "vsx_register_operand" "=v")
-   (lshiftrt:V1TI (match_operand:V1TI 1 "vsx_register_operand" "v")
-  (match_operand:V1TI 2 "vsx_register_operand" "v")))]
+(define_expand "vlshr3"
+  [(set (match_operand:VEC_TI 0 "vsx_register_operand" "=v")
+   (lshiftrt:VEC_TI (match_operand:VEC_TI 1 "vsx_register_operand")
+  (match_operand:VEC_TI 2 "vsx_register_operand")))]
   "TARGET_POWER10"
 {
   /* Shift amount in needs to be put into bits[57:63] of 128-bit

[PATCH 2/5 ver4] RS6000: Add 128-bit Integer Operations

2021-04-26 Thread Carl Love via Gcc-patches

Will, Segher:

This patch adds instruction and builtin support for 128-bit integer
divide, modulo, shift, and compare operations.

The patch has been tested on
powerpc64-linux instead (Power 8 BE)
powerpc64-linux instead (Power 9 LE)
powerpc64-linux instead (Power 10 LE)

Please let me know if the patch is acceptable for mainline.

   Carl Love

---

gcc/ChangeLog

2021-04-26  Carl Love  
* config/rs6000/altivec.h (vec_dive, vec_mod): Add define for new
builtins.
* config/rs6000/altivec.md (UNSPEC_VMULEUD, UNSPEC_VMULESD,
UNSPEC_VMULOUD, UNSPEC_VMULOSD): New unspecs.
(altivec_eqv1ti, altivec_gtv1ti, altivec_gtuv1ti, altivec_vmuleud,
altivec_vmuloud, altivec_vmulesd, altivec_vmulosd, altivec_vrlq,
altivec_vrlqmi, altivec_vrlqmi_inst, altivec_vrlqnm,
altivec_vrlqnm_inst, altivec_vslq, altivec_vsrq, altivec_vsraq,
altivec_vcmpequt_p, altivec_vcmpgtst_p, altivec_vcmpgtut_p): New
define_insn.
(vec_widen_umult_even_v2di, vec_widen_smult_even_v2di,
vec_widen_umult_odd_v2di, vec_widen_smult_odd_v2di, altivec_vrlqmi,
altivec_vrlqnm): New define_expands.
* config/rs6000/rs6000-builtin.def (VCMPEQUT_P, VCMPGTST_P,
VCMPGTUT_P): Add macro expansions.
(BU_P10V_AV_P): Add builtin predicate definition.
(VCMPGTUT, VCMPGTST, VCMPEQUT, CMPNET, CMPGE_1TI,
CMPGE_U1TI, CMPLE_1TI, CMPLE_U1TI, VNOR_V1TI_UNS, VNOR_V1TI, VCMPNET_P,
VCMPAET_P, VMULEUD, VMULESD, VMULOUD, VMULOSD, VRLQ,
VSLQ, VSRQ, VSRAQ, VRLQNM, DIV_V1TI, UDIV_V1TI, DIVES_V1TI, DIVEU_V1TI,
MODS_V1TI, MODU_V1TI, VRLQMI): New macro expansions.
(VRLQ, VSLQ, VSRQ, VSRAQ, DIVE, MOD): New overload expansions.
* config/rs6000/rs6000-call.c (P10_BUILTIN_VCMPEQUT,
P10V_BUILTIN_CMPGE_1TI, P10V_BUILTIN_CMPGE_U1TI,
P10V_BUILTIN_VCMPGTUT, P10V_BUILTIN_VCMPGTST,
P10V_BUILTIN_CMPLE_1TI, P10V_BUILTIN_VCMPLE_U1TI,
P10V_BUILTIN_DIV_V1TI, P10V_BUILTIN_UDIV_V1TI,
P10V_BUILTIN_VMULESD, P10V_BUILTIN_VMULEUD,
P10V_BUILTIN_VMULOSD, P10V_BUILTIN_VMULOUD,
P10V_BUILTIN_VNOR_V1TI, P10V_BUILTIN_VNOR_V1TI_UNS,
P10V_BUILTIN_VRLQ, P10V_BUILTIN_VRLQMI,
P10V_BUILTIN_VRLQNM, P10V_BUILTIN_VSLQ,
P10V_BUILTIN_VSRQ, P10V_BUILTIN_VSRAQ,
P10V_BUILTIN_VCMPGTUT_P, P10V_BUILTIN_VCMPGTST_P,
P10V_BUILTIN_VCMPEQUT_P, P10V_BUILTIN_VCMPGTUT_P,
P10V_BUILTIN_VCMPGTST_P, P10V_BUILTIN_CMPNET,
P10V_BUILTIN_VCMPNET_P, P10V_BUILTIN_VCMPAET_P,
P10V_BUILTIN_DIVES_V1TI, P10V_BUILTIN_MODS_V1TI,
P10V_BUILTIN_MODU_V1TI):
New overloaded definitions.
(rs6000_gimple_fold_builtin) [P10V_BUILTIN_VCMPEQUT,
P10_BUILTIN_CMPNET, P10_BUILTIN_CMPGE_1TI,
P10_BUILTIN_CMPGE_U1TI, P10_BUILTIN_VCMPGTUT,
P10_BUILTIN_VCMPGTST, P10_BUILTIN_CMPLE_1TI,
P10_BUILTIN_CMPLE_U1TI]: New case statements.
(rs6000_init_builtins) [bool_V1TI_type_node, int_ftype_int_v1ti_v1ti]:
New assignments.
(altivec_init_builtins): New E_V1TImode case statement.
(builtin_function_type)[P10_BUILTIN_128BIT_VMULEUD,
P10_BUILTIN_128BIT_VMULOUD, P10_BUILTIN_128BIT_DIVEU_V1TI,
P10_BUILTIN_128BIT_MODU_V1TI, P10_BUILTIN_CMPGE_U1TI,
P10_BUILTIN_VCMPGTUT, P10_BUILTIN_VCMPEQUT]: New case statements.
* config/rs6000/rs6000.c (rs6000_handle_altivec_attribute)[E_TImode,
E_V1TImode]: New case statements.
* config/rs6000/rs6000.h (rs6000_builtin_type_index): New enum
value RS6000_BTI_bool_V1TI.
* config/rs6000/vector.md (vector_gtv1ti,vector_nltv1ti,
vector_gtuv1ti, vector_nltuv1ti, vector_ngtv1ti, vector_ngtuv1ti,
vector_eq_v1ti_p, vector_ne_v1ti_p, vector_ae_v1ti_p,
vector_gt_v1ti_p, vector_gtu_v1ti_p, vrotlv1ti3, vashlv1ti3,
vlshrv1ti3, vashrv1ti3): New define_expands.
* config/rs6000/vsx.md (UNSPEC_VSX_DIVSQ, UNSPEC_VSX_DIVUQ,
UNSPEC_VSX_DIVESQ, UNSPEC_VSX_DIVEUQ, UNSPEC_VSX_MODSQ,
UNSPEC_VSX_MODUQ): New unspecs.
(mulv2di3, vsx_div_v1ti, vsx_udiv_v1ti, vsx_dives_v1ti,
vsx_diveu_v1ti, vsx_mods_v1ti, vsx_modu_v1ti, xxswapd_v1ti): New
define_insns.
(vcmpnet): New define_expand.
* gcc/doc/extend.texi: Add documentation for the new builtins vec_rl,
vec_rlmi, vec_rlnm, vec_sl, vec_sr, vec_sra, vec_mule, vec_mulo,
vec_div, vec_dive, vec_mod, vec_cmpeq, vec_cmpne, vec_cmpgt, vec_cmplt,
vec_cmpge, vec_cmple, vec_all_eq, vec_all_ne, vec_all_gt, vec_all_lt,
vec_all_ge, vec_all_le, vec_any_eq, vec_any_ne, vec_any_gt, vec_any_lt,
vec_any_ge, vec_any_le.

gcc/testsuite/ChangeLog

2021-04-26 Carl Love  
* gcc.target/powerpc/int_128bit-runnable.c: New test file.
---
 gcc/config/rs6000/altivec.h

[PATCH 3/5 ver4] RS6000: Add TI to TD (128-bit DFP) and TD to TI support

2021-04-26 Thread Carl Love via Gcc-patches

Will, Segher:

This patch adds support for converting to/from 128-bit integers and
128-bit decimal floating point formats.

The patch has been tested on
powerpc64-linux instead (Power 8 BE)
powerpc64-linux instead (Power 9 LE)
powerpc64-linux instead (Power 10 LE)

Please let me know if the patch is acceptable for mainline.

   Carl Love


gcc/ChangeLog
2021-04-26  Carl Love  
* config/rs6000/dfp.md (floattitd2, fixtdti2): New define_insns.
* config/rs6000/rs6000-call.c (P10V_BUILTIN_VCMPNET_P,
P10V_BUILTIN_VCMPAET_P): New overloaded definitions.

gcc/testsuite/ChangeLog

2021-04-26  Carl Love  
* gcc.target/powerpc/int_128bit-runnable.c: Add 128-bit DFP
conversion tests.
---
 gcc/config/rs6000/dfp.md  | 14 +
 .../gcc.target/powerpc/int_128bit-runnable.c  | 61 +++
 2 files changed, 75 insertions(+)

diff --git a/gcc/config/rs6000/dfp.md b/gcc/config/rs6000/dfp.md
index 026be5d48a6..b89d5ecc91d 100644
--- a/gcc/config/rs6000/dfp.md
+++ b/gcc/config/rs6000/dfp.md
@@ -226,6 +226,13 @@
   [(set_attr "type" "dfp")
(set_attr "size" "128")])
 
+(define_insn "floattitd2"
+  [(set (match_operand:TD 0 "gpc_reg_operand" "=d")
+   (float:TD (match_operand:TI 1 "gpc_reg_operand" "v")))]
+  "TARGET_POWER10"
+  "dcffixqq %0,%1"
+  [(set_attr "type" "dfp")])
+
 ;; Convert a decimal64/128 to a decimal64/128 whose value is an integer.
 ;; This is the first stage of converting it to an integer type.
 
@@ -247,6 +254,13 @@
   "dctfix %0,%1"
   [(set_attr "type" "dfp")
(set_attr "size" "")])
+
+(define_insn "fixtdti2"
+  [(set (match_operand:TI 0 "gpc_reg_operand" "=v")
+   (fix:TI (match_operand:TD 1 "gpc_reg_operand" "d")))]
+  "TARGET_POWER10"
+  "dctfixqq %0,%1"
+  [(set_attr "type" "dfp")])
 
 ;; Decimal builtin support
 
diff --git a/gcc/testsuite/gcc.target/powerpc/int_128bit-runnable.c 
b/gcc/testsuite/gcc.target/powerpc/int_128bit-runnable.c
index 042758c8684..625b3869118 100644
--- a/gcc/testsuite/gcc.target/powerpc/int_128bit-runnable.c
+++ b/gcc/testsuite/gcc.target/powerpc/int_128bit-runnable.c
@@ -37,6 +37,7 @@
 #if DEBUG
 #include 
 #include 
+#include 
 
 
 void print_i128(__int128_t val)
@@ -58,6 +59,13 @@ int main ()
   __int128_t arg1, result;
   __uint128_t uarg2;
 
+  _Decimal128 arg1_dfp128, result_dfp128, expected_result_dfp128;
+
+  struct conv_t {
+__uint128_t u128;
+_Decimal128 d128;
+  } conv, conv2;
+
   vector signed long long int vec_arg1_di, vec_arg2_di;
   vector signed long long int vec_result_di, vec_expected_result_di;
   vector unsigned long long int vec_uarg1_di, vec_uarg2_di, vec_uarg3_di;
@@ -2258,6 +2266,59 @@ int main ()
 abort();
 #endif
   }
+  
+  /* DFP to __int128 and __int128 to DFP conversions */
+  /* Print the DFP value as an unsigned int so we can see the bit patterns.  */
+  conv.u128 = 0x2208ULL;
+  conv.u128 = (conv.u128 << 64) | 0x4ULL;   //DFP bit pattern for integer 4
+  expected_result_dfp128 = conv.d128;
 
+  arg1 = 4;
+
+  conv.d128 = (_Decimal128) arg1;
+
+  result_dfp128 = (_Decimal128) arg1;
+  if (((conv.u128 >>64) != 0x2208ULL) &&
+  ((conv.u128 & 0x) != 0x4ULL)) {
+#if DEBUG
+printf("ERROR:  convert int128 value ");
+print_i128 (arg1);
+conv.d128 = result_dfp128;
+printf("\nto DFP value 0x%llx %llx (printed as hex bit string) ",
+  (unsigned long long)((conv.u128) >>64),
+  (unsigned long long)((conv.u128) & 0x));
+
+conv.d128 = expected_result_dfp128;
+printf("\ndoes not match expected_result = 0x%llx %llx\n\n",
+  (unsigned long long) (conv.u128>>64),
+  (unsigned long long) (conv.u128 & 0x));
+#else
+abort();
+#endif
+  }
+
+  expected_result = 4;
+
+  conv.u128 = 0x2208ULL;
+  conv.u128 = (conv.u128 << 64) | 0x4ULL;  // 4 as DFP
+  arg1_dfp128 = conv.d128;
+
+  result = (__int128_t) arg1_dfp128;
+
+  if (result != expected_result) {
+#if DEBUG
+printf("ERROR:  convert DFP value ");
+printf("0x%llx %llx (printed as hex bit string) ",
+  (unsigned long long)(conv.u128>>64),
+  (unsigned long long)(conv.u128 & 0x));
+printf("to __int128 value = ");
+print_i128 (result);
+printf("\ndoes not match expected_result = ");
+print_i128 (expected_result);
+printf("\n");
+#else
+abort();
+#endif
+  }
   return 0;
 }
-- 
2.27.0

[PATCH 1/5 ver4] RS6000: Add 128-bit Integer Operations

2021-04-26 Thread Carl Love via Gcc-patches

Will, Segher:

This patch fixes the order of the argument in the vec_rlmi and
vec_rlnm builtins.  The patch also adds a new test case to verify
the fix.
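
For readers who want the rotate-and-insert operation spelled out, the
following portable C sketch models one 32-bit element in the same way the
new test computes its expected result: rotate one input left, then merge
it into the other input under a mask whose bits are numbered from the most
significant bit.  The function names are hypothetical, the sketch assumes
begin <= end <= 31, and it deliberately does not claim which vec_rlmi
argument is rotated and which is inserted into.

```c
#include <stdint.h>

/* Rotate a 32-bit value left by n bits.  */
uint32_t rotl32 (uint32_t x, unsigned n)
{
  n &= 31u;
  return n ? (x << n) | (x >> (32u - n)) : x;
}

/* Build a mask with bits begin..end set, where bit 0 is the most
   significant bit.  Assumes begin <= end <= 31.  */
uint32_t be_mask32 (unsigned begin, unsigned end)
{
  uint32_t mask = 0;
  for (unsigned i = begin; i <= end; i++)
    mask |= 0x80000000u >> i;
  return mask;
}

/* Hypothetical scalar model: rotate rot_src left by shift, then insert
   the masked bits into insert_into.  */
uint32_t rlmi32_model (uint32_t rot_src, uint32_t insert_into,
                       unsigned begin, unsigned end, unsigned shift)
{
  uint32_t mask = be_mask32 (begin, end);
  return (rotl32 (rot_src, shift) & mask) | (insert_into & ~mask);
}
```

With a mask covering bits 0 through 4 and a shift of 16, only the top five
bits of the rotated value reach the result.  All remaining bits come from
the insert_into operand.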

The patch has been tested on
powerpc64-linux instead (Power 8 BE)
powerpc64-linux instead (Power 9 LE)
powerpc64-linux instead (Power 10 LE)

Please let me know if the patch is acceptable for mainline.

   Carl Love



2021-04-26  Carl Love  

gcc/
* config/rs6000/altivec.md (altivec_vrlmi): Fix
bug in argument generation.

gcc/testsuite/
* gcc.target/powerpc/check-builtin-vec_rlnm-runnable.c:
New runnable test case.
* gcc.target/powerpc/vec-rlmi-rlnm.c: Update scan assembler times
for xxlor instruction.
---
 gcc/config/rs6000/altivec.md  |   6 +-
 .../powerpc/check-builtin-vec_rlnm-runnable.c | 231 ++
 .../gcc.target/powerpc/vec-rlmi-rlnm.c|   2 +-
 3 files changed, 235 insertions(+), 4 deletions(-)
 create mode 100644 
gcc/testsuite/gcc.target/powerpc/check-builtin-vec_rlnm-runnable.c

diff --git a/gcc/config/rs6000/altivec.md b/gcc/config/rs6000/altivec.md
index 1351dafbc41..97dc9d2bda9 100644
--- a/gcc/config/rs6000/altivec.md
+++ b/gcc/config/rs6000/altivec.md
@@ -1987,12 +1987,12 @@
 
 (define_insn "altivec_vrlmi"
   [(set (match_operand:VIlong 0 "register_operand" "=v")
-(unspec:VIlong [(match_operand:VIlong 1 "register_operand" "0")
-   (match_operand:VIlong 2 "register_operand" "v")
+(unspec:VIlong [(match_operand:VIlong 1 "register_operand" "v")
+   (match_operand:VIlong 2 "register_operand" "0")
(match_operand:VIlong 3 "register_operand" "v")]
   UNSPEC_VRLMI))]
   "TARGET_P9_VECTOR"
-  "vrlmi %0,%2,%3"
+  "vrlmi %0,%1,%3"
   [(set_attr "type" "veclogical")])
 
 (define_insn "altivec_vrlnm"
diff --git a/gcc/testsuite/gcc.target/powerpc/check-builtin-vec_rlnm-runnable.c 
b/gcc/testsuite/gcc.target/powerpc/check-builtin-vec_rlnm-runnable.c
new file mode 100644
index 000..be8f82d8a06
--- /dev/null
+++ b/gcc/testsuite/gcc.target/powerpc/check-builtin-vec_rlnm-runnable.c
@@ -0,0 +1,231 @@
+/* { dg-do run } */
+/* { dg-require-effective-target powerpc_p9vector_ok } */
+/* { dg-options "-O2 -mdejagnu-cpu=power9 -save-temps" } */
+
+/* Verify the vec_rlm and vec_rlmi builtins works correctly.  */
+/* { dg-final { scan-assembler-times {\mvrldmi\M} 1 } } */
+
+#include 
+
+#ifdef DEBUG
+#include 
+#include 
+#endif
+
+void abort (void);
+
+int main ()
+{
+  int i;
+
+  vector unsigned int vec_arg1_int, vec_arg2_int, vec_arg3_int;
+  vector unsigned int vec_result_int, vec_expected_result_int;
+  
+  vector unsigned long long int vec_arg1_di, vec_arg2_di, vec_arg3_di;
+  vector unsigned long long int vec_result_di, vec_expected_result_di;
+
+  unsigned int mask_begin, mask_end, shift;
+  unsigned long long int mask;
+
+/* Check vec int version of vec_rlmi builtin */
+  mask = 0;
+  mask_begin = 0;
+  mask_end   = 4;
+  shift = 16;
+
+  for (i = 0; i < 31; i++)
+if ((i >= mask_begin) && (i <= mask_end))
+  mask |= 0x8000ULL >> i;
+
+  for (i = 0; i < 4; i++) {
+vec_arg1_int[i] = 0x12345678 + i*0x;
+vec_arg2_int[i] = 0xA1B1CDEF;
+vec_arg3_int[i] = mask_begin << 16 | mask_end << 8 | shift;
+
+/* do rotate */
+vec_expected_result_int[i] =  ( vec_arg2_int[i] & ~mask) 
+  | ((vec_arg1_int[i] << shift) | (vec_arg1_int[i] >> (32-shift))) & mask;
+  
+  }
+
+  /* vec_rlmi(arg1, arg2, arg3)
+ result - rotate each element of arg2 left and inserting it into arg1 
+   element based on the mask specified in arg3.  The shift, mask
+   start and end is specified in arg3.  */
+  vec_result_int = vec_rlmi (vec_arg1_int, vec_arg2_int, vec_arg3_int);
+
+  for (i = 0; i < 4; i++) {
+if (vec_result_int[i] != vec_expected_result_int[i])
+#ifdef DEBUG
+  printf("ERROR: i = %d, vec_rlmi int result 0x%x, does not match "
+"expected result 0x%x\n", i, vec_result_int[i],
+vec_expected_result_int[i]);
+#else
+  abort();
+#endif
+}
+
+/* Check vec long long int version of vec_rlmi builtin */
+  mask = 0;
+  mask_begin = 0;
+  mask_end   = 4;
+  shift = 16;
+
+  for (i = 0; i < 31; i++)
+if ((i >= mask_begin) && (i <= mask_end))
+  mask |= 0x8000ULL >> i;
+
+  for (i = 0; i < 2; i++) {
+vec_arg1_di[i] = 0x12345678 + i*0x;
+vec_arg2_di[i] = 0xA1B1C1D1E1F12345;
+vec_arg3_di[i] = mask_begin << 16 | mask_end << 8 | shift;
+
+/* do rotate */
+vec_expected_result_di[i] =  ( vec_arg2_di[i] & ~mask) 
+  | ((vec_arg1_di[i] << shift) | (vec_arg1_di[i] >> (64-shift))) & mask;
+  }
+
+  /* vec_rlmi(arg1, arg2, arg3)
+ result - rotate each element of arg1 left and inserting it into arg2 
+   element of arg2 based on the mask specified in arg3.  The shift,

[PATCH 0/5 ver4] RS6000: Add 128-bit Integer Operations

2021-04-26 Thread Carl Love via Gcc-patches

Segher, Will:

Bill asked that I refresh the 128-bit integer patch set and get it
reposted.  I have rebased the patches onto the latest mainline
repository.  I have also retested the patches on Power 8 BE, Power 9
LE and Power 10 LE hardware.

A request was made last week to change the functionality of the
128-bit sign extension builtins.  I pulled the support for these
builtins out of the previous patch 2 and 3 and moved to a single last
patch.  That leaves five of the six patches ready for review. I will
update and post the last, sixth, patch later when we have full
agreement on what the new functionality will be. 

The following five patches have minor fixes to the builtin
descriptions, typo fixes, etc. from the previous iteration.  The patch
series numbers have changed.

  Carl Love

[PATCH 0/6 ver3] RS6000 add 128-bit Integer Operations

2021-01-19 Thread Carl Love via Gcc-patches

Segher, Will:

The following patch set adds the 128-bit integer operation support
and fixes a bug found in the existing support.  This is the third
version of the patch set.  The first five patches have minor updates
based on previous reviews.  The last patch has a number of functional
changes to get the 128-bit conversion support to use the new hardware
instructions on Power 10.  The existing software support is used for
Power 9 and earlier platforms.

  Carl Love

[PATCH 6/6 ver 3] Conversions between 128-bit integer and floating point values.

2021-01-19 Thread Carl Love via Gcc-patches

Will, Segher:
 
This patch adds support for converting to/from 128-bit integers and
128-bit decimal floating point formats using the new P10 instructions
dcffixqq and dctfixqq.  The new instructions are only used on P10 HW,
otherwise the conversions continue to use the existing SW routines.

The files fixkfti-sw.c and fixunskfti-sw.c are renamed versions of
fixkfti.c and fixunskfti.c respectively.  The function names in the
files were updated to match the rename, and some whitespace was fixed.

version 3:  Numerous changes with help/input from Michael Meissner

  Add assembler checks for the 128-bit conversion  instructions, see
  configure and configure.ac.

  Add the libgcc resolvers to select sw or hw support for the
conversions.

  Rename, rewrite the existing conversion files (fixkfti.c,
fixunskfti.c,
  floattikf.c, floatuntikf.c) to create the sw conversion files.

  Tested on Power 8BE, Power9, Power10.

version 2:

  Fixed a typo in the ChangeLog noted by Will.

  Removed the target ppc_native_128bit from the test case as we no
  longer have the 128-bit flag.

  Re-tested the patch on Power 9 with no regression errors.

Carl Love



gcc/ChangeLog

2021-01-15  Carl Love  
* config/rs6000/rs6000.md (floatti2, floatunsti2,
fix_truncti2, fixuns_truncti2): Add
define_insn for mode IEEE 128.

gcc/testsuite/ChangeLog

2021-01-15  Carl Love  
* gcc.target/powerpc/fp128_conversions.c: New file.
* gcc.target/powerpc/int_128bit-runnable.c(vextsd2ppc_native_128bitq,
vcmpuq, vcmpsq, vcmpequq, vcmpequq., vcmpgtsq, vcmpgtsq.
vcmpgtuq, vcmpgtuq.): Update scan-assembler-times.
(ppc_native_128bit): Remove dg-require-effective-target.

libgcc/ChangeLog
2021-01-15  Carl Love  
* config.host: Add if test and set for
libgcc_cv_powerpc_3_1_float128_hw.
* libgcc/config/rs6000/fixkfti.c: Renamed to fixkfti-sw.c.
Change calls of __fixkfti to __fixkfti_sw.
* libgcc/config/rs6000/fixunskfti.c: Renamed to fixunskfti-sw.c.
Change calls of __fixunskfti to __fixunskfti_sw.
* libgcc/config/rs6000/float128-p10.c (__floattikf_hw,
__floatuntikf_hw, __fixkfti_hw, __fixunskfti_hw): New file.
* libgcc/config/rs6000/float128-ifunc.c (SW_OR_HW_ISA3_1): New macro.
(__floattikf_resolve, __floatuntikf_resolve, __fixkfti_resolve,
__fixunskfti_resolve): Add resolve functions.
(__floattikf, __floatuntikf, __fixkfti, __fixunskfti): New functions.
* libgcc/config/rs6000/float128-sed (floattitf, __floatuntitf,
__fixtfti, __fixunstfti): Add editor commands to change names.
* libgcc/config/rs6000/float128-sed-hw (__floattitf,
__floatuntitf, __fixtfti, __fixunstfti): Add editor commands to
change names.
* libgcc/config/rs6000/floattikf.c: Renamed to floattikf-sw.c.
* libgcc/config/rs6000/floatuntikf.c: Renamed to floatuntikf-sw.c.
* libgcc/config/rs6000/quad-float128.h (__floattikf_sw,
__floatuntikf_sw, __fixkfti_sw, __fixunskfti_sw, __floattikf_hw,
__floatuntikf_hw, __fixkfti_hw, __fixunskfti_hw, __floattikf,
__floatuntikf, __fixkfti, __fixunskfti): New extern declarations.
* libgcc/config/rs6000/t-float128 (floattikf, floatuntikf,
fixkfti, fixunskfti): Remove file names from fp128_ppc_funcs.
(floattikf-sw, floatuntikf-sw, fixkfti-sw, fixunskfti-sw): Add
file names to fp128_ppc_funcs.
* libgcc/config/rs6000/t-float128-hw (fp128_3_1_hw_funcs,
fp128_3_1_hw_src, fp128_3_1_hw_static_obj, fp128_3_1_hw_shared_obj,
fp128_3_1_hw_obj): Add variables for ISA 3.1 support.
* libgcc/config/rs6000/t-float128-p10-hw: New file.
* configure: Update script for isa 3.1 128-bit float support.
* configure.ac: Add check for 128-bit float hardware support.
---
 gcc/config/rs6000/rs6000.md   |  36 +++
 .../gcc.target/powerpc/fp128_conversions.c| 294 ++
 .../gcc.target/powerpc/int_128bit-runnable.c  |  14 +-
 libgcc/config.host|   4 +
 .../config/rs6000/{fixkfti.c => fixkfti-sw.c} |   4 +-
 .../rs6000/{fixunskfti.c => fixunskfti-sw.c}  |   4 +-
 libgcc/config/rs6000/float128-ifunc.c |  44 ++-
 libgcc/config/rs6000/float128-p10.c   |  71 +
 libgcc/config/rs6000/float128-sed |   4 +
 libgcc/config/rs6000/float128-sed-hw  |   4 +
 .../rs6000/{floattikf.c => floattikf-sw.c}|   4 +-
 .../{floatuntikf.c => floatuntikf-sw.c}   |   4 +-
 libgcc/config/rs6000/quad-float128.h  |  17 +-
 libgcc/config/rs6000/t-float128   |  12 +-
 libgcc/config/rs6000/t-float128-hw|  16 +
 libgcc/config/rs6000/t-float128-p10-hw|  24 ++
 libgcc/configure  |  39 ++-
 libgcc/configure.ac   |  25 ++
 18 files

[PATCH 3/6 ver 3] RS6000 add 128-bit Integer Operations part 1

2021-01-19 Thread Carl Love via Gcc-patches

Will, Segher:

This patch adds 128-bit integer support for the divide, modulo, shift,
and compare instructions, along with the corresponding builtin support.

version 3:

  int_128bit-runnable.c: Removed ppc_native_128bit from
 dg-require-effective-target.  Was missed from 
 an earlier cleanup.
  Tested on Power 8BE, Power9, Power10.
   
version 2:

  Fixed the references to 128-bit in ChangeLog that got missed in the
  last go round.

  Fixed missing spaces in emit_insn calls.

  Re-tested the patch on Power 9 with no regression errors.

Carl Love

--

gcc/ChangeLog

2021-01-12  Carl Love  
* config/rs6000/altivec.h (vec_signextq, vec_dive, vec_mod): Add define
for new builtins.
* config/rs6000/altivec.md (UNSPEC_VMULEUD, UNSPEC_VMULESD,
UNSPEC_VMULOUD, UNSPEC_VMULOSD): New unspecs.
(altivec_eqv1ti, altivec_gtv1ti, altivec_gtuv1ti, altivec_vmuleud,
altivec_vmuloud, altivec_vmulesd, altivec_vmulosd, altivec_vrlq,
altivec_vrlqmi, altivec_vrlqmi_inst, altivec_vrlqnm,
altivec_vrlqnm_inst, altivec_vslq, altivec_vsrq, altivec_vsraq,
altivec_vcmpequt_p, altivec_vcmpgtst_p, altivec_vcmpgtut_p): New
define_insn.
(vec_widen_umult_even_v2di, vec_widen_smult_even_v2di,
vec_widen_umult_odd_v2di, vec_widen_smult_odd_v2di, altivec_vrlqmi,
altivec_vrlqnm): New define_expands.
* config/rs6000/rs6000-builtin.def (VCMPEQUT_P, VCMPGTST_P,
VCMPGTUT_P): Add macro expansions.
(BU_P10V_AV_P): Add builtin predicate definition.
(VCMPGTUT, VCMPGTST, VCMPEQUT, CMPNET, CMPGE_1TI,
CMPGE_U1TI, CMPLE_1TI, CMPLE_U1TI, VNOR_V1TI_UNS, VNOR_V1TI, VCMPNET_P,
VCMPAET_P, VSIGNEXTSD2Q, VMULEUD, VMULESD, VMULOUD, VMULOSD, VRLQ,
VSLQ, VSRQ, VSRAQ, VRLQNM, DIV_V1TI, UDIV_V1TI, DIVES_V1TI, DIVEU_V1TI,
MODS_V1TI, MODU_V1TI, VRLQMI): New macro expansions.
(VRLQ, VSLQ, VSRQ, VSRAQ, DIVE, MOD, SIGNEXT): New overload expansions.
* config/rs6000/rs6000-call.c (P10_BUILTIN_VCMPEQUT,
P10V_BUILTIN_CMPGE_1TI, P10V_BUILTIN_CMPGE_U1TI,
P10V_BUILTIN_VCMPGTUT, P10V_BUILTIN_VCMPGTST,
P10V_BUILTIN_CMPLE_1TI, P10V_BUILTIN_VCMPLE_U1TI,
P10V_BUILTIN_DIV_V1TI, P10V_BUILTIN_UDIV_V1TI,
P10V_BUILTIN_VMULESD, P10V_BUILTIN_VMULEUD,
P10V_BUILTIN_VMULOSD, P10V_BUILTIN_VMULOUD,
P10V_BUILTIN_VNOR_V1TI, P10V_BUILTIN_VNOR_V1TI_UNS,
P10V_BUILTIN_VRLQ, P10V_BUILTIN_VRLQMI,
P10V_BUILTIN_VRLQNM, P10V_BUILTIN_VSLQ,
P10V_BUILTIN_VSRQ, P10V_BUILTIN_VSRAQ,
P10V_BUILTIN_VCMPGTUT_P, P10V_BUILTIN_VCMPGTST_P,
P10V_BUILTIN_VCMPEQUT_P, P10V_BUILTIN_VCMPGTUT_P,
P10V_BUILTIN_VCMPGTST_P, P10V_BUILTIN_CMPNET,
P10V_BUILTIN_VCMPNET_P, P10V_BUILTIN_VCMPAET_P,
P10V_BUILTIN_VSIGNEXTSD2Q, P10V_BUILTIN_DIVES_V1TI,
P10V_BUILTIN_MODS_V1TI, P10V_BUILTIN_MODU_V1TI):
New overloaded definitions.
(rs6000_gimple_fold_builtin) [P10V_BUILTIN_VCMPEQUT,
P10_BUILTIN_CMPNET, P10_BUILTIN_CMPGE_1TI,
P10_BUILTIN_CMPGE_U1TI, P10_BUILTIN_VCMPGTUT,
P10_BUILTIN_VCMPGTST, P10_BUILTIN_CMPLE_1TI,
P10_BUILTIN_CMPLE_U1TI]: New case statements.
(rs6000_init_builtins) [bool_V1TI_type_node, int_ftype_int_v1ti_v1ti]:
New assignments.
(altivec_init_builtins): New E_V1TImode case statement.
(builtin_function_type)[P10_BUILTIN_128BIT_VMULEUD,
P10_BUILTIN_128BIT_VMULOUD, P10_BUILTIN_128BIT_DIVEU_V1TI,
P10_BUILTIN_128BIT_MODU_V1TI, P10_BUILTIN_CMPGE_U1TI,
P10_BUILTIN_VCMPGTUT, P10_BUILTIN_VCMPEQUT]: New case statements.
* config/rs6000/rs6000.c (rs6000_handle_altivec_attribute) [E_TImode,
E_V1TImode]: New case statements.
* config/rs6000/rs6000.h (rs6000_builtin_type_index): New enum
value RS6000_BTI_bool_V1TI.
* config/rs6000/vector.md (vector_gtv1ti,vector_nltv1ti,
vector_gtuv1ti, vector_nltuv1ti, vector_ngtv1ti, vector_ngtuv1ti,
vector_eq_v1ti_p, vector_ne_v1ti_p, vector_ae_v1ti_p,
vector_gt_v1ti_p, vector_gtu_v1ti_p, vrotlv1ti3, vashlv1ti3,
vlshrv1ti3, vashrv1ti3): New define_expands.
* config/rs6000/vsx.md (UNSPEC_VSX_DIVSQ, UNSPEC_VSX_DIVUQ,
UNSPEC_VSX_DIVESQ, UNSPEC_VSX_DIVEUQ, UNSPEC_VSX_MODSQ,
UNSPEC_VSX_MODUQ): New unspecs.
(mulv2di3, vsx_div_v1ti, vsx_udiv_v1ti, vsx_dives_v1ti,
vsx_diveu_v1ti, vsx_mods_v1ti, vsx_modu_v1ti, xxswapd_v1ti,
vsx_sign_extend_v2di_v1ti): New define_insns.
(vcmpnet): New define_expand.
* gcc/doc/extend.texi: Add documentation for the new builtins vec_rl,
vec_rlmi, vec_rlnm, vec_sl, vec_sr, vec_sra, vec_mule, vec_mulo,
vec_div, vec_dive, vec_mod, vec_cmpeq, vec_cmpne, vec_cmpgt, vec_cmplt,

[PATCH 4/6 ver 3] Add TI to TD (128-bit DFP) and TD to TI support

2021-01-19 Thread Carl Love via Gcc-patches

Will, Segher:
 
This patch adds support for converting to/from 128-bit integers and
128-bit decimal floating point formats.

Version 3:

  No functional changes.
  Tested on Power 8BE, Power9, Power10.

Version 2:
  Updated ChangeLog comments.  Fixed up comments in the test program.

  Re-tested the patch on Power 9 with no regression errors.
   
Carl

---

gcc/ChangeLog

2021-01-12  Carl Love  
* config/rs6000/dfp.md (floattitd2, fixtdti2): New define_insns.
* config/rs6000/rs6000-call.c (P10V_BUILTIN_VCMPNET_P,
P10V_BUILTIN_VCMPAET_P): New overloaded definitions.

gcc/testsuite/ChangeLog

2021-01-12  Carl Love  
* gcc.target/powerpc/int_128bit-runnable.c: Add 128-bit DFP
conversion tests.
---
 gcc/config/rs6000/dfp.md  | 14 +
 .../gcc.target/powerpc/int_128bit-runnable.c  | 61 +++
 2 files changed, 75 insertions(+)

diff --git a/gcc/config/rs6000/dfp.md b/gcc/config/rs6000/dfp.md
index c8cdb645865..876ab2ed682 100644
--- a/gcc/config/rs6000/dfp.md
+++ b/gcc/config/rs6000/dfp.md
@@ -222,6 +222,13 @@
   "dcffixq %0,%1"
   [(set_attr "type" "dfp")])
 
+(define_insn "floattitd2"
+  [(set (match_operand:TD 0 "gpc_reg_operand" "=d")
+   (float:TD (match_operand:TI 1 "gpc_reg_operand" "v")))]
+  "TARGET_POWER10"
+  "dcffixqq %0,%1"
+  [(set_attr "type" "dfp")])
+
 ;; Convert a decimal64/128 to a decimal64/128 whose value is an integer.
 ;; This is the first stage of converting it to an integer type.
 
@@ -241,6 +248,13 @@
   "TARGET_DFP"
   "dctfix %0,%1"
   [(set_attr "type" "dfp")])
+
+(define_insn "fixtdti2"
+  [(set (match_operand:TI 0 "gpc_reg_operand" "=v")
+   (fix:TI (match_operand:TD 1 "gpc_reg_operand" "d")))]
+  "TARGET_POWER10"
+  "dctfixqq %0,%1"
+  [(set_attr "type" "dfp")])
 
 ;; Decimal builtin support
 
diff --git a/gcc/testsuite/gcc.target/powerpc/int_128bit-runnable.c b/gcc/testsuite/gcc.target/powerpc/int_128bit-runnable.c
index 3f8892b39d6..42cb91c7ba9 100644
--- a/gcc/testsuite/gcc.target/powerpc/int_128bit-runnable.c
+++ b/gcc/testsuite/gcc.target/powerpc/int_128bit-runnable.c
@@ -38,6 +38,7 @@
 #if DEBUG
 #include 
 #include 
+#include 
 
 
 void print_i128(__int128_t val)
@@ -59,6 +60,13 @@ int main ()
   __int128_t arg1, result;
   __uint128_t uarg2;
 
+  _Decimal128 arg1_dfp128, result_dfp128, expected_result_dfp128;
+
+  struct conv_t {
+__uint128_t u128;
+_Decimal128 d128;
+  } conv, conv2;
+
   vector signed long long int vec_arg1_di, vec_arg2_di;
   vector signed long long int vec_result_di, vec_expected_result_di;
   vector unsigned long long int vec_uarg1_di, vec_uarg2_di, vec_uarg3_di;
@@ -2296,6 +2304,59 @@ int main ()
 abort();
 #endif
   }
+  
+  /* DFP to __int128 and __int128 to DFP conversions */
+  /* Print the DFP value as an unsigned int so we can see the bit patterns.  */
+  conv.u128 = 0x2208ULL;
+  conv.u128 = (conv.u128 << 64) | 0x4ULL;   //DFP bit pattern for integer 4
+  expected_result_dfp128 = conv.d128;
 
+  arg1 = 4;
+
+  conv.d128 = (_Decimal128) arg1;
+
+  result_dfp128 = (_Decimal128) arg1;
+  if (((conv.u128 >>64) != 0x2208ULL) &&
+  ((conv.u128 & 0x) != 0x4ULL)) {
+#if DEBUG
+printf("ERROR:  convert int128 value ");
+print_i128 (arg1);
+conv.d128 = result_dfp128;
+printf("\nto DFP value 0x%llx %llx (printed as hex bit string) ",
+  (unsigned long long)((conv.u128) >>64),
+  (unsigned long long)((conv.u128) & 0x));
+
+conv.d128 = expected_result_dfp128;
+printf("\ndoes not match expected_result = 0x%llx %llx\n\n",
+  (unsigned long long) (conv.u128>>64),
+  (unsigned long long) (conv.u128 & 0x));
+#else
+abort();
+#endif
+  }
+
+  expected_result = 4;
+
+  conv.u128 = 0x2208ULL;
+  conv.u128 = (conv.u128 << 64) | 0x4ULL;  // 4 as DFP
+  arg1_dfp128 = conv.d128;
+
+  result = (__int128_t) arg1_dfp128;
+
+  if (result != expected_result) {
+#if DEBUG
+printf("ERROR:  convert DFP value ");
+printf("0x%llx %llx (printed as hex bit string) ",
+  (unsigned long long)(conv.u128>>64),
+  (unsigned long long)(conv.u128 & 0x));
+printf("to __int128 value = ");
+print_i128 (result);
+printf("\ndoes not match expected_result = ");
+print_i128 (expected_result);
+printf("\n");
+#else
+abort();
+#endif
+  }
   return 0;
 }
-- 
2.27.0

[PATCH 2/6 ver 3] RS6000 Add 128-bit Binary Integer sign extend operations

2021-01-19 Thread Carl Love via Gcc-patches

Will, Segher:

This patch adds the 128-bit sign extension instruction support and the
corresponding builtin support.
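
For readers unfamiliar with the operation, the per-element effect of the sign-extend builtins can be modeled in scalar C. This is only a rough sketch and is not code from the patch: the helper names model_extsb2w, model_extsh2w, and model_sign_extend_demo are hypothetical, and the sketch assumes each builtin sign-extends the low-order byte or halfword of every word element, as the vsignextsb2w and vsignextsh2w names suggest.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical scalar model of one word element of vsignextsb2w:
   sign-extend the low-order byte of the word to 32 bits.  The
   xor/subtract form avoids implementation-defined narrowing casts.  */
static int32_t model_extsb2w (uint32_t word)
{
  return (int32_t) ((word & 0xFFu) ^ 0x80u) - 0x80;
}

/* Same idea for vsignextsh2w: sign-extend the low-order halfword.  */
static int32_t model_extsh2w (uint32_t word)
{
  return (int32_t) ((word & 0xFFFFu) ^ 0x8000u) - 0x8000;
}

/* Small usage example for the two helpers above.  */
static void model_sign_extend_demo (void)
{
  assert (model_extsb2w (0x000000FFu) == -1);
  assert (model_extsh2w (0x00008001u) == -32767);
}
```

The doubleword variants (vsignextsb2d, vsignextsh2d, vsignextsw2d) would presumably follow the same pattern with a 64-bit result.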

version 3:

  doc/extend.texi:  Fixed the "uThe" typo and added the colon at the
end of the line.

  p9-sign_extend-runnable.c: Changed the dg-do run to  *-*-linux 
 instead of powerpc*-*-linux.

  Tested on Power 8BE, Power9, Power10.

version 2:

  Removed the blank line per Will's latest feedback.

  Retested the patch on Power 9 with no regression errors.

Carl Love

--

gcc/ChangeLog

2021-01-12  Carl Love  
* config/rs6000/altivec.h (vec_signextll, vec_signexti): Add define
for new builtins.
* config/rs6000/rs6000-builtin.def (VSIGNEXTI, VSIGNEXTLL):  Add
overloaded builtin definitions.
(VSIGNEXTSB2W, VSIGNEXTSH2W, VSIGNEXTSB2D, VSIGNEXTSH2D, VSIGNEXTSW2D):
Add builtin expansions.
* config/rs6000/rs6000-call.c (P9V_BUILTIN_VEC_VSIGNEXTI,
P9V_BUILTIN_VEC_VSIGNEXTLL): Add overloaded argument definitions.
* config/rs6000/vsx.md: Make define_insn vsx_sign_extend_si_v2di
visible.
* doc/extend.texi:  Add documentation for the vec_signexti and
vec_signextll builtins.

gcc/testsuite/ChangeLog

2021-01-12  Carl Love  
* gcc.target/powerpc/p9-sign_extend-runnable.c:  New test case.
---
 gcc/config/rs6000/altivec.h   |   2 +
 gcc/config/rs6000/rs6000-builtin.def  |   9 ++
 gcc/config/rs6000/rs6000-call.c   |  13 ++
 gcc/config/rs6000/vsx.md  |   2 +-
 gcc/doc/extend.texi   |  15 ++
 .../powerpc/p9-sign_extend-runnable.c | 128 ++
 6 files changed, 168 insertions(+), 1 deletion(-)
 create mode 100644 gcc/testsuite/gcc.target/powerpc/p9-sign_extend-runnable.c

diff --git a/gcc/config/rs6000/altivec.h b/gcc/config/rs6000/altivec.h
index 06f0d4d9f14..460310a5132 100644
--- a/gcc/config/rs6000/altivec.h
+++ b/gcc/config/rs6000/altivec.h
@@ -497,6 +497,8 @@
 
 #define vec_xlx __builtin_vec_vextulx
 #define vec_xrx __builtin_vec_vexturx
+#define vec_signexti  __builtin_vec_vsignexti
+#define vec_signextll __builtin_vec_vsignextll
 
 #endif
 
diff --git a/gcc/config/rs6000/rs6000-builtin.def b/gcc/config/rs6000/rs6000-builtin.def
index 8aa31ad0a06..842f07196de 100644
--- a/gcc/config/rs6000/rs6000-builtin.def
+++ b/gcc/config/rs6000/rs6000-builtin.def
@@ -2800,6 +2800,8 @@ BU_P9V_OVERLOAD_1 (VPRTYBD,   "vprtybd")
 BU_P9V_OVERLOAD_1 (VPRTYBQ,"vprtybq")
 BU_P9V_OVERLOAD_1 (VPRTYBW,"vprtybw")
 BU_P9V_OVERLOAD_1 (VPARITY_LSBB,   "vparity_lsbb")
+BU_P9V_OVERLOAD_1 (VSIGNEXTI,  "vsignexti")
+BU_P9V_OVERLOAD_1 (VSIGNEXTLL, "vsignextll")
 
 /* 2 argument functions added in ISA 3.0 (power9).  */
 BU_P9_2 (CMPRB,"byte_in_range",CONST,  cmprb)
@@ -2811,6 +2813,13 @@ BU_P9_OVERLOAD_2 (CMPRB, "byte_in_range")
 BU_P9_OVERLOAD_2 (CMPRB2,  "byte_in_either_range")
 BU_P9_OVERLOAD_2 (CMPEQB,  "byte_in_set")
 
+
+BU_P9V_AV_1 (VSIGNEXTSB2W, "vsignextsb2w", CONST,  vsx_sign_extend_qi_v4si)
+BU_P9V_AV_1 (VSIGNEXTSH2W, "vsignextsh2w", CONST,  vsx_sign_extend_hi_v4si)
+BU_P9V_AV_1 (VSIGNEXTSB2D, "vsignextsb2d", CONST,  vsx_sign_extend_qi_v2di)
+BU_P9V_AV_1 (VSIGNEXTSH2D, "vsignextsh2d", CONST,  vsx_sign_extend_hi_v2di)
+BU_P9V_AV_1 (VSIGNEXTSW2D, "vsignextsw2d", CONST,  vsx_sign_extend_si_v2di)
+
 /* Builtins for scalar instructions added in ISA 3.1 (power10).  */
 BU_P10_POWERPC64_MISC_2 (CFUGED, "cfuged", CONST, cfuged)
 BU_P10_POWERPC64_MISC_2 (CNTLZDM, "cntlzdm", CONST, cntlzdm)
diff --git a/gcc/config/rs6000/rs6000-call.c b/gcc/config/rs6000/rs6000-call.c
index 2308cc8b4a2..3af325317a1 100644
--- a/gcc/config/rs6000/rs6000-call.c
+++ b/gcc/config/rs6000/rs6000-call.c
@@ -5660,6 +5660,19 @@ const struct altivec_builtin_types altivec_overloaded_builtins[] = {
 RS6000_BTI_unsigned_V2DI, RS6000_BTI_unsigned_V2DI,
 RS6000_BTI_INTSI, RS6000_BTI_INTSI },
 
+  /* Sign extend builtins that work on ISA 3.0, not added until ISA 3.1 */
+  { P9V_BUILTIN_VEC_VSIGNEXTI, P9V_BUILTIN_VSIGNEXTSB2W,
+RS6000_BTI_V4SI, RS6000_BTI_V16QI, 0, 0 },
+  { P9V_BUILTIN_VEC_VSIGNEXTI, P9V_BUILTIN_VSIGNEXTSH2W,
+RS6000_BTI_V4SI, RS6000_BTI_V8HI, 0, 0 },
+
+  { P9V_BUILTIN_VEC_VSIGNEXTLL, P9V_BUILTIN_VSIGNEXTSB2D,
+RS6000_BTI_V2DI, RS6000_BTI_V16QI, 0, 0 },
+  { P9V_BUILTIN_VEC_VSIGNEXTLL, P9V_BUILTIN_VSIGNEXTSH2D,
+RS6000_BTI_V2DI, RS6000_BTI_V8HI, 0, 0 },
+  { P9V_BUILTIN_VEC_VSIGNEXTLL, P9V_BUILTIN_VSIGNEXTSW2D,
+RS6000_BTI_V2DI, RS6000_BTI_V4SI, 0, 0 },
+
   /* Overloaded built-in functions for ISA3.1 (power10). */
   { P10_BUILTIN_VEC_CLRL, P10V_BUILTIN_VCLRLB,
 RS6000_BTI_V16QI, RS6000_BTI_V16QI, RS6000_BTI_UINTSI, 0 },
diff --git a/gcc/config/rs6000/vsx.md b/gcc/config/rs6000/vsx.md
index

[PATCH 5/6 ver 3] rs6000, Add test 128-bit shifts for just the int128 type.

2021-01-19 Thread Carl Love via Gcc-patches

Will, Segher:

Patch 4 adds the vector 128-bit integer shift instruction support for
the V1TI type.  This patch also renames and moves the VSX_TI iterator
from vsx.md to VEC_TI in vector.md.  The uses of VEC_TI are also
updated.
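
As a rough illustration of the shift semantics, the following scalar C sketch models a 128-bit left shift and logical right shift. It is not code from the patch: the names u128, model_shl128, model_lshr128, and model_shift128_demo are hypothetical, the sketch assumes only the low 7 bits of the shift amount are used (as the bits[57:63] comment in the patch suggests), and it relies on the `unsigned __int128` extension available in GCC and Clang on 64-bit targets.

```c
#include <assert.h>

typedef unsigned __int128 u128;

/* Hypothetical scalar model of a 128-bit shift left.  Only the low
   7 bits of the amount are honored (assumption, see lead-in).  */
static u128 model_shl128 (u128 value, unsigned amount)
{
  return value << (amount & 127u);
}

/* Hypothetical scalar model of a 128-bit logical shift right.  */
static u128 model_lshr128 (u128 value, unsigned amount)
{
  return value >> (amount & 127u);
}

/* Small usage example: move a bit across the 64-bit boundary and back.  */
static void model_shift128_demo (void)
{
  u128 one = 1;
  assert ((model_shl128 (one, 64) >> 64) == 1);
  assert (model_lshr128 (one << 100, 100) == 1);
}
```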

version 3:
  No additional functional changes.
  Tested on Power 8BE, Power 9, Power 10.
  
version 2:
  Re-tested the patch on Power 9 with no regression errors.

Carl Love



gcc/ChangeLog

2021-01-12  Carl Love  
* config/rs6000/altivec.md (altivec_vslq, altivec_vsrq):
Rename to altivec_vslq_<mode>, altivec_vsrq_<mode>, mode VEC_TI.
* config/rs6000/vector.md (VEC_TI): Was named VSX_TI in vsx.md.
(vashlv1ti3): Change to vashl<mode>3, mode VEC_TI.
(vlshrv1ti3): Change to vlshr<mode>3, mode VEC_TI.
* config/rs6000/vsx.md (VSX_TI): Remove define_mode_iterator. Update
uses of VSX_TI to VEC_TI.

gcc/testsuite/ChangeLog

2021-01-12  Carl Love  
* gcc.target/powerpc/int_128bit-runnable.c: Add shift_right, shift_left
tests.
---
 gcc/config/rs6000/altivec.md  | 16 -
 gcc/config/rs6000/vector.md   | 27 ---
 gcc/config/rs6000/vsx.md  | 33 +--
 .../gcc.target/powerpc/int_128bit-runnable.c  | 16 +++--
 4 files changed, 52 insertions(+), 40 deletions(-)

diff --git a/gcc/config/rs6000/altivec.md b/gcc/config/rs6000/altivec.md
index cb83c5ce012..61ab5c9afb6 100644
--- a/gcc/config/rs6000/altivec.md
+++ b/gcc/config/rs6000/altivec.md
@@ -2221,10 +2221,10 @@
   "vsl %0,%1,%2"
   [(set_attr "type" "vecsimple")])
 
-(define_insn "altivec_vslq"
-  [(set (match_operand:V1TI 0 "vsx_register_operand" "=v")
-   (ashift:V1TI (match_operand:V1TI 1 "vsx_register_operand" "v")
-(match_operand:V1TI 2 "vsx_register_operand" "v")))]
+(define_insn "altivec_vslq_<mode>"
+  [(set (match_operand:VEC_TI 0 "vsx_register_operand" "=v")
+   (ashift:VEC_TI (match_operand:VEC_TI 1 "vsx_register_operand" "v")
+(match_operand:VEC_TI 2 "vsx_register_operand" "v")))]
   "TARGET_POWER10"
   /* Shift amount in needs to be in bits[57:63] of 128-bit operand. */
   "vslq %0,%1,%2"
@@ -2238,10 +2238,10 @@
   "vsr %0,%1,%2"
   [(set_attr "type" "vecsimple")])
 
-(define_insn "altivec_vsrq"
-  [(set (match_operand:V1TI 0 "vsx_register_operand" "=v")
-   (lshiftrt:V1TI (match_operand:V1TI 1 "vsx_register_operand" "v")
-  (match_operand:V1TI 2 "vsx_register_operand" "v")))]
+(define_insn "altivec_vsrq_<mode>"
+  [(set (match_operand:VEC_TI 0 "vsx_register_operand" "=v")
+   (lshiftrt:VEC_TI (match_operand:VEC_TI 1 "vsx_register_operand" "v")
+  (match_operand:VEC_TI 2 "vsx_register_operand" "v")))]
   "TARGET_POWER10"
   /* Shift amount in needs to be in bits[57:63] of 128-bit operand. */
   "vsrq %0,%1,%2"
diff --git a/gcc/config/rs6000/vector.md b/gcc/config/rs6000/vector.md
index 0f252c915b0..6a4cd69d866 100644
--- a/gcc/config/rs6000/vector.md
+++ b/gcc/config/rs6000/vector.md
@@ -26,6 +26,9 @@
 ;; Vector int modes
 (define_mode_iterator VEC_I [V16QI V8HI V4SI V2DI])
 
+;; 128-bit int modes
+(define_mode_iterator VEC_TI [V1TI TI])
+
 ;; Vector int modes for parity
 (define_mode_iterator VEC_IP [V8HI
  V4SI
@@ -1627,17 +1630,17 @@
   "")
 
 ;; No immediate version of this 128-bit instruction
-(define_expand "vashlv1ti3"
-  [(set (match_operand:V1TI 0 "vsx_register_operand" "=v")
-   (ashift:V1TI (match_operand:V1TI 1 "vsx_register_operand" "v")
-(match_operand:V1TI 2 "vsx_register_operand" "v")))]
+(define_expand "vashl<mode>3"
+  [(set (match_operand:VEC_TI 0 "vsx_register_operand" "=v")
+   (ashift:VEC_TI (match_operand:VEC_TI 1 "vsx_register_operand")
+(match_operand:VEC_TI 2 "vsx_register_operand")))]
   "TARGET_POWER10"
 {
   /* Shift amount in needs to be put in bits[57:63] of 128-bit operand2. */
-  rtx tmp = gen_reg_rtx (V1TImode);
+  rtx tmp = gen_reg_rtx (<MODE>mode);
 
   emit_insn (gen_xxswapd_v1ti (tmp, operands[2]));
-  emit_insn (gen_altivec_vslq (operands[0], operands[1], tmp));
+  emit_insn(gen_altivec_vslq_<mode> (operands[0], operands[1], tmp));
   DONE;
 })
 
@@ -1650,17 +1653,17 @@
   "")
 
 ;; No immediate version of this 128-bit instruction
-(define_expand "vlshrv1ti3"
-  [(set (match_operand:V1TI 0 "vsx_register_operand" "=v")
-   (lshiftrt:V1TI (match_operand:V1TI 1 "vsx_register_operand" "v")
-  (match_operand:V1TI 2 "vsx_register_operand" "v")))]
+(define_expand "vlshr<mode>3"
+  [(set (match_operand:VEC_TI 0 "vsx_register_operand" "=v")
+   (lshiftrt:VEC_TI (match_operand:VEC_TI 1 "vsx_register_operand")
+  (match_operand:VEC_TI 2 "vsx_register_operand")))]

[PATCH 1/6 ver 3] rs6000, Fix arguments in altivec_vrlwmi and altivec_rlwdi builtins

2021-01-19 Thread Carl Love via Gcc-patches

Will, Segher:

This patch fixes the order of the arguments in the vec_rlmi and
vec_rlnm builtins.  The patch also adds a new test case to verify
the fix.
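
To make the argument roles concrete, here is a rough scalar C model of one 32-bit element of vec_rlmi, following the expected-result computation used in the new test case. It is only a sketch under stated assumptions: the helper names (model_rotl32, model_mask32, model_rlmi32, model_rlmi_demo) are hypothetical, bit 0 is taken to be the most significant bit, arg3 is assumed to hold mask_begin << 16 | mask_end << 8 | shift as in the test, and wrap-around masks (begin greater than end) are not modeled.

```c
#include <assert.h>
#include <stdint.h>

/* Rotate a 32-bit value left by n bits (n taken modulo 32).  */
static uint32_t model_rotl32 (uint32_t x, unsigned n)
{
  n &= 31u;
  return (x << n) | (x >> ((32u - n) & 31u));
}

/* Build a mask with bits begin..end set, where bit 0 is the most
   significant bit.  Wrap-around masks are not modeled here.  */
static uint32_t model_mask32 (unsigned begin, unsigned end)
{
  uint32_t mask = 0;
  assert (begin <= end && end <= 31u);
  for (unsigned i = begin; i <= end; i++)
    mask |= UINT32_C (0x80000000) >> i;
  return mask;
}

/* Hypothetical model of one element of vec_rlmi (arg1, arg2, arg3):
   rotate arg1 left, then insert it into arg2 under the mask.  */
static uint32_t model_rlmi32 (uint32_t arg1, uint32_t arg2, uint32_t arg3)
{
  unsigned begin = (arg3 >> 16) & 31u;
  unsigned end = (arg3 >> 8) & 31u;
  unsigned shift = arg3 & 31u;
  uint32_t mask = model_mask32 (begin, end);
  return (arg2 & ~mask) | (model_rotl32 (arg1, shift) & mask);
}

/* Usage example with the mask and shift values from the test case.  */
static void model_rlmi_demo (void)
{
  uint32_t arg3 = (0u << 16) | (4u << 8) | 16u;
  assert (model_rlmi32 (0x12345678u, 0xA1B1CDEFu, arg3) == 0x51B1CDEFu);
}
```

In this model arg1 is the value that gets rotated and arg2 supplies the bits outside the mask, which is the ordering the fix is meant to restore.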

The patch has been tested on
powerpc64-linux (Power 8 BE)
powerpc64le-linux (Power 9 LE)
powerpc64le-linux (Power 10 LE)

Please let me know if the patch is acceptable for mainline.

   Carl Love

--

gcc/ChangeLog

2021-01-12  Carl Love  

gcc/
* config/rs6000/altivec.md (altivec_vrlmi): Fix
bug in argument generation.

gcc/testsuite/
* gcc.target/powerpc/check-builtin-vec_rlnm-runnable.c:
New runnable test case.
* gcc.target/powerpc/vec-rlmi-rlnm.c: Update scan assembler times
for xxlor instruction.
---
 gcc/config/rs6000/altivec.md  |   6 +-
 .../powerpc/check-builtin-vec_rlnm-runnable.c | 233 ++
 .../gcc.target/powerpc/vec-rlmi-rlnm.c|   2 +-
 3 files changed, 237 insertions(+), 4 deletions(-)
 create mode 100644 gcc/testsuite/gcc.target/powerpc/check-builtin-vec_rlnm-runnable.c

diff --git a/gcc/config/rs6000/altivec.md b/gcc/config/rs6000/altivec.md
index fc19a8fc807..4d08cca2228 100644
--- a/gcc/config/rs6000/altivec.md
+++ b/gcc/config/rs6000/altivec.md
@@ -1982,12 +1982,12 @@
 
 (define_insn "altivec_vrlmi"
   [(set (match_operand:VIlong 0 "register_operand" "=v")
-(unspec:VIlong [(match_operand:VIlong 1 "register_operand" "0")
-   (match_operand:VIlong 2 "register_operand" "v")
+(unspec:VIlong [(match_operand:VIlong 1 "register_operand" "v")
+   (match_operand:VIlong 2 "register_operand" "0")
(match_operand:VIlong 3 "register_operand" "v")]
   UNSPEC_VRLMI))]
   "TARGET_P9_VECTOR"
-  "vrlmi %0,%2,%3"
+  "vrlmi %0,%1,%3"
   [(set_attr "type" "veclogical")])
 
 (define_insn "altivec_vrlnm"
diff --git a/gcc/testsuite/gcc.target/powerpc/check-builtin-vec_rlnm-runnable.c b/gcc/testsuite/gcc.target/powerpc/check-builtin-vec_rlnm-runnable.c
new file mode 100644
index 000..b97bc519c87
--- /dev/null
+++ b/gcc/testsuite/gcc.target/powerpc/check-builtin-vec_rlnm-runnable.c
@@ -0,0 +1,233 @@
+/* { dg-do run } */
+/* { dg-require-effective-target powerpc_p9vector_ok } */
+/* { dg-options "-O2 -mdejagnu-cpu=power9 -save-temps" } */
+
+/* Verify the vec_rlnm and vec_rlmi builtins work correctly.  */
+/* { dg-final { scan-assembler-times {\mvrldmi\M} 1 } } */
+
+#include 
+
+#define DEBUG 1
+
+#if DEBUG
+#include 
+#include 
+#endif
+
+void abort (void);
+
+int main ()
+{
+  int i;
+
+  vector unsigned int vec_arg1_int, vec_arg2_int, vec_arg3_int;
+  vector unsigned int vec_result_int, vec_expected_result_int;
+  
+  vector unsigned long long int vec_arg1_di, vec_arg2_di, vec_arg3_di;
+  vector unsigned long long int vec_result_di, vec_expected_result_di;
+
+  unsigned int mask_begin, mask_end, shift;
+  unsigned long long int mask;
+
+/* Check vec int version of vec_rlmi builtin */
+  mask = 0;
+  mask_begin = 0;
+  mask_end   = 4;
+  shift = 16;
+
+  for (i = 0; i < 31; i++)
+if ((i >= mask_begin) && (i <= mask_end))
+  mask |= 0x8000ULL >> i;
+
+  for (i = 0; i < 4; i++) {
+vec_arg1_int[i] = 0x12345678 + i*0x;
+vec_arg2_int[i] = 0xA1B1CDEF;
+vec_arg3_int[i] = mask_begin << 16 | mask_end << 8 | shift;
+
+/* do rotate */
+vec_expected_result_int[i] =  ( vec_arg2_int[i] & ~mask) 
+  | ((vec_arg1_int[i] << shift) | (vec_arg1_int[i] >> (32-shift))) & mask;
+  
+  }
+
+  /* vec_rlmi(arg1, arg2, arg3)
+ result - rotate each element of arg1 left and inserting it into arg2 
+   element of arg2 based on the mask specified in arg3.  The shift, mask
+   start and end is specified in arg3.  */
+  vec_result_int = vec_rlmi (vec_arg1_int, vec_arg2_int, vec_arg3_int);
+
+  for (i = 0; i < 4; i++) {
+if (vec_result_int[i] != vec_expected_result_int[i])
+#if DEBUG
+  printf("ERROR: i = %d, vec_rlmi int result 0x%x, does not match "
+"expected result 0x%x\n", i, vec_result_int[i],
+vec_expected_result_int[i]);
+#else
+  abort();
+#endif
+}
+
+/* Check vec long long int version of vec_rlmi builtin */
+  mask = 0;
+  mask_begin = 0;
+  mask_end   = 4;
+  shift = 16;
+
+  for (i = 0; i < 31; i++)
+if ((i >= mask_begin) && (i <= mask_end))
+  mask |= 0x8000ULL >> i;
+
+  for (i = 0; i < 2; i++) {
+vec_arg1_di[i] = 0x12345678 + i*0x;
+vec_arg2_di[i] = 0xA1B1C1D1E1F12345;
+vec_arg3_di[i] = mask_begin << 16 | mask_end << 8 | shift;
+
+/* do rotate */
+vec_expected_result_di[i] =  ( vec_arg2_di[i] & ~mask) 
+  | ((vec_arg1_di[i] << shift) | (vec_arg1_di[i] >> (64-shift))) & mask;
+  }
+
+  /* vec_rlmi(arg1, arg2, arg3)
+ result - rotate each element of arg1 left and inserting it into arg2 
+   element of arg2 based on the

[PATCH v5] rs6000, vector integer multiply/divide/modulo instructions

2021-01-13 Thread Carl Love via Gcc-patches

Will:

I have addressed the various typos you mentioned in the messages to the
maintainers.

Per your comment I have also tested the updated patch on Power 8 BE.

The patch was compiled and tested on:

   powerpc64-unknown-linux-gnu (Power 8 BE)
   powerpc64le-unknown-linux-gnu (Power 9 LE)
   powerpc64le-unknown-linux-gnu (Power 10 LE)

I have fixed the change log entries.

I have fixed the formatting/white space issues you mentioned.

With regards to the comment:

> Presumably it is safe (no side effects) when adding V4SI and V2DI here,
> with respect to other current users of 'bits'.
> Is it worth adding the
> other modes while we are here? (V1TI, V8HI, V16QI ).

I did not add the additional modes.  I don't see any reason it would
hurt but feel it is best to only add them when they are needed.

 Carl 

---

Will:

I have addressed your comments with regard to the Change Log entries.

The extra define vec_div was removed.

Added the missing entries for DIVU_V2DI  DIVS_V2DI in rs6000-call.c.

The extra MULLD_V2DI case statement entry was removed.

Added comment in rs6000.md about size for vector types per discussion
with Pat.

  Carl


GCC maintainers:

The following patch adds new builtins for the vector integer multiply,
divide and modulo operations.  The builtins are: vec_mulh(),
vec_dive(), vec_mod() for signed and unsigned integers and long
long integers. The existing support for the vec_div() and vec_mul()
builtins emulates the vector operations with multiple scalar
instructions.  This patch adds support for these builtins using the new
vector instructions for Power 10.
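
For orientation, the per-element arithmetic behind these builtins can be sketched in scalar C. This is a rough model, not code from the patch: the helper names (model_mulhu32, model_diveu32, model_mods32, model_muldiv_demo) are hypothetical, and the sketch assumes that multiply-high returns the upper half of the double-width product, that divide-extended divides the dividend shifted left by the element width, and that modulo returns the truncating remainder. Cases where the hardware result would be undefined are rejected with assert.

```c
#include <assert.h>
#include <stdint.h>

/* Multiply-high, unsigned word: upper 32 bits of the 64-bit product.  */
static uint32_t model_mulhu32 (uint32_t a, uint32_t b)
{
  return (uint32_t) (((uint64_t) a * b) >> 32);
}

/* Divide-extended, unsigned word: (a << 32) / b.  The quotient only
   fits in 32 bits when a < b, so other inputs are rejected here.  */
static uint32_t model_diveu32 (uint32_t a, uint32_t b)
{
  assert (b != 0 && a < b);
  return (uint32_t) (((uint64_t) a << 32) / b);
}

/* Modulo, signed word: remainder of truncating division.  */
static int32_t model_mods32 (int32_t a, int32_t b)
{
  assert (b != 0 && !(a == INT32_MIN && b == -1));
  return a % b;
}

/* Small usage example for the three helpers above.  */
static void model_muldiv_demo (void)
{
  assert (model_mulhu32 (0x80000000u, 4u) == 2u);
  assert (model_diveu32 (1u, 2u) == 0x80000000u);
  assert (model_mods32 (-7, 3) == -1);
}
```

The signed multiply-high and divide-extended variants, and the doubleword forms, would presumably follow the same shape with wider intermediate types.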

The patch was compiled and tested on:

  powerpc64le-unknown-linux-gnu (Power 9 LE)
  powerpc64le-unknown-linux-gnu (Power 10 LE)

with no regressions. Additionally the new test case was compiled and
executed by hand on Mambo to verify the test case passes.

Please let me know if this patch is acceptable for mainline.  Thanks.

Carl Love

---

2021-01-12  Carl Love  

gcc/
* config/rs6000/altivec.h (vec_mulh, vec_div, vec_dive, vec_mod): New
defines.
* config/rs6000/altivec.md (VIlong): Move define to file vsx.md.
* config/rs6000/rs6000-builtin.def (DIVES_V4SI, DIVES_V2DI,
DIVEU_V4SI, DIVEU_V2DI, DIVS_V4SI, DIVS_V2DI, DIVU_V4SI,
DIVU_V2DI, MODS_V2DI, MODS_V4SI, MODU_V2DI, MODU_V4SI,
MULHS_V2DI, MULHS_V4SI, MULHU_V2DI, MULHU_V4SI, MULLD_V2DI):
Add builtin define.
(MULH, DIVE, MOD):  Add new BU_P10_OVERLOAD_2 definitions.
* config/rs6000/rs6000-call.c (VSX_BUILTIN_VEC_DIV,
VSX_BUILTIN_VEC_DIVE, P10_BUILTIN_VEC_MOD, P10_BUILTIN_VEC_MULH):
New overloaded definitions.
(builtin_function_type) [P10V_BUILTIN_DIVEU_V4SI,
P10V_BUILTIN_DIVEU_V2DI, P10V_BUILTIN_DIVU_V4SI,
P10V_BUILTIN_DIVU_V2DI, P10V_BUILTIN_MODU_V2DI,
P10V_BUILTIN_MODU_V4SI, P10V_BUILTIN_MULHU_V2DI,
P10V_BUILTIN_MULHU_V4SI]: Add case
statement for builtins.
* config/rs6000/rs6000.md (bits): Add new attribute sizes V4SI, V2DI.
* config/rs6000/vsx.md (VIlong): Moved from config/rs6000/altivec.md.
(UNSPEC_VDIVES, UNSPEC_VDIVEU): New unspec definitions.
(vsx_mul_v2di): Add if TARGET_POWER10 statement.
(vsx_udiv_v2di): Add if TARGET_POWER10 statement.
(dives_, diveu_, div3, uvdiv3,
mods_, modu_, mulhs_, mulhu_, mulv2di3):
Add define_insn, mode is VIlong.
* doc/extend.texi (vec_mulh, vec_mul, vec_div, vec_dive, vec_mod): Add
builtin descriptions.

gcc/testsuite/
* gcc.target/powerpc/builtins-1-p10-runnable.c: New test file.
---
 gcc/config/rs6000/altivec.h   |   4 +
 gcc/config/rs6000/altivec.md  |   2 -
 gcc/config/rs6000/rs6000-builtin.def  |  21 +
 gcc/config/rs6000/rs6000-call.c   |  53 +++
 gcc/config/rs6000/rs6000.md   |   3 +-
 gcc/config/rs6000/vsx.md  | 211 +++---
 gcc/doc/extend.texi   | 120 ++
 .../powerpc/builtins-1-p10-runnable.c | 398 ++
 8 files changed, 759 insertions(+), 53 deletions(-)
 create mode 100644 gcc/testsuite/gcc.target/powerpc/builtins-1-p10-runnable.c

diff --git a/gcc/config/rs6000/altivec.h b/gcc/config/rs6000/altivec.h
index 06f0d4d9f14..961621a0841 100644
--- a/gcc/config/rs6000/altivec.h
+++ b/gcc/config/rs6000/altivec.h
@@ -750,6 +750,10 @@ __altivec_scalar_pred(vec_any_nle,
 #define vec_strir_p(a) __builtin_vec_strir_p (a)
 #define vec_stril_p(a) __builtin_vec_stril_p (a)
 
+#define vec_mulh(a, b) __builtin_vec_mulh ((a), (b))
+#define vec_dive(a, b) __builtin_vec_dive ((a), (b))
+#define vec_mod(a, b) __builtin_vec_mod ((a), (b))
+
 /* VSX

Re: [PATCH v4] rs6000, vector integer multiply/divide/modulo instructions

2021-01-04 Thread Carl Love via Gcc-patches

Segher, Will:

Just wanted to ping you both on this patch.  It has been out there for
a while.

  Carl

On Mon, 2020-12-07 at 16:31 -0800, Carl Love wrote:
> Will:
> 
> I have addressed your comments with regard to the Change Log
> entries.
> 
> The extra define vec_div was removed.
> 
> Added the missing entries for DIVU_V2DI  DIVS_V2DI in rs6000-call.c.
> 
> The extra MULLD_V2DI case statement entry was removed.
> 
> Added comment in rs6000.md about size for vector types per discussion
> with Pat.
> 
>   Carl
> 
> 
> GCC maintainers:
> 
> The following patch adds new builtins for the vector integer
> multiply,
> divide and modulo operations.  The builtins are: vec_mulh(),
> vec_dive(), vec_mod() for signed and unsigned integers and long
> long integers. The existing support for the vec_div() and vec_mul()
> builtins emulates the vector operations with multiple scalar
> instructions.  This patch adds support for these builtins using the
> new
> vector instructions for Power 10.
> 
> The patch was compiled and tested on:
> 
>   powerpc64le-unknown-linux-gnu (Power 9 LE)
>   powerpc64le-unknown-linux-gnu (Power 10 LE)
> 
> with no regressions. Additionally the new test case was compiled and
> executed by hand on Mambo to verify the test case passes.
> 
> Please let me know if this patch is acceptable for mainline.  Thanks.
> 
> Carl Love
> 
> -
> 
> From 15f9c090106c62af83cc405414466ad03d1a4c55 Mon Sep 17 00:00:00
> 2001
> From: Carl Love 
> Date: Fri, 4 Sep 2020 19:24:22 -0500
> Subject: [PATCH] rs6000, vector integer multiply/divide/modulo
> instructions
> 
> 2020-12-07  Carl Love  
> 
> gcc/
>   * config/rs6000/altivec.h (vec_mulh, vec_dive, vec_mod): New
> defines.
>   * config/rs6000/altivec.md (VIlong): Move define to file
> vsx.md.
>   * config/rs6000/rs6000-builtin.def (DIVES_V4SI, DIVES_V2DI,
>   DIVEU_V4SI, DIVEU_V2DI, DIVS_V4SI, DIVS_V2DI, DIVU_V4SI,
>   DIVU_V2DI, MODS_V2DI, MODS_V4SI, MODU_V2DI, MODU_V4SI,
>   MULHS_V2DI, MULHS_V4SI, MULHU_V2DI, MULHU_V4SI, MULLD_V2DI):
>   Add builtin define.
>   (MULH, DIVE, MOD):  Add new BU_P10_OVERLOAD_2 definitions.
>   * config/rs6000/rs6000-call.c (altivec_overloaded_builtins):
> Add
>   VSX_BUILTIN_VEC_DIV, P10_BUILTIN_VEC_VDIVE,
>   P10_BUILTIN_VEC_VDIVE, P10_BUILTIN_VEC_VMOD,
> P10_BUILTIN_VEC_VMULH
>   overloaded definitions.
>   (builtin_function_type) [P10V_BUILTIN_DIVEU_V4SI,
>   P10V_BUILTIN_DIVEU_V2DI, P10V_BUILTIN_DIVU_V4SI,
>   P10V_BUILTIN_DIVU_V2DI, P10V_BUILTIN_MODU_V2DI,
>   P10V_BUILTIN_MODU_V4SI, P10V_BUILTIN_MULHU_V2DI,
>   P10V_BUILTIN_MULHU_V4SI, P10V_BUILTIN_MULLD_V2DI]: Add case
>   statements for builtins.
>   * config/rs6000/rs6000.md (bits): Add new attribute sizes.
>   * config/rs6000/vsx.md (VIlong): New define_mode_iterator.
>   (UNSPEC_VDIVES, UNSPEC_VDIVEU): New unspec definitions.
>   (vsx_mul_v2di): Add if TARGET_POWER10 statement.
>   (vsx_udiv_v2di): Add if TARGET_POWER10 statement.
>   (dives_, diveu_, div3, uvdiv3,
>   mods_, modu_, mulhs_, mulhu_,
> mulv2di3):
>   Add define_insn, mode is VIlong.
>   doc/extend.texi (vec_mulh, vec_mul, vec_div, vec_dive,
> vec_mod): Add
>   builtin descriptions.
> 
> gcc/testsuite/
>   * gcc.target/powerpc/builtins-1-p10-runnable.c: New test file.
> ---
>  gcc/config/rs6000/altivec.h   |   4 +
>  gcc/config/rs6000/altivec.md  |   2 -
>  gcc/config/rs6000/rs6000-builtin.def  |  22 +
>  gcc/config/rs6000/rs6000-call.c   |  53 +++
>  gcc/config/rs6000/rs6000.md   |   4 +-
>  gcc/config/rs6000/vsx.md  | 212 +++---
>  gcc/doc/extend.texi   | 120 ++
>  .../powerpc/builtins-1-p10-runnable.c | 398
> ++
>  8 files changed, 762 insertions(+), 53 deletions(-)
>  create mode 100644 gcc/testsuite/gcc.target/powerpc/builtins-1-p10-
> runnable.c
> 
> diff --git a/gcc/config/rs6000/altivec.h
> b/gcc/config/rs6000/altivec.h
> index e1884f51bd8..b678e5cf28d 100644
> --- a/gcc/config/rs6000/altivec.h
> +++ b/gcc/config/rs6000/altivec.h
> @@ -750,6 +750,10 @@ __altivec_scalar_pred(vec_any_nle,
>  #define vec_strir_p(a)   __builtin_vec_strir_p (a)
>  #define vec_stril_p(a)   __builtin_vec_stril_p (a)
>  
> +#define vec_mulh(a, b) __builtin_vec_mulh ((a), (b))
> +#define vec_dive(a, b) __builtin_vec_dive ((a), (b))
> +#define vec_mod(a, b) __builtin_vec_mod ((a), (b))
> +
>  /* VSX Mask Manipulation builtin. */
>  #define vec_genbm __builtin_vec_mtvsrbm
>  #define vec_genhm __builtin_vec_mtvsrhm
> diff --git a/gcc/config/rs6000/altivec.md
> b/gcc/config/rs6000/altivec.md
> index 6a6ce0f84ed..f10f1cdd8a7 100644
> --- a/gcc/config/rs6000/altivec.md
> +++ b/gcc/config/rs6000/altivec.md

[PATCH v4] rs6000, vector integer multiply/divide/modulo instructions

2020-12-07 Thread Carl Love via Gcc-patches

Will:

I have addressed your comments with regard to the Change Log entries.

The extra define vec_div was removed.

Added the missing entries for DIVU_V2DI  DIVS_V2DI in rs6000-call.c.

The extra MULLD_V2DI case statement entry was removed.

Added comment in rs6000.md about size for vector types per discussion
with Pat.

  Carl


GCC maintainers:

The following patch adds new builtins for the vector integer multiply,
divide and modulo operations.  The builtins are: vec_mulh(),
vec_dive(), vec_mod() for signed and unsigned integers and long
long integers.  The existing support for the vec_div() and vec_mul()
builtins emulates the vector operations with multiple scalar
instructions.  This patch adds support for these builtins using the new
vector instructions for Power 10.
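
For readers unfamiliar with these operations, here is a rough scalar
sketch of what one unsigned 32-bit element of each builtin computes.
The function names (model_mulh_u32 and so on) are made up for
illustration and are not part of the patch.  The divide-extended model
follows my reading of the Power ISA description (dividend shifted left
by the element width), so treat it as an assumption rather than a
specification.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical per-element models; the names are illustrative only.  */

/* vec_mulh: high-order 32 bits of the 64-bit product.  */
static uint32_t model_mulh_u32(uint32_t a, uint32_t b)
{
    return (uint32_t)(((uint64_t)a * b) >> 32);
}

/* vec_dive (divide extended): the dividend is a shifted left by the
   element width.  The quotient only fits in 32 bits when a < b, so
   that is asserted here instead of modelling the undefined case.  */
static uint32_t model_dive_u32(uint32_t a, uint32_t b)
{
    assert(b != 0 && a < b);
    return (uint32_t)(((uint64_t)a << 32) / b);
}

/* vec_mod: remainder of a divided by b.  */
static uint32_t model_mod_u32(uint32_t a, uint32_t b)
{
    assert(b != 0);
    return a % b;
}
```

A vector builtin would apply the same operation to every element of
its operands.  The signed variants differ only in using signed
arithmetic for the product, quotient and remainder.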

The patch was compiled and tested on:

  powerpc64le-unknown-linux-gnu (Power 9 LE)
  powerpc64le-unknown-linux-gnu (Power 10 LE)

with no regressions. Additionally the new test case was compiled and
executed by hand on Mambo to verify the test case passes.

Please let me know if this patch is acceptable for mainline.  Thanks.

Carl Love

-

>From 15f9c090106c62af83cc405414466ad03d1a4c55 Mon Sep 17 00:00:00 2001
From: Carl Love 
Date: Fri, 4 Sep 2020 19:24:22 -0500
Subject: [PATCH] rs6000, vector integer multiply/divide/modulo instructions

2020-12-07  Carl Love  

gcc/
* config/rs6000/altivec.h (vec_mulh, vec_dive, vec_mod): New defines.
* config/rs6000/altivec.md (VIlong): Move define to file vsx.md.
* config/rs6000/rs6000-builtin.def (DIVES_V4SI, DIVES_V2DI,
DIVEU_V4SI, DIVEU_V2DI, DIVS_V4SI, DIVS_V2DI, DIVU_V4SI,
DIVU_V2DI, MODS_V2DI, MODS_V4SI, MODU_V2DI, MODU_V4SI,
MULHS_V2DI, MULHS_V4SI, MULHU_V2DI, MULHU_V4SI, MULLD_V2DI):
Add builtin define.
(MULH, DIVE, MOD):  Add new BU_P10_OVERLOAD_2 definitions.
* config/rs6000/rs6000-call.c (altivec_overloaded_builtins): Add
VSX_BUILTIN_VEC_DIV, P10_BUILTIN_VEC_VDIVE,
P10_BUILTIN_VEC_VDIVE, P10_BUILTIN_VEC_VMOD, P10_BUILTIN_VEC_VMULH
overloaded definitions.
(builtin_function_type) [P10V_BUILTIN_DIVEU_V4SI,
P10V_BUILTIN_DIVEU_V2DI, P10V_BUILTIN_DIVU_V4SI,
P10V_BUILTIN_DIVU_V2DI, P10V_BUILTIN_MODU_V2DI,
P10V_BUILTIN_MODU_V4SI, P10V_BUILTIN_MULHU_V2DI,
P10V_BUILTIN_MULHU_V4SI, P10V_BUILTIN_MULLD_V2DI]: Add case
statements for builtins.
* config/rs6000/rs6000.md (bits): Add new attribute sizes.
* config/rs6000/vsx.md (VIlong): New define_mode_iterator.
(UNSPEC_VDIVES, UNSPEC_VDIVEU): New unspec definitions.
(vsx_mul_v2di): Add if TARGET_POWER10 statement.
(vsx_udiv_v2di): Add if TARGET_POWER10 statement.
(dives_, diveu_, div3, uvdiv3,
mods_, modu_, mulhs_, mulhu_, mulv2di3):
Add define_insn, mode is VIlong.
doc/extend.texi (vec_mulh, vec_mul, vec_div, vec_dive, vec_mod): Add
builtin descriptions.

gcc/testsuite/
* gcc.target/powerpc/builtins-1-p10-runnable.c: New test file.
---
 gcc/config/rs6000/altivec.h   |   4 +
 gcc/config/rs6000/altivec.md  |   2 -
 gcc/config/rs6000/rs6000-builtin.def  |  22 +
 gcc/config/rs6000/rs6000-call.c   |  53 +++
 gcc/config/rs6000/rs6000.md   |   4 +-
 gcc/config/rs6000/vsx.md  | 212 +++---
 gcc/doc/extend.texi   | 120 ++
 .../powerpc/builtins-1-p10-runnable.c | 398 ++
 8 files changed, 762 insertions(+), 53 deletions(-)
 create mode 100644 gcc/testsuite/gcc.target/powerpc/builtins-1-p10-runnable.c

diff --git a/gcc/config/rs6000/altivec.h b/gcc/config/rs6000/altivec.h
index e1884f51bd8..b678e5cf28d 100644
--- a/gcc/config/rs6000/altivec.h
+++ b/gcc/config/rs6000/altivec.h
@@ -750,6 +750,10 @@ __altivec_scalar_pred(vec_any_nle,
 #define vec_strir_p(a) __builtin_vec_strir_p (a)
 #define vec_stril_p(a) __builtin_vec_stril_p (a)
 
+#define vec_mulh(a, b) __builtin_vec_mulh ((a), (b))
+#define vec_dive(a, b) __builtin_vec_dive ((a), (b))
+#define vec_mod(a, b) __builtin_vec_mod ((a), (b))
+
 /* VSX Mask Manipulation builtin. */
 #define vec_genbm __builtin_vec_mtvsrbm
 #define vec_genhm __builtin_vec_mtvsrhm
diff --git a/gcc/config/rs6000/altivec.md b/gcc/config/rs6000/altivec.md
index 6a6ce0f84ed..f10f1cdd8a7 100644
--- a/gcc/config/rs6000/altivec.md
+++ b/gcc/config/rs6000/altivec.md
@@ -193,8 +193,6 @@
 
 ;; Short vec int modes
 (define_mode_iterator VIshort [V8HI V16QI])
-;; Longer vec int modes for rotate/mask ops
-(define_mode_iterator VIlong [V2DI V4SI])
 ;; Vec float modes
 (define_mode_iterator VF [V4SF])
 ;; Vec modes, pity mode iterators are not composable
diff --git a/gcc/config/rs6000/rs6000-builtin.def

Re: [PATCH v2] rs6000, vector integer multiply/divide/modulo instructions

2020-12-01 Thread Carl Love via Gcc-patches

Segher, Pat:

I have updated the patch to address the comments below.

On Wed, 2020-11-25 at 20:30 -0600, Segher Boessenkool wrote:
> On Tue, Nov 24, 2020 at 08:34:51PM -0600, Pat Haugen wrote:
> > On 11/24/20 8:17 PM, Pat Haugen via Gcc-patches wrote:
> > > On 11/24/20 12:59 PM, Carl Love via Gcc-patches wrote:
> > > > +(define_insn "modu_"
> > > > +  [(set (match_operand:VIlong 0 "vsx_register_operand" "=v")
> > > > +   (umod:VIlong (match_operand:VIlong 1
> > > > "vsx_register_operand" "v")
> > > > +(match_operand:VIlong 2
> > > > "vsx_register_operand" "v")))]
> > > > +  "TARGET_POWER10"
> > > > +  "vmodu %0,%1,%2"
> > > > +  [(set_attr "type" "vecdiv")
> > > > +   (set_attr "size" "128")])
> > > 
> > > We should only be setting "size" "128" for instructions that
> > > operate on scalar 128-bit data items (i.e. 'vdivesq' etc). Since
> > > the above insns are either V2DI/V4SI (ala VIlong mode_iterator),
> > > they shouldn't be marked as size 128. If you want to set the size
> > > based on mode, (set_attr "size" "") should do the trick I
> > > believe.
> > 
> > Well, after you update "(define_mode_attr bits" in rs6000.md for
> > V2DI/V4SI.
> 
> So far,  was only used for scalars.  I agree that for vectors
> it
> makes most sense to do the element size (because the vector size
> always
> is 128 bits, and for scheduling the element size can matter).  But,
> the
> definitions of  and  now say
> 
> ;; What data size does this instruction work on?
> ;; This is used for insert, mul and others as necessary.
> (define_attr "size" "8,16,32,64,128" (const_string "32"))
> 
> and
> 
> ;; How many bits in this mode?
> (define_mode_attr bits [(QI "8") (HI "16") (SI "32") (DI "64")
>(SF "32") (DF "64")])
> so those need a bit of update as well then :-)

I set the size based on the vector element size, extending the
define_mode_attr bits definition.  Please take a look at the updated
patch.  Hopefully I have this all correct.  Thanks.

Note, I retested the updated patch on 

  powerpc64le-unknown-linux-gnu (Power 9 LE)
  powerpc64le-unknown-linux-gnu (Power 10 LE)

Thanks for the help.

 Carl 

---
rs6000, vector integer multiply/divide/modulo instructions

2020-12-01  Carl Love  

gcc/
* config/rs6000/altivec.h (vec_mulh, vec_div, vec_dive, vec_mod): New
defines.
* config/rs6000/altivec.md (VIlong): Move define to file vsx.md.
* config/rs6000/rs6000-builtin.def (DIVES_V4SI, DIVES_V2DI,
DIVEU_V4SI, DIVEU_V2DI, DIVS_V4SI, DIVS_V2DI, DIVU_V4SI,
DIVU_V2DI, MODS_V2DI, MODS_V4SI, MODU_V2DI, MODU_V4SI,
MULHS_V2DI, MULHS_V4SI, MULHU_V2DI, MULHU_V4SI, MULLD_V2DI):
Add builtin define.
(MULH, DIVE, MOD):  Add new BU_P10_OVERLOAD_2 definitions.
* config/rs6000/rs6000-call.c (VSX_BUILTIN_VEC_DIV,
P10_BUILTIN_VEC_VDIVE, P10_BUILTIN_VEC_VMOD, P10_BUILTIN_VEC_VMULH):
New overloaded definitions.
(builtin_function_type) [P10V_BUILTIN_DIVEU_V4SI,
P10V_BUILTIN_DIVEU_V2DI, P10V_BUILTIN_DIVU_V4SI,
P10V_BUILTIN_DIVU_V2DI, P10V_BUILTIN_MODU_V2DI,
P10V_BUILTIN_MODU_V4SI, P10V_BUILTIN_MULHU_V2DI,
P10V_BUILTIN_MULHU_V4SI, P10V_BUILTIN_MULLD_V2DI]: Add case
statement for builtins.
* config/rs6000/vsx.md (VIlong_char): Add define_mod_attribute.
(UNSPEC_VDIVES, UNSPEC_VDIVEU): Add enum for UNSPECs.
(vsx_mul_v2di, vsx_udiv_v2di): Add if TARGET_POWER10 statement.
(dives_, diveu_, div3, uvdiv3,
mods_, modu_, mulhs_, mulhu_, mulv2di3):
Add define_insn, mode is VIlong.
* doc/extend.texi (vec_mulh, vec_mul, vec_div, vec_dive, vec_mod): Add
builtin descriptions.

gcc/testsuite/
* gcc.target/powerpc/builtins-1-p10-runnable.c: New test file.
---
 gcc/config/rs6000/altivec.h   |   5 +
 gcc/config/rs6000/altivec.md  |   2 -
 gcc/config/rs6000/rs6000-builtin.def  |  22 +
 gcc/config/rs6000/rs6000-call.c   |  49 +++
 gcc/config/rs6000/rs6000.md   |   3 +-
 gcc/config/rs6000/vsx.md  | 213 +++---
 gcc/doc/extend.texi   | 120 ++
 .../powerpc/builtins-1-p10-runnable.c | 398 ++
 8 files changed, 759 insertions(+), 53 delet

[PATCH v2] rs6000, vector integer multiply/divide/modulo instructions

2020-11-24 Thread Carl Love via Gcc-patches

Segher:

I have addressed the various issues you and Pat mentioned. 
Specifically:

  - Added parentheses around the macro arguments in altivec.h.
  - Removed VIlong_char, using  instead.
  - Reimplemented define_insn "vmulhs_" and define_insn 
"vmulhs_" to not use an UNSPEC.  Also changed the type to
veccomplex.
  - Fixed the header and index value "i" in documentation file  
extend.texi as requested.
  - Changed attribute type from vecsimple to vecdiv in dives_,
diveu_, div3, udiv3, mods_, modu_.
  - Added the size attribute of 128 to mods_ and modu_.
  - Changed the set attribute to veccomplex for define_insn "mulv2di3".
  - Added "signed" or "unsigned" to the error print statements in the
test program to clarify what case had failed.
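
As a general illustration of the macro-argument parenthesization
mentioned in the list above, the sketch below uses two made-up macros
(SCALE_BAD and SCALE_GOOD; neither is from altivec.h).  It shows the
usual precedence hazard the habit guards against.  It is not a claim
that the unparenthesized altivec.h forms misbehaved, since those
macros only forward their arguments to a builtin call.

```c
/* Illustrative macros only; neither appears in altivec.h.  */

/* Arguments are pasted in as-is, so SCALE_BAD(1 + 2, 3) expands to
   (1 + 2 * 3) and the multiplication binds first.  */
#define SCALE_BAD(a, b)  (a * b)

/* Each argument is parenthesized, so SCALE_GOOD(1 + 2, 3) expands to
   ((1 + 2) * (3)).  */
#define SCALE_GOOD(a, b) ((a) * (b))
```

With these definitions SCALE_BAD(1 + 2, 3) evaluates to 7 while
SCALE_GOOD(1 + 2, 3) evaluates to 9.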

In addition to testing on Power 9, I was able to test the updated patch
on Power 10.

Please let me know if the above changes are all acceptable and if there
are any additional changes needed.  Thanks.

Carl 


-
GCC maintainers:

The following patch adds new builtins for the vector integer multiply,
divide and modulo operations.  The builtins are:  
vec_mulh(), vec_div(), vec_dive(), vec_mod() for signed and unsigned
integers and long long integers.  Support for signed and unsigned long
long integers is added to the existing vec_mul().  Note that the existing
support for the vec_div() and vec_mul() builtins emulates the vector
operations with multiple scalar instructions.  This patch adds support
for these builtins to use the new vector instructions.

The patch was compiled and tested on:

  powerpc64le-unknown-linux-gnu (Power 9 LE) 
  powerpc64le-unknown-linux-gnu (Power 10 LE)
  
with no regressions. Additionally the new test case was compiled and
executed by hand on Mambo to verify the test case passes.

Please let me know if this patch is acceptable for mainline.  Thanks.

Carl Love



---


2020-11-23  Carl Love  

gcc/
* config/rs6000/altivec.h (vec_mulh, vec_div, vec_dive, vec_mod): New
defines.
* config/rs6000/altivec.md (VIlong): Move define to file vsx.md.
* config/rs6000/rs6000-builtin.def (DIVES_V4SI, DIVES_V2DI,
DIVEU_V4SI, DIVEU_V2DI, DIVS_V4SI, DIVS_V2DI, DIVU_V4SI,
DIVU_V2DI, MODS_V2DI, MODS_V4SI, MODU_V2DI, MODU_V4SI,
MULHS_V2DI, MULHS_V4SI, MULHU_V2DI, MULHU_V4SI, MULLD_V2DI):
Add builtin define.
(MULH, DIVE, MOD):  Add new BU_P10_OVERLOAD_2 definitions.
* config/rs6000/rs6000-call.c (VSX_BUILTIN_VEC_DIV,
P10_BUILTIN_VEC_VDIVE, P10_BUILTIN_VEC_VMOD, P10_BUILTIN_VEC_VMULH):
New overloaded definitions.
(builtin_function_type) [P10V_BUILTIN_DIVEU_V4SI,
P10V_BUILTIN_DIVEU_V2DI, P10V_BUILTIN_DIVU_V4SI,
P10V_BUILTIN_DIVU_V2DI, P10V_BUILTIN_MODU_V2DI,
P10V_BUILTIN_MODU_V4SI, P10V_BUILTIN_MULHU_V2DI,
P10V_BUILTIN_MULHU_V4SI, P10V_BUILTIN_MULLD_V2DI]: Add case
statement for builtins.
* config/rs6000/vsx.md (VIlong_char): Add define_mod_attribute.
(UNSPEC_VDIVES, UNSPEC_VDIVEU): Add enum for UNSPECs.
(vsx_mul_v2di, vsx_udiv_v2di): Add if TARGET_POWER10 statement.
(dives_, diveu_, div3, uvdiv3,
mods_, modu_, mulhs_, mulhu_, mulv2di3):
Add define_insn, mode is VIlong.
* doc/extend.texi (vec_mulh, vec_mul, vec_div, vec_dive, vec_mod): Add
builtin descriptions.

gcc/testsuite/
* gcc.target/powerpc/builtins-1-p10-runnable.c: New test file.
---
 gcc/config/rs6000/altivec.h   |   5 +
 gcc/config/rs6000/altivec.md  |   2 -
 gcc/config/rs6000/rs6000-builtin.def  |  22 +
 gcc/config/rs6000/rs6000-call.c   |  49 +++
 gcc/config/rs6000/vsx.md  | 213 +++---
 gcc/doc/extend.texi   | 120 ++
 .../powerpc/builtins-1-p10-runnable.c | 398 ++
 7 files changed, 757 insertions(+), 52 deletions(-)
 create mode 100644 gcc/testsuite/gcc.target/powerpc/builtins-1-p10-runnable.c

diff --git a/gcc/config/rs6000/altivec.h b/gcc/config/rs6000/altivec.h
index e1884f51bd8..12ccbd2fc2f 100644
--- a/gcc/config/rs6000/altivec.h
+++ b/gcc/config/rs6000/altivec.h
@@ -750,6 +750,11 @@ __altivec_scalar_pred(vec_any_nle,
 #define vec_strir_p(a) __builtin_vec_strir_p (a)
 #define vec_stril_p(a) __builtin_vec_stril_p (a)
 
+#define vec_mulh(a, b) __builtin_vec_mulh ((a), (b))
+#define vec_div(a, b) __builtin_vec_div ((a), (b))
+#define vec_dive(a, b) __builtin_vec_dive ((a), (b))
+#define vec_mod(a, b) __builtin_vec_mod ((a), (b))
+
 /* VSX Mask Manipulation builtin. */
 #define vec_genbm __builtin_vec_mtvsrbm
 #define vec_genhm __builtin_vec_mtvsrhm
diff --git a/gcc/config/rs6000/altivec.md b/gcc/config/rs6000/altivec.md
index 6a6ce0f84ed..f10f1cdd8a7 100644
--- a/gcc/config/rs6000/altivec.md
+++

RE: [PATCH] rs6000, vector integer multiply/divide/modulo instructions

2020-11-04 Thread Carl Love via Gcc-patches

David:

I have reworked the patch moving the new vector instruction patterns to
vsx.md.  Also, cleaned up the vector division instructions.  The
div3 pattern definitions are the only ones that should be
defined.  

I have retested the patch on:

   powerpc64le-unknown-linux-gnu (Power 9 LE)

with no regressions. Additionally the new test case was compiled and
executed by hand on Mambo to verify the test case passes.

Please let me know if this patch is acceptable for mainline.  Thanks.

Carl Love

--

2020-11-02  Carl Love  

gcc/
* config/rs6000/altivec.h (vec_mulh, vec_div, vec_dive, vec_mod): New
defines.
* config/rs6000/altivec.md (VIlong): Move define to file vsx.md.
* config/rs6000/rs6000-builtin.def (VDIVES_V4SI, VDIVES_V2DI,
VDIVEU_V4SI, VDIVEU_V2DI, VDIVS_V4SI, VDIVS_V2DI, VDIVU_V4SI,
VDIVU_V2DI, VMODS_V2DI, VMODS_V4SI, VMODU_V2DI, VMODU_V4SI,
VMULHS_V2DI, VMULHS_V4SI, VMULHU_V2DI, VMULHU_V4SI, VMULLD_V2DI):
Add builtin define.
(VMUL, VMULH, VDIVE, VMOD):  Add new BU_P10_OVERLOAD_2 definitions.
* config/rs6000/rs6000-call.c (VSX_BUILTIN_VEC_DIV,
P10_BUILTIN_VEC_VDIVE, P10_BUILTIN_VEC_VMOD, P10_BUILTIN_VEC_VMULH):
New overloaded definitions.
(builtin_function_type) [P10V_BUILTIN_VDIVEU_V4SI,
P10V_BUILTIN_VDIVEU_V2DI, P10V_BUILTIN_VDIVU_V4SI,
P10V_BUILTIN_VDIVU_V2DI, P10V_BUILTIN_VMODU_V2DI,
P10V_BUILTIN_VMODU_V4SI, P10V_BUILTIN_VMULHU_V2DI,
P10V_BUILTIN_VMULHU_V4SI, P10V_BUILTIN_VMULLD_V2DI]: Add case
statement for builtins.
* config/rs6000/vsx.md (VIlong_char): Add define_mod_attribute.
(UNSPEC_VDIVES, UNSPEC_VDIVEU,
UNSPEC_VMULHS, UNSPEC_VMULHU, UNSPEC_VMULLD): Add enum for UNSPECs.
(vsx_mul_v2di, vsx_udiv_v2di): Add if TARGET_POWER10 statement.
(vdives_, vdiveu_, vdiv3, uuvdiv3,
vmods_, vmodu_, vmulhs_, vmulhu_, mulv2di3):
Add define_insn, mode is VIlong.
* doc/extend.texi (vec_mulh, vec_mul, vec_div, vec_dive, vec_mod): Add
builtin descriptions.

gcc/testsuite/
* gcc.target/powerpc/builtins-1-p10-runnable.c: New test file.
---
 gcc/config/rs6000/altivec.h   |   5 +
 gcc/config/rs6000/altivec.md  |   2 -
 gcc/config/rs6000/rs6000-builtin.def  |  23 ++
 gcc/config/rs6000/rs6000-call.c   |  49 +++
 gcc/config/rs6000/vsx.md  | 205 +++---
 gcc/doc/extend.texi   | 120 ++
 .../powerpc/builtins-1-p10-runnable.c | 378 ++
 7 files changed, 730 insertions(+), 52 deletions(-)
 create mode 100644 gcc/testsuite/gcc.target/powerpc/builtins-1-p10-runnable.c

diff --git a/gcc/config/rs6000/altivec.h b/gcc/config/rs6000/altivec.h
index e1884f51bd8..d8f1d2cfc55 100644
--- a/gcc/config/rs6000/altivec.h
+++ b/gcc/config/rs6000/altivec.h
@@ -750,6 +750,11 @@ __altivec_scalar_pred(vec_any_nle,
 #define vec_strir_p(a) __builtin_vec_strir_p (a)
 #define vec_stril_p(a) __builtin_vec_stril_p (a)
 
+#define vec_mulh(a, b) __builtin_vec_mulh (a, b)
+#define vec_div(a, b) __builtin_vec_div (a, b)
+#define vec_dive(a, b) __builtin_vec_dive (a, b)
+#define vec_mod(a, b) __builtin_vec_mod (a, b)
+
 /* VSX Mask Manipulation builtin. */
 #define vec_genbm __builtin_vec_mtvsrbm
 #define vec_genhm __builtin_vec_mtvsrhm
diff --git a/gcc/config/rs6000/altivec.md b/gcc/config/rs6000/altivec.md
index 6a6ce0f84ed..f10f1cdd8a7 100644
--- a/gcc/config/rs6000/altivec.md
+++ b/gcc/config/rs6000/altivec.md
@@ -193,8 +193,6 @@
 
 ;; Short vec int modes
 (define_mode_iterator VIshort [V8HI V16QI])
-;; Longer vec int modes for rotate/mask ops
-(define_mode_iterator VIlong [V2DI V4SI])
 ;; Vec float modes
 (define_mode_iterator VF [V4SF])
 ;; Vec modes, pity mode iterators are not composable
diff --git a/gcc/config/rs6000/rs6000-builtin.def 
b/gcc/config/rs6000/rs6000-builtin.def
index a58102c3785..7663465b755 100644
--- a/gcc/config/rs6000/rs6000-builtin.def
+++ b/gcc/config/rs6000/rs6000-builtin.def
@@ -2877,6 +2877,24 @@ BU_P10V_AV_3 (VSRDB_V8HI, "vsrdb_v8hi", CONST, 
vsrdb_v8hi)
 BU_P10V_AV_3 (VSRDB_V4SI, "vsrdb_v4si", CONST, vsrdb_v4si)
 BU_P10V_AV_3 (VSRDB_V2DI, "vsrdb_v2di", CONST, vsrdb_v2di)
 
+BU_P10V_AV_2 (VDIVES_V4SI, "vdivesw", CONST, vdives_v4si)
+BU_P10V_AV_2 (VDIVES_V2DI, "vdivesd", CONST, vdives_v2di)
+BU_P10V_AV_2 (VDIVEU_V4SI, "vdiveuw", CONST, vdiveu_v4si)
+BU_P10V_AV_2 (VDIVEU_V2DI, "vdiveud", CONST, vdiveu_v2di)
+BU_P10V_AV_2 (VDIVS_V4SI, "vdivsw", CONST, divv4si3)
+BU_P10V_AV_2 (VDIVS_V2DI, "vdivsd", CONST, divv2di3)
+BU_P10V_AV_2 (VDIVU_V4SI, "vdivuw", CONST, udivv4si3)
+BU_P10V_AV_2 (VDIVU_V2DI, "vdivud", CONST, udivv2di3)
+BU_P10V_AV_2 (VMODS_V2DI, "vmodsd", CONST, vmods_v2di)
+BU_P10V_AV_2 (VMODS_V4SI, "vmodsw", CONST, vmods_v4si)
+BU_P10V_AV_2 (VMODU_V2DI, "vmodud", CONST, vmodu_v2di)

RE: [PATCH] rs6000, vector integer multiply/divide/modulo instructions

2020-11-02 Thread Carl Love via Gcc-patches



David:
> 
> Hi, Carl
> 
> I thought that vector.md was a transfer vector for the patterns and
> instructions were defined in vsx.md.  Why are the new insn patterns
> defined in vector.md?

I am a bit of a newbie here.  I wasn't aware of any specific guidelines
on the vector instructions.  I put them in vector.md since they
are vector instructions.  Made sense to me.  I can move them to vsx.md
if that is the preferred place, no problem.
> 
> > +(define_insn "div3"
> > +  [(set (match_operand:VIlong 0 "vsx_register_operand" "=v")
> > +   (div:VIlong (match_operand:VIlong 1 "vsx_register_operand"
> > "v")
> > +   (match_operand:VIlong 2 "vsx_register_operand"
> > "v")))]
> > +  "TARGET_POWER10"
> > +  "vdivs %0,%1,%2"
> > +  [(set_attr "type" "vecsimple")])
> > +
> > +(define_insn "udiv3"
> > +  [(set (match_operand:VIlong 0 "vsx_register_operand" "=v")
> > +   (udiv:VIlong (match_operand:VIlong 1 "vsx_register_operand"
> > "v")
> > +   (match_operand:VIlong 2 "vsx_register_operand"
> > "v")))]
> > +  "TARGET_POWER10"
> > +  "vdivu %0,%1,%2"
> > +  [(set_attr "type" "vecsimple")])
> > +
> > +(define_insn "vdivs_"
> > +  [(set (match_operand:VIlong 0 "vsx_register_operand" "=v")
> > +   (div:VIlong (match_operand:VIlong 1 "vsx_register_operand"
> > "v")
> > +   (match_operand:VIlong 2 "vsx_register_operand"
> > "v")))]
> > +  "TARGET_POWER10"
> > +  "vdivs %0,%1,%2"
> > +  [(set_attr "type" "vecsimple")])
> > +
> > +(define_insn "vdivu_"
> > +  [(set (match_operand:VIlong 0 "vsx_register_operand" "=v")
> > +   (udiv:VIlong (match_operand:VIlong 1 "vsx_register_operand"
> > "v")
> > +(match_operand:VIlong 2 "vsx_register_operand"
> > "v")))]
> > +  "TARGET_POWER10"
> > +  "vdivu %0,%1,%2"
> > +  [(set_attr "type" "vecsimple")])
> 
> Also, what is the reason to define div3 and udiv3, then
> repeat the patterns for vdivs_ and vdivu_?  Is there a
> difference between the two patterns that I'm missing?  The new
> builtins should be able to invoke the new named standard
> patterns.  Or
> we really want an additional set of patterns that match the builtin
> names?
> 
> The div3 and udiv3 patterns do not seem to be listed in
> the ChangeLog.

I originally added the vector multiply and divide instructions as
vmult_, vdivs_, etc.  I couldn't get GCC to generate the
instructions.  Bill pointed out that I hadn't used the default names
div3.  I thought I changed the original mult and div names to the
default names.  Looks like I didn't get the div stuff all updated in
the patch.  So, yea there should just be the div3 and udiv3
definitions.  My bad, sorry.  
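
To illustrate why the default names matter: GCC looks up standard
pattern names (signed and unsigned integer division among them) when
it expands an operation in a given mode.  Plain C like the sketch
below is therefore a candidate for the new vector divide once a
pattern with the standard name exists for the vector mode.  The
function name div_elements is made up for this example.  Whether the
loop is actually vectorized depends on the target and options (I am
assuming something like -O3 with a Power 10 target), so take this as
an illustration only.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical helper: element-wise signed division.  With a standard
   division pattern available for a vector of 32-bit integers, the
   vectorizer may turn this loop into vector divides; without one it
   falls back to scalar divides.  */
static void div_elements(int32_t *restrict q, const int32_t *restrict a,
                         const int32_t *restrict b, size_t n)
{
    for (size_t i = 0; i < n; i++)
        q[i] = a[i] / b[i];
}
```

The result is the same either way; only the generated instructions
differ.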

I will update the patch, retest and repost.  Thanks for the input.

  Carl
