Thanks, Richard, for the comments.

> && CONVERT_EXPR_CODE_P (gimple_assign_rhs_code (def))

Ack.

> I think you need to verify the intermediate precision (that of 'val')
> is also >= rhs_prec, otherwise we risk to lose the effect of a truncation.

I see, will update it.

> OK with such check added.

Sure thing, I will commit it with those changes if there are no surprises from testing.
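
For concreteness, the updated hunk would look something like the sketch below
(untested; the exact form may still change after regression testing):

  if (TREE_CODE (val) == SSA_NAME)
    {
      gimple *def = SSA_NAME_DEF_STMT (val);

      if (is_gimple_assign (def)
          && CONVERT_EXPR_CODE_P (gimple_assign_rhs_code (def)))
        {
          tree cast_rhs = gimple_assign_rhs1 (def);
          unsigned rhs_prec = TYPE_PRECISION (TREE_TYPE (cast_rhs));
          unsigned val_prec = TYPE_PRECISION (TREE_TYPE (val));
          unsigned type_prec = TYPE_PRECISION (type);

          /* Only bypass the intermediate cast when neither the intermediate
             type nor the target type can truncate the original value.  */
          if (type_prec >= rhs_prec && val_prec >= rhs_prec)
            rhs = cast_rhs;
        }
    }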

Pan


-----Original Message-----
From: Richard Biener <richard.guent...@gmail.com> 
Sent: Monday, September 15, 2025 7:23 PM
To: Li, Pan2 <pan2...@intel.com>
Cc: gcc-patches@gcc.gnu.org; juzhe.zh...@rivai.ai; kito.ch...@gmail.com; 
jeffreya...@gmail.com; rdapp....@gmail.com; Chen, Ken <ken.c...@intel.com>; 
Liu, Hongtao <hongtao....@intel.com>
Subject: Re: [PATCH v1 1/2] Widening-Mul: Refine build_and_insert_cast when rhs 
is cast

On Tue, Sep 9, 2025 at 3:31 AM <pan2...@intel.com> wrote:
>
> From: Pan Li <pan2...@intel.com>
>
> The widening-mul pass inserts a cast for the widening multiply; the
> function build_and_insert_cast is designed to take care of it.
>
> In some cases the optimized gimple contains an unnecessary cast,
> for example in the code below.
>
>   #define SAT_U_MUL_FMT_5(NT, WT)                 \
>   NT __attribute__((noinline))                    \
>   sat_u_mul_##NT##_from_##WT##_fmt_5 (NT a, NT b) \
>   {                                               \
>     WT x = (WT)a * (WT)b;                         \
>     NT hi = x >> (sizeof(NT) * 8);                \
>     NT lo = (NT)x;                                \
>     return lo | -!!hi;                            \
>   }
>
>   SAT_U_MUL_FMT_5(uint64_t, uint128_t)
>
> There will be an additional cast to uint128_t after optimization.
> This patch refines that by checking the def of the rhs cast: if it
> comes from a cast with less or equal precision, the rhs of that def
> will be leveraged instead.
>
> Before this patch:
>   29   │   _1 = (__int128 unsigned) a_8(D);
>   30   │   _2 = (__int128 unsigned) b_9(D);
>   31   │   _35 = (unsigned long) _1;
>   32   │   _34 = (unsigned long) _2;
>   33   │   x_10 = _35 w* _34;
>
> After this patch:
>   27   │   _35 = (unsigned long) a_8(D);
>   28   │   _34 = (unsigned long) b_9(D);
>   29   │   x_10 = _35 w* _34;
>
> gcc/ChangeLog:
>
>         * tree-ssa-math-opts.cc (build_and_insert_cast): Refine
>         the cast insertion by checking the rhs of val.
>
> gcc/testsuite/ChangeLog:
>
>         * gcc.target/riscv/sat/widen-mul-0.c: New test.
>         * gcc.target/riscv/sat/widen-mul-1.c: New test.
>         * gcc.target/riscv/sat/widen-mul-2.c: New test.
>         * gcc.target/riscv/sat/widen-mul-3.c: New test.
>         * gcc.target/riscv/sat/widen-mul.h: New test.
>
> Signed-off-by: Pan Li <pan2...@intel.com>
> ---
>  .../gcc.target/riscv/sat/widen-mul-0.c        |  8 +++++++
>  .../gcc.target/riscv/sat/widen-mul-1.c        |  8 +++++++
>  .../gcc.target/riscv/sat/widen-mul-2.c        |  8 +++++++
>  .../gcc.target/riscv/sat/widen-mul-3.c        |  8 +++++++
>  .../gcc.target/riscv/sat/widen-mul.h          | 15 +++++++++++++
>  gcc/tree-ssa-math-opts.cc                     | 21 ++++++++++++++++++-
>  6 files changed, 67 insertions(+), 1 deletion(-)
>  create mode 100644 gcc/testsuite/gcc.target/riscv/sat/widen-mul-0.c
>  create mode 100644 gcc/testsuite/gcc.target/riscv/sat/widen-mul-1.c
>  create mode 100644 gcc/testsuite/gcc.target/riscv/sat/widen-mul-2.c
>  create mode 100644 gcc/testsuite/gcc.target/riscv/sat/widen-mul-3.c
>  create mode 100644 gcc/testsuite/gcc.target/riscv/sat/widen-mul.h
>
> diff --git a/gcc/testsuite/gcc.target/riscv/sat/widen-mul-0.c 
> b/gcc/testsuite/gcc.target/riscv/sat/widen-mul-0.c
> new file mode 100644
> index 00000000000..1074fa9d5f4
> --- /dev/null
> +++ b/gcc/testsuite/gcc.target/riscv/sat/widen-mul-0.c
> @@ -0,0 +1,8 @@
> +/* { dg-do compile } */
> +/* { dg-options "-march=rv64gc -mabi=lp64d -fdump-tree-optimized" } */
> +
> +#include "widen-mul.h"
> +
> +SAT_U_MUL_FMT_5(uint8_t, uint128_t)
> +
> +/* { dg-final { scan-tree-dump-not " = (__int128 unsigned) " "optimized" } } 
> */
> diff --git a/gcc/testsuite/gcc.target/riscv/sat/widen-mul-1.c 
> b/gcc/testsuite/gcc.target/riscv/sat/widen-mul-1.c
> new file mode 100644
> index 00000000000..5f8f0dd733c
> --- /dev/null
> +++ b/gcc/testsuite/gcc.target/riscv/sat/widen-mul-1.c
> @@ -0,0 +1,8 @@
> +/* { dg-do compile } */
> +/* { dg-options "-march=rv64gc -mabi=lp64d -fdump-tree-optimized" } */
> +
> +#include "widen-mul.h"
> +
> +SAT_U_MUL_FMT_5(uint16_t, uint128_t)
> +
> +/* { dg-final { scan-tree-dump-not " = (__int128 unsigned) " "optimized" } } 
> */
> diff --git a/gcc/testsuite/gcc.target/riscv/sat/widen-mul-2.c 
> b/gcc/testsuite/gcc.target/riscv/sat/widen-mul-2.c
> new file mode 100644
> index 00000000000..4c54cc07b53
> --- /dev/null
> +++ b/gcc/testsuite/gcc.target/riscv/sat/widen-mul-2.c
> @@ -0,0 +1,8 @@
> +/* { dg-do compile } */
> +/* { dg-options "-march=rv64gc -mabi=lp64d -fdump-tree-optimized" } */
> +
> +#include "widen-mul.h"
> +
> +SAT_U_MUL_FMT_5(uint32_t, uint128_t)
> +
> +/* { dg-final { scan-tree-dump-not " = (__int128 unsigned) " "optimized" } } 
> */
> diff --git a/gcc/testsuite/gcc.target/riscv/sat/widen-mul-3.c 
> b/gcc/testsuite/gcc.target/riscv/sat/widen-mul-3.c
> new file mode 100644
> index 00000000000..d3dd97104b7
> --- /dev/null
> +++ b/gcc/testsuite/gcc.target/riscv/sat/widen-mul-3.c
> @@ -0,0 +1,8 @@
> +/* { dg-do compile } */
> +/* { dg-options "-march=rv64gc -mabi=lp64d -fdump-tree-optimized" } */
> +
> +#include "widen-mul.h"
> +
> +SAT_U_MUL_FMT_5(uint64_t, uint128_t)
> +
> +/* { dg-final { scan-tree-dump-not " = (__int128 unsigned) " "optimized" } } 
> */
> diff --git a/gcc/testsuite/gcc.target/riscv/sat/widen-mul.h 
> b/gcc/testsuite/gcc.target/riscv/sat/widen-mul.h
> new file mode 100644
> index 00000000000..a0a44084e58
> --- /dev/null
> +++ b/gcc/testsuite/gcc.target/riscv/sat/widen-mul.h
> @@ -0,0 +1,15 @@
> +#include <stdint.h>
> +
> +#if __riscv_xlen == 64
> +typedef unsigned __int128 uint128_t;
> +#endif
> +
> +#define SAT_U_MUL_FMT_5(NT, WT)                 \
> +NT __attribute__((noinline))                    \
> +sat_u_mul_##NT##_from_##WT##_fmt_5 (NT a, NT b) \
> +{                                               \
> +  WT x = (WT)a * (WT)b;                         \
> +  NT hi = x >> (sizeof(NT) * 8);                \
> +  NT lo = (NT)x;                                \
> +  return lo | -!!hi;                            \
> +}
> diff --git a/gcc/tree-ssa-math-opts.cc b/gcc/tree-ssa-math-opts.cc
> index bfad4cf5c99..51b512771a7 100644
> --- a/gcc/tree-ssa-math-opts.cc
> +++ b/gcc/tree-ssa-math-opts.cc
> @@ -1632,7 +1632,26 @@ build_and_insert_cast (gimple_stmt_iterator *gsi, 
> location_t loc,
>                        tree type, tree val)
>  {
>    tree result = make_ssa_name (type);
> -  gassign *stmt = gimple_build_assign (result, NOP_EXPR, val);
> +  tree rhs = val;
> +
> +  if (TREE_CODE (val) == SSA_NAME)
> +    {
> +      gimple *def = SSA_NAME_DEF_STMT (val);
> +
> +      if (is_gimple_assign (def)
> +         && gimple_assign_rhs_code (def) == NOP_EXPR)

&& CONVERT_EXPR_CODE_P (gimple_assign_rhs_code (def))

> +       {
> +         tree cast_rhs = gimple_assign_rhs1 (def);
> +         unsigned rhs_prec = TYPE_PRECISION (TREE_TYPE (cast_rhs));
> +         unsigned type_prec = TYPE_PRECISION (type);
> +
> +         if (type_prec >= rhs_prec)

I think you need to verify the intermediate precision (that of 'val')
is also >= rhs_prec, otherwise we risk to lose the effect of a truncation.

OK with such check added.

Thanks,
Richard.

> +           rhs = cast_rhs;
> +       }
> +    }
> +
> +  gassign *stmt = gimple_build_assign (result, NOP_EXPR, rhs);
> +
>    gimple_set_location (stmt, loc);
>    gsi_insert_before (gsi, stmt, GSI_SAME_STMT);
>    return result;
> --
> 2.43.0
>
