Hi,

I wanted some insight/clarity on subreg promotion at expand time following a promote_function_mode() Apologies Roger and HJ for explicit CC but it seems you touched the same general area in 2021 and 2026 respectively.

Here's my understand of various pieces and the problem I'm running into.

1a. PROMOTE_MODE has no direct ABI implications on its own but could do so indirectly if TARGET_PROMOTE_FUNCTION_MODE uses the default always which in turn uses this macro.

1b. According to docs [1] PROMOTE_MODE for for ISAs supporting only 64-bit registers would define it to word_mode (64) and I'm implying further that for ISAs supporting both it should be OK to define either 32 or 64 although it might be desirable to have 32, just for codegen fiddling with fewer bits if nothing else.

   "On most RISC machines, which only have operations that operate on a
   full register,
   define this macro to set m to word_mode if m is an integer mode
   narrower than
   BITS_PER_WORD...."

The RISC-V implementation is textbook perfect: for rv64 it would promote anything smaller that DI to DI (since that's how wide the container itself is) and if SI clear the unsigned bit as well since most ALU operations would sign extend the 32-bit result to 64-bits.

BPF currently defines it to promote anything smaller that DI to DI: that might be a bit conservative and lead to fewer 32-bit only insns. A future/separate change to promote anything smaller than SI to SI can be done later and would not be wrong.

2a. Does setting SUBREG_PROMOTED_VAR_P imply that rest of pass pipeline assumes it is already promoted (thus potentially eliding any subsequent zero/sign extensions)  or does it ensure that such extensions will always be generated on any moves. The documentation [2] seems to suggest it is the latter, although usage in the code seems to be more like former.

   "Nonzero in a subreg if it was made when accessing an object that
   was promoted
   to a wider mode in accord with the PROMOTED_MODE machine description
   macro
   ... . In this case, the mode of the
   subreg is the declared mode of the object and the mode of SUBREG_REG
   is the
   mode of the register that holds the object. *Promoted variables are
   always either
   sign- or zero-extended to the wider mode on every assignment*.
   Stored in the
   in_struct field and printed as ‘/s’."

FWIW my patch [3] removed code which was clearing SUBREG_PROMOTED_VAR_P as it was leading to extraneous sign extensions on RISC-V.So I tend to think that keeping subreg promoted prevents subsequent generation of extensions,

   2023-10-16 8eb9cdd14218 expr: don't clear SUBREG_PROMOTED_VAR_P flag
   for a promoted subreg [target/111466]


2b. expand_call () depending on modes of @target and @rettype would call ABI promotion for return value, wrap target in a subreg with new mode and set SUBREG_PROMOTED_VAR_P eagerly.

   store_expr
   |- expand_call
           if  REG_P(target)  && GET_MODE (target) != TYPE_MODE (rettype))
   ...
              pmode = promote_function_mode (type, ret_mode,
   &unsignedp, funtype, 1);
              target = gen_lowpart_SUBREG (ret_mode, target);
              SUBREG_PROMOTED_VAR_P (target) = 1;
              SUBREG_PROMOTED_SET (target, unsignedp);

   The followig convert_move (@from as subreg) just strips off the subreg
   ...
   |-  convert_move
           if (GET_CODE (from) == SUBREG
               && SUBREG_PROMOTED_VAR_P (from)
   ...
               from = gen_lowpart (to_int_mode, SUBREG_REG (from));

This supposedly ensures that extensions won't be generated ? but between setting the subreg promoted and stripping the outer, an extension was not generated for return anyways, what am I missing ?

3. The reason for the questions above is PR/124171 [4] where we need to change gcc BPF function ABI to promote arguments as well as return values both in callee. I'm guessing the last part is atypical as args promoted in caller would imply return promoted in callee - but BPF code could be called from as well as calling into other ABIs, such as x86 kernel code and thus needs to ensure sanity in either direction.

For implementing this

 * I'm specifying TARGET_PROMOTE_FUNCTION_MODE to default
   promote_always: so both args and retval will be promoted.
 * Currently bpf PROMOTE_MODE defaults to promoting anything smaller
   than DI to DI (although ISA has insn to do SI mode only ops, and it
   could be changed to that effect later on, separately, but that is
   not really needed for the ABI change)

This work for most part, except for a single weird test which fails to promote a bool return value in caller.

   _Bool bar_bool(void);
   int foo_bool_ne1(void) {
          if (bar_bool() != 1) return 0; else return 1;
   }

On trunk this generates

   foo_bool_ne1:
        call    bar_bool
        r0 &= 0xff
        exit

With TARGET_PROMOTE_FUNCTION_MODE to default always (and unchanged PROMOTE_MODE: sub DI to DI), it generates

   foo_bool_ne1:
        call    bar_bool
        r0 = (s32) 0xff
        exit

The s32 truncation doesn't seem right as it needs to clamp it to 8 bits (ideally 1 but bool for most targets is implemented as QI).

gimple output is same for both trunk and patch

   int foo_bool_ne1 ()
   {
      _Bool _1;
      int _5;
      <bb 2> [local count: 1073741824]:
      _1 = bar_bool ();
      _5 = (int) _1;
      return _5;
   }


RTL Expansion is obviously different:

For trunk version we get a zero_extend

   (insn 7 6 8 2 (set (reg:QI 23)
            (reg:QI 22))
         (nil))
   (insn 8 7 9 2 (set (reg:DI 19 [ _1 ])
            (zero_extend:DI (reg:QI 23)))   <----
         (nil))
   (insn 9 8 10 2 (set (reg:SI 24 [ _5 ])
            (subreg/s/v:SI (reg:DI 19 [ _1 ]) 0))
         (nil))

The zero_extend happens because in store_expr () -> expand_call stack shown above, subreg is not created due to same QI mode for both @target and @rettype

      if (REG_P (target)
         && TYPE_MODE (rettype) != BLKmode
         && GET_MODE (target) != TYPE_MODE (rettype)) <--- false

And ensuing convert_move (to=DI, from=QI) generates a zero extend.

W/ patch, the check above is true as @target is DI, while @retyype is QI so so subreg is created. And subsequently  convert_move(to=DI, from=(subreg/s/v:QI (reg:DI 22) 0) gets called But it strips out the QI subreg, skips any extension ggeneration and finally ends with a slightly different subreg outer SI  (not QI) with inner DI

   (insn 7 6 8 2 (set (reg:DI 22)
            (reg:DI 21))
         (nil))
   (insn 8 7 9 2 (set (reg:DI 19 [ _1 ])
            (reg:DI 22))
         (nil))
   (insn 9 8 10 2 (set (reg:SI 23 [ _5 ])
            (subreg/s/v:SI (reg:DI 19 [ _1 ]) 0))
         (nil))

A subsequent sign_extension is generated for the return value - which is probably due to function promote mode being always, but IMO its not right (SI to DI. vs QI to DI) and is done for wrong reasons (outgoing function return value, not incoming call return value)

   (insn 10 9 11 2 (set (reg:DI 24)              <-- sign extension
            (ashift:DI (subreg:DI (reg:SI 23 [ _5 ]) 0)
                (const_int 32 [0x20])))
         (nil))
   (insn 11 10 12 2 (set (reg:DI 24)
            (ashiftrt:DI (reg:DI 24)
                (const_int 32 [0x20])))
         (nil))

I would really like to solve this problem. One approach will be *not* setting the SUBREG_PROMOTED_VAR_P unconditionally in expand_call at the time of subreg creation. I tried a hack to clear it always and it does fix my failing test, generate the missing &= 0xff

   |- expand_call
           if  REG_P(target)  && GET_MODE (target) != TYPE_MODE (rettype))
   ...
              pmode = promote_function_mode (type, ret_mode,
   &unsignedp, funtype, 1);
              target = gen_lowpart_SUBREG (ret_mode, target);
   -          SUBREG_PROMOTED_VAR_P (target) = 1;
   +          SUBREG_PROMOTED_VAR_P (target) = 0;
              SUBREG_PROMOTED_SET (target, unsignedp);


Of course, doing this unconditionally can regresses targets: for RISC-V testsuite run w/ this change I see one just one additional failure in the entire testsuite run.

        |          gcc |          g++ |     gfortran |
     
rv64imafdcv_zvl256b_zvfh_zawrs_zba_zbb_zbs_zicond_zfh_zicbom_zicbop_zicboz_zihintntl_zihintpause_zicsr_zifencei_zicntr_zihpm_zkr_zkt_ztso_zcb/
   lp64d/ medlow |  179 /    87 |    7 /     4 |   82 /    82 |
     
rv64imafdcv_zvl256b_zvfh_zawrs_zba_zbb_zbs_zicond_zfh_zicbom_zicbop_zicboz_zihintntl_zihintpause_zicsr_zifencei_zicntr_zihpm_zkr_zkt_ztso_zcb/
   lp64d/ medlow |  191 /    88 |    7 /     4 |   82 /    82 |

An additional sext.w in gcc/testsuite/gcc.target/riscv/zero-extend-3.c and that too only at -O0.

So it seems this could work but need to do this conditionally, based on some target capability or some such, I'm not sure what.
Or perhaps there is some yet another way to getting this done.

I would appreciate any feedback/comments

TIA,
-Vineet

[1] https://gcc.gnu.org/onlinedocs/gccint/Storage-Layout.html#index-PROMOTE_005fMODE [2] https://gcc.gnu.org/onlinedocs/gccint/Flags.html#index-SUBREG_005fPROMOTED_005fVAR_005fP
[3] https://gcc.gnu.org/pipermail/gcc-patches/2023-October/633340.html
[4] https://gcc.gnu.org/bugzilla/show_bug.cgi?id=124171

Reply via email to