On Wed, 6 Nov 2024, Jakub Jelinek wrote:

> Hi!
> 
> encode_tree_to_bitpos uses the more expensive sub_byte_op_p mode in which
> it has to allocate a buffer and do various extra work like shifting the bits
> etc. if bitlen or bitpos aren't multiples of BITS_PER_UNIT, or if bitlen
> doesn't have corresponding integer mode.
> The last case is explained later in the comments:
>   /* The native_encode_expr machinery uses TYPE_MODE to determine how many
>      bytes to write.  This means it can write more than
>      ROUND_UP (bitlen, BITS_PER_UNIT) / BITS_PER_UNIT bytes (for example
>      write 8 bytes for a bitlen of 40).  Skip the bytes that are not within
>      bitlen and zero out the bits that are not relevant as well (that may
>      contain a sign bit due to sign-extension).  */
> Now, we've later added empty_ctor_p support, either {} CONSTRUCTOR
> or {CLOBBER}, which doesn't use native_encode_expr at all, just memset,
> so that case doesn't need those fancy games unless bitlen or bitpos
> aren't multiples of BITS_PER_UNIT (unlikely, but let's pretend it is
> possible).
> 
> The following patch makes us use the fast path even for empty_ctor_p
> which occupy full bytes, we can just memset that in the provided buffer and
> don't need to XALLOCAVEC another buffer.
> 
> This patch in itself fixes the testcase from the PR (which was about using
> huge XALLLOCAVEC), but I want to do some other changes, to be posted in a
> next patch.
> 
> Bootstrapped/regtested on x86_64-linux and i686-linux, ok for trunk?

OK.

Thanks,
Richard.

> 2024-11-06  Jakub Jelinek  <ja...@redhat.com>
> 
>       PR tree-optimization/117439
>       * gimple-ssa-store-merging.cc (encode_tree_to_bitpos): For
>       empty_ctor_p use !sub_byte_op_p even if bitlen doesn't have an
>       integral mode.
> 
> --- gcc/gimple-ssa-store-merging.cc.jj        2024-10-25 10:00:29.467767871 
> +0200
> +++ gcc/gimple-ssa-store-merging.cc   2024-11-04 18:40:14.667260621 +0100
> @@ -1934,14 +1934,15 @@ encode_tree_to_bitpos (tree expr, unsign
>                      unsigned int total_bytes)
>  {
>    unsigned int first_byte = bitpos / BITS_PER_UNIT;
> -  bool sub_byte_op_p = ((bitlen % BITS_PER_UNIT)
> -                     || (bitpos % BITS_PER_UNIT)
> -                     || !int_mode_for_size (bitlen, 0).exists ());
>    bool empty_ctor_p
>      = (TREE_CODE (expr) == CONSTRUCTOR
>         && CONSTRUCTOR_NELTS (expr) == 0
>         && TYPE_SIZE_UNIT (TREE_TYPE (expr))
> -                    && tree_fits_uhwi_p (TYPE_SIZE_UNIT (TREE_TYPE (expr))));
> +       && tree_fits_uhwi_p (TYPE_SIZE_UNIT (TREE_TYPE (expr))));
> +  bool sub_byte_op_p = ((bitlen % BITS_PER_UNIT)
> +                     || (bitpos % BITS_PER_UNIT)
> +                     || (!int_mode_for_size (bitlen, 0).exists ()
> +                         && !empty_ctor_p));
>  
>    if (!sub_byte_op_p)
>      {
> 
>       Jakub
> 
> 

-- 
Richard Biener <rguent...@suse.de>
SUSE Software Solutions Germany GmbH,
Frankenstrasse 146, 90461 Nuernberg, Germany;
GF: Ivo Totev, Andrew McDonald, Werner Knoblich; (HRB 36809, AG Nuernberg)

Reply via email to