On Thu, 7 Dec 2023, Jakub Jelinek wrote:

> Hi!
> 
> As documented, --param min-nondebug-insn-uid= is very useful in debugging
> -fcompare-debug issues in RTL dumps, without it it is really hard to
> find differences.  With it, DEBUG_INSNs generally use low INSN_UIDs
> (1+) and non-DEBUG_INSNs use INSN_UIDs from the parameter up.
> For good results, the parameter should be larger than the number of
> DEBUG_INSNs in all or at least problematic functions, so I typically
> use --param min-nondebug-insn-uid=10000 or --param
> min-nondebug-insn-uid=1000.
> 
> The PR is about using --param min-nondebug-insn-uid=2147483647 or
> similar behavior can be achieved with that minus some epsilon,
> INSN_UIDs for the non-debug insns then wrap around and as they are signed,
> all kinds of things break.  Obviously, that can happen even without that
> option, but functions containing more than 2147483647 insns usually don't
> compile much earlier due to getting out of memory.
> As it is a debugging option, I'd prefer not to impose any drastically small
> limits on it because if a function has a lot of DEBUG_INSNs, it is useful
> to start still above them, otherwise the allocation of uids will DTRT
> even for DEBUG_INSNs but there will be then differences in non-DEBUG_INSN
> allocations.
> 
> So, the following patch uses 0x40000000 limit, half the maximum amount for
> DEBUG_INSNs and half for non-DEBUG_INSNs, it will still result in very
> unlikely overflows in real world.
> 
> Note, using large min-nondebug-insn-uid is very expensive for compile time
> memory and compile time, because DF as well as various RTL passes use
> arrays indexed by INSN_UIDs, e.g. LRA with sizeof (void *) elements,
> ditto df (df->insns).
> 
> Now, in LRA I've ran into ICEs already with
> --param min-nondebug-insn-uid=0x2aaaaaaa
> on 64-bit host.  It uses a custom vector management and wants to grow
> allocation 1.5x when growing, but all this computation is done in int,
> so already 0x2aaaaaab * 3 / 2 + 1 overflows to negative value.  And
> unlike vec.cc growing which also uses unsigned int type for the above
> (and the + 1 is not there), it also doesn't make sure if there is an
> overflow that it allocates at least as much as needed, vec.cc
> does
>   if ...
>   else
>     /* Grow slower when large.  */
>     alloc = (alloc * 3 / 2);
> 
>   /* If this is still too small, set it to the right size. */
>   if (alloc < desired)
>     alloc = desired;
> so even if there is overflow during the * 1.5 computation, but
> desired is still representable in the range of the alloced counter
> (31-bits in both vec.h and LRA), it doesn't grow exponentially but
> at least works for the current value.
> 
> So, one way to fix the LRA issue would be just to use
>   lra_insn_recog_data_len = index * 3U / 2;
>   if (lra_insn_recog_data_len <= index)
>     lra_insn_recog_data_len = index + 1;
> basically do what vec.cc does.  I thought we can do better for
> both vec.cc and LRA on 64-bit hosts even without growing the allocated
> counters, but now that I look at it again, perhaps we can't.
> The above overflows already with original alloc or lra_insn_recog_data_len
> 0x55555556, where 0x5555555 * 3U / 2 is still 0x7fffffff
> and so representable in the 32-bit, but 0x55555556 * 3U / 2 is
> 1.  I thought (and the patch implements it) that we could use
> alloc * (size_t) 3 / 2 so that on 64-bit hosts it wouldn't overflow
> that quickly, but 0x55555556 * (size_t) 3 / 2 there is 0x80000001
> which is still ok in unsigned, but given that vec.h then stores the
> counter into unsigned m_alloc:31; bit-field, it is too much.
> 
> The patch below is what I've actually bootstrapped/regtested on
> x86_64-linux and i686-linux, but given the above I think I should drop
> the vec.cc hunk and change (size_t) 3 in the LRA hunk to 3U.
> 
> With the lra.cc change, one can actually compile simple function
> with -O0 on 64-bit host with --param min-nondebug-insn-uid=0x40000000
> (i.e. the new limit), but already needed quite a big part of my 32GB
> RAM + 24GB swap.
> The patch adds a dg-skip-if for that case though, because such option
> is way too much for 32-bit hosts even at -O0 and empty function,
> and with -O3 on a longer function it is too much for average 64-bit host
> as well.  Without the dg-skip-if I got on 64-bit host:
> cc1: out of memory allocating 571230784744 bytes after a total of 2772992 
> bytes
> and
> cc1: out of memory allocating 1388 bytes after a total of 2002944 bytes
> on 32-bit host.  A test requiring more than 532GB of RAM on 64-bit hosts
> is just too much for our testsuite.
> 
> 2023-12-07  Jakub Jelinek  <ja...@redhat.com>
> 
>       PR middle-end/112411
>       * params.opt (-param=min-nondebug-insn-uid=): Add
>       IntegerRange(0, 1073741824).
>       * lra.cc (check_and_expand_insn_recog_data): Use (size_t) 3
>       rather than 3 in * 3 / 2 computation and if the result is
>       smaller or equal to index, use index + 1.
>       * vec.cc (vec_prefix::calculate_allocation_1): Use (size_t) 3
>       rather than 3 in * 3 / 2 computation.
> 
>       * gcc.dg/params/blocksort-part.c: Add dg-skip-if for
>       --param min-nondebug-insn-uid=1073741824.
> 
> --- gcc/params.opt.jj 2023-11-02 07:49:18.010852541 +0100
> +++ gcc/params.opt    2023-12-06 18:55:57.045420935 +0100
> @@ -779,7 +779,7 @@ Common Joined UInteger Var(param_min_loo
>  The minimum threshold for probability of semi-invariant condition statement 
> to trigger loop split.
>  
>  -param=min-nondebug-insn-uid=
> -Common Joined UInteger Var(param_min_nondebug_insn_uid) Param
> +Common Joined UInteger Var(param_min_nondebug_insn_uid) Param 
> IntegerRange(0, 1073741824)
>  The minimum UID to be used for a nondebug insn.

This new limit is OK.

>  -param=min-size-for-stack-sharing=
> --- gcc/lra.cc.jj     2023-12-05 13:17:29.642260866 +0100
> +++ gcc/lra.cc        2023-12-06 19:52:01.759241999 +0100
> @@ -768,7 +768,9 @@ check_and_expand_insn_recog_data (int in
>    if (lra_insn_recog_data_len > index)
>      return;
>    old = lra_insn_recog_data_len;
> -  lra_insn_recog_data_len = index * 3 / 2 + 1;
> +  lra_insn_recog_data_len = index * (size_t) 3 / 2;
> +  if (lra_insn_recog_data_len <= index)
> +    lra_insn_recog_data_len = index + 1;

As is this (though I believe 3U would be better, no need for 64bit
multiplies with the later check).

>    lra_insn_recog_data = XRESIZEVEC (lra_insn_recog_data_t,
>                                   lra_insn_recog_data,
>                                   lra_insn_recog_data_len);
> --- gcc/vec.cc.jj     2023-09-27 10:37:39.329838572 +0200
> +++ gcc/vec.cc        2023-12-06 19:53:34.670940078 +0100
> @@ -160,7 +160,7 @@ vec_prefix::calculate_allocation_1 (unsi
>      alloc = alloc * 2;
>    else
>      /* Grow slower when large.  */
> -    alloc = (alloc * 3 / 2);
> +    alloc = (alloc * (size_t) 3 / 2);

I agree this isn't much of an improvement, alloc < desired
catches overflow but we don't catch when desired doesn't fit
the 31 bit limit ...

Richard.

>    /* If this is still too small, set it to the right size. */
>    if (alloc < desired)
> --- gcc/testsuite/gcc.dg/params/blocksort-part.c.jj   2020-01-12 
> 11:54:37.463397567 +0100
> +++ gcc/testsuite/gcc.dg/params/blocksort-part.c      2023-12-07 
> 08:46:11.656974144 +0100
> @@ -1,4 +1,5 @@
>  /* { dg-skip-if "AArch64 does not support these bounds." { aarch64*-*-* } { 
> "--param stack-clash-protection-*" } } */
> +/* { dg-skip-if "For 32-bit hosts such param is too much and even for 64-bit 
> might require hundreds of GB of RAM" { *-*-* } { "--param 
> min-nondebug-insn-uid=1073741824" } } */
>  
>  /*-------------------------------------------------------------*/
>  /*--- Block sorting machinery                               ---*/
> 
>       Jakub
> 
> 

-- 
Richard Biener <rguent...@suse.de>
SUSE Software Solutions Germany GmbH,
Frankenstrasse 146, 90461 Nuernberg, Germany;
GF: Ivo Totev, Andrew McDonald, Werner Knoblich; (HRB 36809, AG Nuernberg)

Reply via email to