On Thu, 7 Dec 2023, Jakub Jelinek wrote: > Hi! > > As documented, --param min-nondebug-insn-uid= is very useful in debugging > -fcompare-debug issues in RTL dumps, without it it is really hard to > find differences. With it, DEBUG_INSNs generally use low INSN_UIDs > (1+) and non-DEBUG_INSNs use INSN_UIDs from the parameter up. > For good results, the parameter should be larger than the number of > DEBUG_INSNs in all or at least problematic functions, so I typically > use --param min-nondebug-insn-uid=10000 or --param > min-nondebug-insn-uid=1000. > > The PR is about using --param min-nondebug-insn-uid=2147483647 or > similar behavior can be achieved with that minus some epsilon, > INSN_UIDs for the non-debug insns then wrap around and as they are signed, > all kinds of things break. Obviously, that can happen even without that > option, but functions containing more than 2147483647 insns usually don't > compile much earlier due to getting out of memory. > As it is a debugging option, I'd prefer not to impose any drastically small > limits on it because if a function has a lot of DEBUG_INSNs, it is useful > to start still above them, otherwise the allocation of uids will DTRT > even for DEBUG_INSNs but there will be then differences in non-DEBUG_INSN > allocations. > > So, the following patch uses 0x40000000 limit, half the maximum amount for > DEBUG_INSNs and half for non-DEBUG_INSNs, it will still result in very > unlikely overflows in real world. > > Note, using large min-nondebug-insn-uid is very expensive for compile time > memory and compile time, because DF as well as various RTL passes use > arrays indexed by INSN_UIDs, e.g. LRA with sizeof (void *) elements, > ditto df (df->insns). > > Now, in LRA I've ran into ICEs already with > --param min-nondebug-insn-uid=0x2aaaaaaa > on 64-bit host. It uses a custom vector management and wants to grow > allocation 1.5x when growing, but all this computation is done in int, > so already 0x2aaaaaab * 3 / 2 + 1 overflows to negative value. And > unlike vec.cc growing which also uses unsigned int type for the above > (and the + 1 is not there), it also doesn't make sure if there is an > overflow that it allocates at least as much as needed, vec.cc > does > if ... > else > /* Grow slower when large. */ > alloc = (alloc * 3 / 2); > > /* If this is still too small, set it to the right size. */ > if (alloc < desired) > alloc = desired; > so even if there is overflow during the * 1.5 computation, but > desired is still representable in the range of the alloced counter > (31-bits in both vec.h and LRA), it doesn't grow exponentially but > at least works for the current value. > > So, one way to fix the LRA issue would be just to use > lra_insn_recog_data_len = index * 3U / 2; > if (lra_insn_recog_data_len <= index) > lra_insn_recog_data_len = index + 1; > basically do what vec.cc does. I thought we can do better for > both vec.cc and LRA on 64-bit hosts even without growing the allocated > counters, but now that I look at it again, perhaps we can't. > The above overflows already with original alloc or lra_insn_recog_data_len > 0x55555556, where 0x5555555 * 3U / 2 is still 0x7fffffff > and so representable in the 32-bit, but 0x55555556 * 3U / 2 is > 1. I thought (and the patch implements it) that we could use > alloc * (size_t) 3 / 2 so that on 64-bit hosts it wouldn't overflow > that quickly, but 0x55555556 * (size_t) 3 / 2 there is 0x80000001 > which is still ok in unsigned, but given that vec.h then stores the > counter into unsigned m_alloc:31; bit-field, it is too much. > > The patch below is what I've actually bootstrapped/regtested on > x86_64-linux and i686-linux, but given the above I think I should drop > the vec.cc hunk and change (size_t) 3 in the LRA hunk to 3U. > > With the lra.cc change, one can actually compile simple function > with -O0 on 64-bit host with --param min-nondebug-insn-uid=0x40000000 > (i.e. the new limit), but already needed quite a big part of my 32GB > RAM + 24GB swap. > The patch adds a dg-skip-if for that case though, because such option > is way too much for 32-bit hosts even at -O0 and empty function, > and with -O3 on a longer function it is too much for average 64-bit host > as well. Without the dg-skip-if I got on 64-bit host: > cc1: out of memory allocating 571230784744 bytes after a total of 2772992 > bytes > and > cc1: out of memory allocating 1388 bytes after a total of 2002944 bytes > on 32-bit host. A test requiring more than 532GB of RAM on 64-bit hosts > is just too much for our testsuite. > > 2023-12-07 Jakub Jelinek <ja...@redhat.com> > > PR middle-end/112411 > * params.opt (-param=min-nondebug-insn-uid=): Add > IntegerRange(0, 1073741824). > * lra.cc (check_and_expand_insn_recog_data): Use (size_t) 3 > rather than 3 in * 3 / 2 computation and if the result is > smaller or equal to index, use index + 1. > * vec.cc (vec_prefix::calculate_allocation_1): Use (size_t) 3 > rather than 3 in * 3 / 2 computation. > > * gcc.dg/params/blocksort-part.c: Add dg-skip-if for > --param min-nondebug-insn-uid=1073741824. > > --- gcc/params.opt.jj 2023-11-02 07:49:18.010852541 +0100 > +++ gcc/params.opt 2023-12-06 18:55:57.045420935 +0100 > @@ -779,7 +779,7 @@ Common Joined UInteger Var(param_min_loo > The minimum threshold for probability of semi-invariant condition statement > to trigger loop split. > > -param=min-nondebug-insn-uid= > -Common Joined UInteger Var(param_min_nondebug_insn_uid) Param > +Common Joined UInteger Var(param_min_nondebug_insn_uid) Param > IntegerRange(0, 1073741824) > The minimum UID to be used for a nondebug insn.
This new limit is OK. > -param=min-size-for-stack-sharing= > --- gcc/lra.cc.jj 2023-12-05 13:17:29.642260866 +0100 > +++ gcc/lra.cc 2023-12-06 19:52:01.759241999 +0100 > @@ -768,7 +768,9 @@ check_and_expand_insn_recog_data (int in > if (lra_insn_recog_data_len > index) > return; > old = lra_insn_recog_data_len; > - lra_insn_recog_data_len = index * 3 / 2 + 1; > + lra_insn_recog_data_len = index * (size_t) 3 / 2; > + if (lra_insn_recog_data_len <= index) > + lra_insn_recog_data_len = index + 1; As is this (though I believe 3U would be better, no need for 64bit multiplies with the later check). > lra_insn_recog_data = XRESIZEVEC (lra_insn_recog_data_t, > lra_insn_recog_data, > lra_insn_recog_data_len); > --- gcc/vec.cc.jj 2023-09-27 10:37:39.329838572 +0200 > +++ gcc/vec.cc 2023-12-06 19:53:34.670940078 +0100 > @@ -160,7 +160,7 @@ vec_prefix::calculate_allocation_1 (unsi > alloc = alloc * 2; > else > /* Grow slower when large. */ > - alloc = (alloc * 3 / 2); > + alloc = (alloc * (size_t) 3 / 2); I agree this isn't much of an improvement, alloc < desired catches overflow but we don't catch when desired doesn't fit the 31 bit limit ... Richard. > /* If this is still too small, set it to the right size. */ > if (alloc < desired) > --- gcc/testsuite/gcc.dg/params/blocksort-part.c.jj 2020-01-12 > 11:54:37.463397567 +0100 > +++ gcc/testsuite/gcc.dg/params/blocksort-part.c 2023-12-07 > 08:46:11.656974144 +0100 > @@ -1,4 +1,5 @@ > /* { dg-skip-if "AArch64 does not support these bounds." { aarch64*-*-* } { > "--param stack-clash-protection-*" } } */ > +/* { dg-skip-if "For 32-bit hosts such param is too much and even for 64-bit > might require hundreds of GB of RAM" { *-*-* } { "--param > min-nondebug-insn-uid=1073741824" } } */ > > /*-------------------------------------------------------------*/ > /*--- Block sorting machinery ---*/ > > Jakub > > -- Richard Biener <rguent...@suse.de> SUSE Software Solutions Germany GmbH, Frankenstrasse 146, 90461 Nuernberg, Germany; GF: Ivo Totev, Andrew McDonald, Werner Knoblich; (HRB 36809, AG Nuernberg)