On 9/29/25 3:29 PM, Jakub Jelinek wrote:
On Sat, Sep 27, 2025 at 03:21:17PM +0100, Jason Merrill wrote:
The Wstringop-overflow testcase change seems like an instance of a
preexisting bug; C++98-only xfails seem to come and go in this testcase.
Initially I thought this was because in C++11 and above we throw on
allocation size multiplication overflow while in C++98 we saturate; but
-fno-exceptions makes us saturate in C++11 as well, and we still only get
the warning in C++98. The tree dumps look the same until cddce1, at which point
we get this difference:
(C++11) Assume loop 3 to be finite: it has an exit and -ffinite-loops is on
or loop was previously finite.
(C++98) Estimating # of iterations of loop 3
cannot prove finiteness of loop 3
And the dumps diverge significantly from that point. It seems rather
unfortunate that the optimizers aren't able to prove finiteness of loops
produced by build_vec_init; how can we improve that?
This is quite an unfortunate testcase.
The C++98 to C++11 difference on it is that for C++11 and above we default
to -ffinite-loops and for C++98 we don't. That seems reasonable to me;
only C++11 and later have something which resembles the current
[intro.progress] in some way, and I don't see any spots mentioning forward
progress in C++98.
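To illustrate the difference (my example, not from the testcase): with
-ffinite-loops a loop that has an exit may be assumed to terminate even
when that cannot be proven, so something like

/* Hypothetical example: the loop has an exit but for odd n never
   reaches it.  With -ffinite-loops (the C++11+ default) it may be
   assumed finite and, being empty, deleted; without it (the C++98
   default) the compiler would have to prove termination first.  */
void
f (unsigned n)
{
  for (unsigned i = 0; i != n; i += 2)
    ;
}

can be thrown away in the former mode but not necessarily in the latter.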
On the C++ FE side, the big question is whether emitting the {CLOBBER(bob)}
clobbers is useful in all cases for array new expressions.
There are 3 cases.
One is if the whole size is constant.
IMHO it is much better in that case to emit a single bob clobber covering
the whole array first, before the loop (build_vec_init), using a MEM_REF
with, say, the right array type. I think this would be beneficial both
when there is some non-trivial constructor and when there is
nothing else.
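To make that concrete, a sketch of what I mean (the GIMPLE spelling here
is only approximate):

/* Whole allocation size is a constant: 3 * 2 bytes.  */
char (*p)[2] = new char[3][2];

/* Instead of one element-sized clobber per array element, i.e.
     *_15 ={v} {CLOBBER(bob)};
   emitted from the build_vec_init loop, emit a single clobber covering
   the whole array before any loop, through a MEM_REF with the full
   array type, roughly
     MEM[(char[3][2] *)_13] ={v} {CLOBBER(bob)};  */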
Another case is when the whole allocated size is variable but there is a
non-trivial constructor for the elements.
We can't emit a VLA MEM_REF CLOBBER before the loop, and if nothing
else, the {CLOBBER(bob)} in the loop will at least help the non-trivial
ctor right after it in the loop.
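For reference, the shape of this second case (the names are just for
illustration):

/* Non-constant allocation size, non-trivial constructor: build_vec_init
   has to loop, and each iteration clobbers one element and then runs
   the constructor on it, so the clobber at least helps the ctor call
   right after it.  */
struct S { S (); };

S *
f (unsigned n)
{
  return new S[n];
}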
The last case is a non-constant allocation size and trivial (vacuous?)
construction. This is what happens in the testcase, and to me it seems
that, at least currently, the middle-end can't make any use of the clobber
and just drops it in the cddce1 pass, without anything benefiting from it
(this is before complete unrolling, which I think would be the only way
to make some use of it, but we are talking about non-constant allocation
sizes, so that is unlikely to happen anyway unless after inlining we
figure out the size is now constant).
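In source form this last case looks like (simplified from the testcase):

/* Non-constant allocation size, trivial (vacuous) initialization: the
   per-element loop emitted by build_vec_init contains nothing but the
   {CLOBBER(bob)} stores.  */
char (*f (unsigned n))[2]
{
  return new char[n][2];
}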
For C++11 (or C++98 with -ffinite-loops), after removing the inner loop,
which had just CLOBBERs on the inner array elements, we are left with a
loop which has no side effects in the body, is not a trivially infinite
loop, and has a conditional exit, so we just throw it away completely.
In the testcase, for that particular line we have new char[r_0_1][2],
and evrp two passes earlier figures out that r_0_1 is in the [0, 1] range.
Before evrp we have a loop nest like:
<bb 17> :
*_15 ={v} {CLOBBER(bob)};
_102 = _16 + -1;
_103 = _15 + 1;
<bb 18> :
# _15 = PHI <_13(20), _103(17)>
# _16 = PHI <1(20), _102(17)>
if (_16 >= 0)
goto <bb 17>; [INV]
else
goto <bb 19>; [INV]
<bb 19> :
_99 = _14 + -1;
_100 = _13 + 2;
<bb 20> :
# _13 = PHI <_63(16), _100(19)>
# _14 = PHI <_41(16), _99(19)>
if (_14 >= 0)
goto <bb 18>; [INV]
else
goto <bb 21>; [INV]
Now, evrp figures out that _14 has a [-1, 0] range (it is r_0_1 - 1) and
changes the
# _14 = PHI <_41(16), _99(19)>
if (_14 >= 0)
to
# _14 = PHI <_41(14), -1(17)>
if (_14 == 0)
lines in there, which is reasonable. We still have a loop which clearly
iterates 0 or 1 times and does nothing either way, but we don't optimize
it away just yet.
cddce1 then probably just uses the finite_p flag on loops (so only C++11)
and doesn't actually try to compute the number of iterations and remove
"empty" loops with a small constant upper bound on iterations, and we end up with
# RANGE [irange] long int [-1, 0]
_9 = (long int) _1;
...
<bb 9> [local count: 1073741824]:
# RANGE [irange] long int [-1, 0]
# _4 = PHI <_9(8), -1(14)>
if (_4 == 0)
goto <bb 14>; [89.00%]
else
goto <bb 10>; [11.00%]
<bb 14> [local count: 955630224]:
goto <bb 9>; [100.00%]
i.e. dummy loops which don't do anything, and they survive like that for
quite a lot of passes.
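In source terms the leftover is roughly (my paraphrase of the IR above,
not literal):

/* The residual loop does nothing and exits after at most one more
   trip, but because its body is empty rather than provably finite
   (without -ffinite-loops), nothing bothers to delete it for a long
   time.  */
long i = (long) r_0_1 - 1;	/* known to be in [-1, 0] */
while (i == 0)
  i = -1;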
And finally jump threading comes along, and because one function contains
e.g. 4 or more of these one after another in the test (
void test_strcpy_new_char_array (size_t n)
{
size_t r_0_1 = UR (0, 1);
T (S (0), new char[r_0_1][1]);
T (S (1), new char[r_0_1][1]); // { dg-warning "\\\[-Wstringop-overflow" }
T (S (1), new char[r_0_1][2]);
T (S (2), new char[r_0_1][2]); // { dg-warning "\\\[-Wstringop-overflow" }
}
still reproduces it), it threads them by making separate paths for the
r_0_1 == 0 and r_0_1 == 1 cases, and I guess the false positive is then
on dead code or something.
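Conceptually (my sketch of the effect, not the actual threaded IR) the
threading duplicates the shared tests so that each value of r_0_1 gets
its own straight-line path:

/* Hypothetical sketch of what the threading amounts to at the source
   level: the tests shared by the consecutive dummy loops get copied,
   one copy per value of r_0_1, and the -Wstringop-overflow warning
   then fires on one of the duplicated (quite possibly dead) paths.  */
if (r_0_1 == 0)
  {
    /* copies of the T (...) statements specialized for r_0_1 == 0 */
  }
else
  {
    /* copies of the T (...) statements specialized for r_0_1 == 1 */
  }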
Anyway, on the C++ FE side I'd really recommend (but it can be done
incrementally of course) emitting one larger clobber for the whole array
if it has constant sizes
I did that in an earlier patch; it conflicted with an LWG issue
resolution, but people seemed open to fixing that, so I could do it again.
and not emitting clobbers at all, at least for now, if there is a trivial
constructor for the elements and we'd emit the clobbers in a loop.
Unfortunately we still need the clobber for constexpr placement new;
with that change, if I modify constexpr-new4.C to take the new array
bound as a parameter, it breaks.
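The kind of modification I mean is roughly this (a hypothetical sketch
assuming C++26 constexpr placement new, not the actual contents of
constexpr-new4.C):

/* Hypothetical sketch: the array new bound arrives as a parameter, so
   the FE does not see a constant size, yet the call still has to
   constant-evaluate; my understanding is that the constexpr evaluator
   relies on the element clobbers to begin the lifetime of the array
   elements here.  */
#include <memory>
#include <new>

constexpr int
f (int n)
{
  std::allocator<int> a;
  int *p = a.allocate (n);
  new (p) int[n] ();		// placement array new, non-constant bound
  int r = p[0];
  a.deallocate (p, n);
  return r;
}
static_assert (f (3) == 0);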
And on the middle-end side, the question is whether cddce1 should compute
the number of iterations (an upper bound) for loops with empty bodies which
aren't finite_p and optimize them away too, or whether some other pass
later on, once the number of iterations has been computed, should repeat
that removal.
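For concreteness, one possible shape of the cddce1 side (just a sketch;
the helpers named here are the ones I believe tree-ssa-loop-niter already
provides, so treat it as pseudocode):

/* Hypothetical helper: consider an empty loop removable not only when
   it is known/assumed finite, but also when any upper bound on its
   number of iterations can be computed.  */
static bool
empty_loop_removable_p (class loop *loop)
{
  if (finite_loop_p (loop))
    return true;
  widest_int max_niter;
  return max_loop_iterations (loop, &max_niter);
}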
Jakub