On 6/11/25 11:02, Andrew MacLeod wrote:
On 6/10/25 17:05, Richard Biener wrote:
On 6/10/25 22:18, Andrew MacLeod <amacl...@redhat.com> wrote:
I had a question asked of me, and now I'm passing the buck.
extern void *memcpy(void *, const void *, unsigned int);
extern int memcmp(const void *, const void *, unsigned int);
typedef unsigned long bits32;
typedef unsigned char byte;
static const byte orig[10] = {
'J', '2', 'O', 'Z', 'F', '5', '0', 'F', 'Y', 'L' };
static byte test[10];
int
verify (void)
{
return 0 == memcmp (test, orig, 10 * sizeof (orig[0]));
}
int
benchmark (void)
{
memcpy (test, orig, 10 * sizeof (orig[0]));
return 0;
}
The target is arm-none-eabi, and the code is compiled with -Os.
After gimple lowering, the verify routine remains the same, but
the benchmark () routine is transformed from a memcpy and becomes:
;; Function benchmark (benchmark, funcdef_no=1, decl_uid=4718, cgraph_uid=4, symbol_order=3)
int benchmark ()
{
int D.4726;
MEM <unsigned char[10]> [(char * {ref-all})&test] = MEM <unsigned char[10]> [(char * {ref-all})&orig];
D.4726 = 0;
goto <D.4727>;
<D.4727>:
return D.4726;
}
It appears that forwprop is then transforming the statement to
<bb 2> :
MEM <unsigned char[10]> [(char * {ref-all})&test] = "J2OZF50FYL";
return 0;
And in the final output, there are now 2 copies of the original
character data:
orig:
.ascii "J2OZF50FYL"
.space 2
.LC0:
.ascii "J2OZF50FYL"
.bss
and I presume that new string is a copy of the orig text that
forwprop has created for some reason.
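As a rough source-level illustration (a sketch only, not compiler output), the folded statement behaves much like copying from a string literal instead of from orig. The function names benchmark_before, benchmark_after and test_matches_orig are hypothetical and exist only for this comparison; whether a given compiler emits a second copy of the bytes for the literal depends on target and options.

```c
#include <string.h>

static const unsigned char orig[10] = {
  'J', '2', 'O', 'Z', 'F', '5', '0', 'F', 'Y', 'L' };
static unsigned char test[10];

/* The copy as written in the testcase: the source operand is orig. */
int
benchmark_before (void)
{
  memcpy (test, orig, sizeof (orig));
  return 0;
}

/* Roughly what the statement looks like after folding: the source
   operand is now a separate string constant with the same bytes.
   Only the first 10 bytes are copied, so the literal's trailing NUL
   is not written to test.  */
int
benchmark_after (void)
{
  memcpy (test, "J2OZF50FYL", sizeof (test));
  return 0;
}

/* Returns 1 when test holds the same bytes as orig, 0 otherwise.  */
int
test_matches_orig (void)
{
  return 0 == memcmp (test, orig, sizeof (orig));
}
```

Both variants leave test with identical contents; the difference is only in which object the bytes are read from, which is where the extra .LC0 data would come from.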
What's going on, and is there a way to disable this? Either at the
lowering stage or in forwprop? At -Os, they are not thrilled that
a bunch more redundant text is being generated in the object file.
This is a reduced testcase to demonstrate a much larger problem.
The hope is that the static var can be elided and that the read might
be just a small part. In this case the heuristics are misfiring, I
guess. You’d have to track down where exactly in folding we are
replacing the RHS of an aggregate copy. I can’t recall off the top of
my head.
Richard
Here's my traceback where the "magic" happens:
#0 fold_ctor_reference (type=0x7fffe9f3be70, ctor=0x7fffe9f2cc00,
poly_offset=..., poly_size=..., from_decl=0x7fffe9c6f980, suboff=0x0)
at /gcc/master/gcc/gcc/gimple-fold.cc:9955
#1 0x0000000001200074 in fold_const_aggregate_ref_1
(t=0x7fffe9f46de8, valueize=0x0) at
/gcc/master/gcc/gcc/gimple-fold.cc:10134
#2 0x0000000001200918 in fold_const_aggregate_ref (t=0x7fffe9f46de8)
at /gcc/master/gcc/gcc/gimple-fold.cc:10213
#3 0x00000000011db1aa in maybe_fold_reference (expr=0x7fffe9f46de8)
at /gcc/master/gcc/gcc/gimple-fold.cc:325
#4 0x00000000011db8bf in fold_gimple_assign (si=0x7fffffffd410) at
/gcc/master/gcc/gcc/gimple-fold.cc:473
#5 0x00000000011f20d5 in fold_stmt_1 (gsi=0x7fffffffd410,
inplace=false, valueize=0x18d3b10 <fwprop_ssa_val(tree)>,
dce_worklist=0x7fffffffd4c0) at /gcc/master/gcc/gcc/gimple-fold.cc:6648
ctor is a STRING_CST tree containing the string "J2OZF50FYL".
The fold routine gets to:
/* We found the field with exact match. */
if (type
&& useless_type_conversion_p (type, TREE_TYPE (ctor))
&& known_eq (poly_offset, 0U))
return canonicalize_constructor_val (unshare_expr (ctor), from_decl);
I would hazard a guess that it is the "unshare_expr (ctor)" that is
causing the duplication of the string. I presume we have a good
reason for doing this? Perhaps that is a bad thing at -Os? I don't
really remember all the unsharing details :-)
From this point, the presumed duplication of the string is returned
and there are no other gates before fold_stmt_1 calls
gimple_assign_set_rhs_from_tree (gsi, new_rhs);
with the newly copied and returned string.
I guess an alternate line of questioning is why on x86 we do not turn
the second function's call:
memcpy (test, orig, 10 * sizeof (orig[0]));
into
MEM <unsigned char[10]> [(char * {ref-all})&test] = MEM <unsigned char[10]> [(char * {ref-all})&orig];
like arm-none-eabi does. It seems that this lowering is triggering
the fold and string duplication.
Andrew
The difference in lowering between my x86 build and the arm build is in
gimple_fold_call:
if (gimple_call_builtin_p (stmt, BUILT_IN_NORMAL))
{
if (gimple_fold_builtin (gsi))
changed = true;
}
On x86, it fails the gimple_call_builtin_p() check, whereas on arm it
passes and proceeds to fold the builtin.
The decl for memcpy on my x86 box has a built_in_class of NOT_BUILT_IN,
whereas on the arm build it is set to BUILT_IN_NORMAL.
Where is that determined? I don't see anything in the config
directories or other obvious places that decides whether it is a
builtin function or not.
Andrew
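
One possible explanation, offered as an assumption rather than something established in this thread: the testcase declares memcpy and memcmp with an "unsigned int" length. On arm-none-eabi size_t is unsigned int, so the declaration agrees with the builtin's prototype; on x86_64 size_t is unsigned long, so the declaration would not agree, and a mismatched declaration may be why the decl ends up as NOT_BUILT_IN there. The sketch below only checks whether unsigned int is size_t on whatever target it is compiled for; the helper name uint_is_size_t is hypothetical, and it relies on C11 _Generic.

```c
#include <stddef.h>

/* Hypothetical helper: returns 1 when size_t is exactly the type
   unsigned int on the target this is compiled for, 0 otherwise.
   _Generic selects on the type of the controlling expression, so this
   tests type identity rather than just equal size.  */
int
uint_is_size_t (void)
{
  return _Generic ((size_t) 0, unsigned int: 1, default: 0);
}
```

If this returns 0 on the x86 box and 1 for the arm build, that would be consistent with the difference seen in gimple_call_builtin_p; declaring the functions via <string.h>, or with a size_t length, would be one way to test the theory.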