On 8/18/25 02:58, Sebastian Huber wrote:
Hello,
I have a question about the counters used for the condition coverage
implementation in tree-profile.cc:
/* Stores the incoming edge and previous counters (in SSA form) on that edge
   for the node e->dest. The counters record the seen-true (0), seen-false
   (1), and current-mask (2). They are stored in an array rather than proper
   members for access-by-index as the code paths tend to be identical for the
   different counters. */
struct counters
{
  edge e;
  tree counter[3];
  tree& operator [] (size_t i) { return counter[i]; }
};
While working on the -fprofile-update=atomic support for 32-bit targets which
lack support for 64-bit atomic operations, I noticed that some atomic
no-operations are generated for the instrumented code
(https://gcc.gnu.org/pipermail/gcc-patches/2025-August/692555.html). For
example:
int a(int i);
int b(int i);
int f(int i)
{
  if (i) {
    return a(i);
  } else {
    return b(i);
  }
}
gcc -O2 -fprofile-update=atomic -fcondition-coverage -S -o - test.c -fdump-tree-all
;; Function f (f, funcdef_no=0, decl_uid=4621, cgraph_uid=1, symbol_order=0)

int f (int i)
{
  int _1;
  int _6;
  int _8;

  <bb 2> [local count: 1073741824]:
  if (i_3(D) != 0)
    goto <bb 3>; [50.00%]
  else
    goto <bb 4>; [50.00%]

  <bb 3> [local count: 536870912]:
  __atomic_fetch_or_8 (&__gcov8.f[0], 1, 0);
  __atomic_fetch_or_8 (&__gcov8.f[1], 0, 0);
  _8 = a (i_3(D)); [tail call]
  goto <bb 5>; [100.00%]

  <bb 4> [local count: 536870912]:
  __atomic_fetch_or_8 (&__gcov8.f[0], 0, 0);
  __atomic_fetch_or_8 (&__gcov8.f[1], 1, 0);
  _6 = b (0); [tail call]

  <bb 5> [local count: 1073741824]:
  # _1 = PHI <_8(3), _6(4)>
  return _1;
}
The calls __atomic_fetch_or_8 (&__gcov8.f[1], 0, 0) and
__atomic_fetch_or_8 (&__gcov8.f[0], 0, 0) could be optimized away. Since GCC
is able to figure out that the masks are compile-time constants, wouldn't it
be possible to use a simple uint64_t for the current-mask (2) in struct
counters?
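To illustrate why the two calls with a zero operand are no-ops, here is a
minimal standalone sketch. It is not taken from the GCC sources: counter and
flush_mask are hypothetical names standing in for one __gcov8 slot and the
flush emitted by the instrumentation. The memory order 0 seen in the dump
corresponds to __ATOMIC_RELAXED.

```c
#include <stdint.h>

/* Hypothetical stand-in for one 64-bit gcov condition counter.  */
static uint64_t counter;

/* Atomically ORs MASK into the counter and returns the value the counter
   holds as a result of this operation.  */
static uint64_t
flush_mask (uint64_t mask)
{
  /* __atomic_fetch_or returns the value *before* the OR was applied.  */
  uint64_t old = __atomic_fetch_or (&counter, mask, __ATOMIC_RELAXED);
  return old | mask;
}
```

With mask == 0 the result equals the previous value for any counter
contents, so the call has no observable effect on the counter.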
Is this something you're seeing consistently, even when the number of
conditions goes up?
I'm sure it's possible to optimize out this case by checking if the
current-mask is still the initial zero constant. The inputs that
determine the current-mask are constant and tied to the node, but the
final state of current-mask once the counters are flushed depends on the
path taken. I'll try to think a bit more about it, but I don't think it
can be replaced by a uint64_t without a redesign of the
instrument_decisions function.
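A rough model of that check, as a standalone sketch rather than actual
tree-profile.cc code: struct operand, needs_flush and merge are hypothetical
names, with struct operand standing in for an SSA value and merge standing in
for a PHI at a join point. It also shows why a path-dependent value cannot
simply be treated as a constant.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical stand-in for an SSA operand: either a known constant or a
   value that is only known at run time.  */
struct operand
{
  bool is_constant;
  uint64_t value;   /* Meaningful only when is_constant is true.  */
};

/* Returns true if a fetch-or flush of OP has to be emitted.  OR-ing a
   constant zero cannot change the counter, so that flush can be skipped.  */
static bool
needs_flush (struct operand op)
{
  return !(op.is_constant && op.value == 0);
}

/* Models a PHI at a join point: the result is a known constant only if
   both incoming values are the same constant.  Otherwise it depends on the
   path taken and must be treated as a run-time value.  */
static struct operand
merge (struct operand a, struct operand b)
{
  if (a.is_constant && b.is_constant && a.value == b.value)
    return a;
  struct operand unknown = { false, 0 };
  return unknown;
}
```

In this model, a flush is skipped only when every path reaching it carries
the constant zero. As soon as two paths disagree, the merged value is no
longer a constant and the flush has to stay.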
Thanks,
Jørgen
I am not sure how the phi node stuff works in resolve_counter() since I am not
a compiler expert.
Kind regards, Sebastian