Hi,

I've been looking at GCC's profile information recently.  I noticed that the
probabilities/counts effectively encode the expected niters of a loop, so e.g.
with the following testcase:

$ cat t.c
int *a;
void f() {
  for (int i = 0; i < N; i++)
    a[i] = i;
}
$ ./xgcc -B . -c -S -o /dev/null t.c -DN=32 -O2 -fno-tree-vectorize 
-fdump-tree-optimized=-

;; Function f (f, funcdef_no=0, decl_uid=4417, cgraph_uid=1, symbol_order=1)

Removing basic block 5
void f ()
{
  unsigned long ivtmp.6;
  int i;
  int * a.0_1;

  <bb 2> [local count: 32534376]:
  a.0_1 = a;

  <bb 3> [local count: 1041207448]:
  # ivtmp.6_11 = PHI <ivtmp.6_10(3), 0(2)>
  i_12 = (int) ivtmp.6_11;
  MEM[(int *)a.0_1 + ivtmp.6_11 * 4] = i_12;
  ivtmp.6_10 = ivtmp.6_11 + 1;
  if (ivtmp.6_10 != 32)
    goto <bb 3>; [96.88%]
  else
    goto <bb 4>; [3.12%]

  <bb 4> [local count: 32534376]:
  return;

}

the counts effectively encode that the loop executes 32 times, since the count
of the header (bb 3) is 32 times the count of the preheader (bb 2):

$ python -c 'print(1041207448/32534376)'
32.003301615497406

however, I noticed that for larger values of N, the implied niters caps out at
99.  E.g. for N=128:

$ ./xgcc -B . -c t.c -S -o /dev/null -DN=128 -O2 -fno-tree-vectorize 
-fdump-tree-optimized=-

;; Function f (f, funcdef_no=0, decl_uid=4417, cgraph_uid=1, symbol_order=1)

Removing basic block 5
void f ()
{
  unsigned long ivtmp.6;
  int i;
  int * a.0_1;

  <bb 2> [local count: 10737416]:
  a.0_1 = a;

  <bb 3> [local count: 1063004408]:
  # ivtmp.6_11 = PHI <ivtmp.6_10(3), 0(2)>
  i_12 = (int) ivtmp.6_11;
  MEM[(int *)a.0_1 + ivtmp.6_11 * 4] = i_12;
  ivtmp.6_10 = ivtmp.6_11 + 1;
  if (ivtmp.6_10 != 128)
    goto <bb 3>; [98.99%]
  else
    goto <bb 4>; [1.01%]

  <bb 4> [local count: 10737416]:
  return;

}

here the ratio of the counts is:

$ python -c 'print(1063004408/10737416)'
99.00002086163002

I wrote a script to try the same reproducer with various values of N,
and that shows the following (in the format "N : implied niters"):

$ ./profile_niters.py
90 : 89.9090913170256
91 : 90.7431340154603
92 : 91.59262212919725
93 : 93.3396171245259
94 : 94.2381049694668
95 : 95.15387181344315
96 : 96.08735052729118
97 : 97.03920319702912
98 : 98.00990484649222
99 : 99.00002086163002
100 : 99.00002086163002
101 : 99.00002086163002
102 : 99.00002086163002
103 : 99.00002086163002
104 : 99.00002086163002
105 : 99.00002086163002
106 : 99.00002086163002
107 : 99.00002086163002
108 : 99.00002086163002

I was just wondering if anyone knows what the rationale for this behaviour is,
and can point to the relevant bit in the source?

I can imagine it might be something to do with floating-point precision
and not wanting probabilities to get too small or large, but I'm not
sure.

Thanks,
Alex

Reply via email to