Hi, I've been looking at GCC's profile information recently. I noticed that the probabilities/counts effectively encode the expected niters of a loop, so e.g. with the following testcase:
$ cat t.c int *a; void f() { for (int i = 0; i < N; i++) a[i] = i; } $ ./xgcc -B . -c -S -o /dev/null t.c -DN=32 -O2 -fno-tree-vectorize -fdump-tree-optimized=- ;; Function f (f, funcdef_no=0, decl_uid=4417, cgraph_uid=1, symbol_order=1) Removing basic block 5 void f () { unsigned long ivtmp.6; int i; int * a.0_1; <bb 2> [local count: 32534376]: a.0_1 = a; <bb 3> [local count: 1041207448]: # ivtmp.6_11 = PHI <ivtmp.6_10(3), 0(2)> i_12 = (int) ivtmp.6_11; MEM[(int *)a.0_1 + ivtmp.6_11 * 4] = i_12; ivtmp.6_10 = ivtmp.6_11 + 1; if (ivtmp.6_10 != 32) goto <bb 3>; [96.88%] else goto <bb 4>; [3.12%] <bb 4> [local count: 32534376]: return; } the counts effectively encode that the loop executes 32 times, since the count of the header (bb 3) is 32 times the count of the preheader (bb 2): $ python -c 'print(1041207448/32534376)' 32.003301615497406 however, I noticed that for larger values of N, the implied niters caps out at 99. E.g. for N=128: $ ./xgcc -B . -c t.c -S -o /dev/null -DN=128 -O2 -fno-tree-vectorize -fdump-tree-optimized=- ;; Function f (f, funcdef_no=0, decl_uid=4417, cgraph_uid=1, symbol_order=1) Removing basic block 5 void f () { unsigned long ivtmp.6; int i; int * a.0_1; <bb 2> [local count: 10737416]: a.0_1 = a; <bb 3> [local count: 1063004408]: # ivtmp.6_11 = PHI <ivtmp.6_10(3), 0(2)> i_12 = (int) ivtmp.6_11; MEM[(int *)a.0_1 + ivtmp.6_11 * 4] = i_12; ivtmp.6_10 = ivtmp.6_11 + 1; if (ivtmp.6_10 != 128) goto <bb 3>; [98.99%] else goto <bb 4>; [1.01%] <bb 4> [local count: 10737416]: return; } here the ratio of the counts is: $ python -c 'print(1063004408/10737416)' 99.00002086163002 I wrote a script to try the same reproducer with various values of N, and that shows the following (in the format "N : implied niters"): $ ./profile_niters.py 90 : 89.9090913170256 91 : 90.7431340154603 92 : 91.59262212919725 93 : 93.3396171245259 94 : 94.2381049694668 95 : 95.15387181344315 96 : 96.08735052729118 97 : 97.03920319702912 98 : 98.00990484649222 99 : 99.00002086163002 100 : 99.00002086163002 101 : 99.00002086163002 102 : 99.00002086163002 103 : 99.00002086163002 104 : 99.00002086163002 105 : 99.00002086163002 106 : 99.00002086163002 107 : 99.00002086163002 108 : 99.00002086163002 I was just wondering if anyone knows what the rationale for this behaviour is, and can point to the relevant bit in the source? I can imagine it might be something to do with floating-point precision and not wanting probabilities to get too small or large, but I'm not sure. Thanks, Alex