| Issue |
172872
|
| Summary |
[SPGO] Incorrect profile counts for copied instructions
|
| Labels |
new issue
|
| Assignees |
|
| Reporter |
Heath123
|
llvm-profgen determines the execution count of a line by finding the maximum execution count of any instructions with debug info attributed to the line. This creates issues when the compiler duplicates code across different control flow paths, which is done by many passes much as tail duplication and jump threading, as we instead want to sum over these copies of the block.
This issue is mentioned in [this RFC](https://lists.llvm.org/pipermail/llvm-dev/2016-October/106532.html), which proposes encoding a "copy identifier" in the discriminator. A sum could be taken over all the counts of instructions with each copy identifier. However, this has not yet been implemented.
I have found various cases where this happens:
### Case 1: Tail duplication
```c
#include <stdint.h>
#include <stdio.h>
void loop(uint64_t i) {
if (i % 10 != 0) return;
for (uint64_t i = 0; i < 10; i++) {
if (i % 2 == 0) {
printf("Hello\n");
} else {
printf("world\n");
}
}
}
```
Flags: `-O1 -fdebug-info-for-profiling` ([Compiler Explorer link](https://clang.godbolt.org/z/69GT694rK))
In this case, during machine block placement, the [tail duplicator](https://github.com/llvm/llvm-project/blob/23e6dbf864f4ff730dc2949dcc74d75633641624/llvm/lib/CodeGen/TailDuplicator.cpp) is called, which copies the `ret` instruction associated with the last line of the function into the block that handles the early exit when `i % 10 != 0`.
If running this function in a loop with different values for `i`, the early exit return instruction could have a count of e.g. 900, and the other return instruction could have a count of 100. In this case, the correct count for the last line of the function would be 1000, but it would currently be 900 due to profgen taking the max.
### Case 2: Jump threading
```c
#include <stdio.h>
#include <stdint.h>
#include <stdbool.h>
void loop(int X, bool cond) {
if (cond) {
puts("1");
X = 4;
}
puts("2");
if (X < 123) {
puts("3");
} else {
puts("4");
}
}
```
Flags: `-O3 -fdebug-info-for-profiling` ([Compiler Explorer link](https://clang.godbolt.org/z/PK5eY7657))
In this case, jump threading duplicates the `puts("2");` block into two paths, and the one which is run depends on the value of `cond`. If `cond` was true 50% of the time, the two copies could each have a count of 500, and the final count would be 500 where it should be 1000.
### Case 3: Loop unrolling with control flow
```c
#include <stdint.h>
#include <stdlib.h>
#include <stdio.h>
void action1(void);
void action2(void);
void loop(uint64_t i) {
for (uint64_t i = 0; i < 2; i++) {
if (i % 2 == 0) {
action1();
} else {
action2();
}
}
}
```
Flags: `-O3 -fdebug-info-for-profiling` ([Compiler Explorer link](https://clang.godbolt.org/z/68o6v6fn6))
In this case, loop unrolling duplicates the body of the loop, including the control flow. It applies a duplication factor of 2 to all of the blocks; however, this is not sufficient in this case, as we want to track the execution counts individually in each copy. The duplication factor approach only works if the same control flow path is taken on every execution of the loop, and this example breaks this assumption.
In the given example, as the value of `i` is known, the untaken path is removed later in the optimisation pipeline, but the duplication factor remains, meaning each call is incorrectly counted twice for each execution. In other cases, the control flow may remain after codegen, but the result will be similar.
Using copy IDs, profgen would be able to distinguish the blocks, and determine that one of the copies was executed but not the other.
Other passes which duplicate blocks also exist, for example the RFC linked above mentions loop peeling and GVN. [Key Instructions](https://llvm.org/docs/KeyInstructionsDebugInfo.html) has faced similar issues in assigning "atom group" numbers, but there are additional challenges with assigning copy identifiers due to limitations of the size of the field making using a global value less viable.
_______________________________________________
llvm-bugs mailing list
[email protected]
https://lists.llvm.org/cgi-bin/mailman/listinfo/llvm-bugs