On Mon, Jul 31, 2023 at 03:53:26PM +0200, Richard Biener wrote:
[snip]
> > The main difference in the compilation output for the code around the
> > mispredicted branch is:
> > o In O2: a predicated instruction (cmov here) is selected to eliminate the
> > branch. cmov is truly better than a branch here.
> > o In O3/PGO: bitout() is inlined into encode_file(), and a branch
> > instruction is selected. But this branch is obviously *unpredictable* and
> > the compiler doesn't know it. This is why O3/PGO are so bad for this program.
> >
> > GCC doesn't support __builtin_unpredictable(), which was introduced by LLVM.
> > Then I tried to see if __builtin_expect_with_probability(e, x, 0.5) can
> > serve the same purpose. The result is negative.
>
> But does it appear to be predictable with your profiling data?
>
I profiled the branch-misses event on a Kaby Lake machine. More than 99% of
the mispredictions are attributed to the encode_file() function.
$ sudo perf record -e branch-instructions:pp,branch-misses:pp -c 1000 -- taskset -c 0 ./huffman.O3 test.data
Samples: 197K of event 'branch-misses:pp', Event count (approx.): 197618000
Overhead Command Shared Object Symbol
99.58% huffman.O3 huffman.O3 [.] encode_file
0.12% huffman.O3 [kernel.vmlinux] [k] __x86_indirect_thunk_array
0.11% huffman.O3 libc-2.31.so [.] _IO_getc
0.01% huffman.O3 [kernel.vmlinux] [k] common_file_perm
Then I annotated the encode_file() function:
Samples: 197K of event 'branch-misses:pp', 1000 Hz, Event count (approx.): 197618000
encode_file /work/myWork/linux/pgo/huffman.O3 [Percent: local period]
Percent│ ↑ je 38
│ bitout():
│ current_byte <<= 1;
│ 70: add %edi,%edi
│ if (b == '1') current_byte |= 1;
48.70 │ ┌──cmp $0x31,%dl
47.11 │ ├──jne 7a
│ │ or $0x1,%edi
│ │nbits++;
│ 7a:└─→inc %eax
│ if (b == '1') current_byte |= 1;
│ mov %edi,current_byte
│ nbits++;
│ mov %eax,nbits
│ if (nbits == 8) {
1.16 │ cmp $0x8,%eax
3.03 │ ↓ je a0
│ encode_file():
│ for (s=codes[ch]; *s; s++) bitout (outfile, *s);
│ movzbl 0x1(%r13),%edx
│ inc %r13
│ test %dl,%dl
│ ↑ jne 70
│ ↑ jmp 38
│ nop
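
For illustration only (this is not taken from the huffman source): a minimal
sketch of how the conditional OR seen in bitout() could be written without a
branch at the source level. The function names pack_branchy(),
pack_branchless() and check_variants_agree() are hypothetical, and the loop
packs a whole string of '0'/'1' characters rather than flushing every 8 bits.
Whether GCC actually emits branch-free code for either form depends on the
compiler version and flags, so the generated assembly would still need to be
checked.

```c
#include <assert.h>
#include <stddef.h>

/* Branchy form, mirroring "if (b == '1') current_byte |= 1;" from the
 * annotated source. The compiler may lower this to cmp/jne. */
unsigned pack_branchy(const char *bits)
{
    unsigned current_byte = 0;
    for (const char *s = bits; *s; s++) {
        current_byte <<= 1;
        if (*s == '1')
            current_byte |= 1;
    }
    return current_byte;
}

/* Branch-free form at the source level: in C, (*s == '1') evaluates to
 * 0 or 1, so it can be OR-ed in directly. This is an assumption about
 * intent, not a guarantee about the instructions GCC will select. */
unsigned pack_branchless(const char *bits)
{
    unsigned current_byte = 0;
    for (const char *s = bits; *s; s++)
        current_byte = (current_byte << 1) | (unsigned)(*s == '1');
    return current_byte;
}

/* Sanity check that both variants compute the same value. */
void check_variants_agree(void)
{
    const char *samples[] = { "", "0", "1", "10110001", "01010101" };
    for (size_t i = 0; i < sizeof samples / sizeof samples[0]; i++)
        assert(pack_branchy(samples[i]) == pack_branchless(samples[i]));
}
```

Both variants return the same value for any string of '0'/'1' characters; the
only intended difference is that the second form has no data-dependent
control flow in the source.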
--
Cheers,
Changbin Du