On Fri, 28 Aug 2020 at 17:33, Alexander Monakov <amona...@ispras.ru> wrote:
>
> On Fri, 28 Aug 2020, Prathamesh Kulkarni via Gcc wrote:
>
> > I wonder if that's (one of) the main factor(s) behind slowdown or it's
> > not too relevant ?
>
> Probably not. Some advice to make your search more directed:
>
> Pass '-n' to 'perf report'. Relative sample ratios are hard to reason about
> when they are computed against different bases, it's much easier to see that
> a loop is slowing down if it went from 4000 to 4500 in absolute sample count
> as opposed to 90% to 91% in relative sample ratio.
>
> Before diving down 'perf report', be sure to fully account for differences
> in 'perf stat' output. Do the programs execute the same number of
> instructions, so the difference only in scheduling? Do the programs suffer
> from the same amount of branch mispredictions? Please show output of
> 'perf stat' on the mailing list too, so everyone is on the same page about
> that.
>
> I also suspect that the dramatic slowdown has to do with the extra branch.
> Your CPU might have some specialized counters for branch prediction, see
> 'perf list'.

Hi Alexander,
Thanks for the suggestions! I am in the process of doing the benchmarking
experiments, and will post the results soon.

Thanks,
Prathamesh
>
> Alexander