On Thu, Aug 20, 2015 at 6:58 AM, Pekka Paalanen <ppaala...@gmail.com> wrote:

> A thing that explains a great deal of these anomalies, but not all of them,
> has something to do with function addresses. There are hypotheses that it
> might have to do with the branch predictor and its cache. We made a test
> targeting exactly that idea: pick a fast path function that seems to be
> most susceptible to unexpected changes, pad it with x nops before the
> function start and N-x nops after the function end. We never execute those
> nops, but changing x changes the function start address while keeping
> everything else in the whole binary in the same place.
>
> The results were mind-boggling: depending on the function starting address,
> the src_8888_8888 L1 test of lowlevel-blt-bench ran at either 355 Mpx/s or
> 470 Mpx/s. There does not seem to be any predictable pattern as to which
> addresses are "fast" and which are "slow". Obviously this will screw up our
> benchmarks, because a change in an unrelated function may cause another
> function's address to shift, and therefore change its performance. See [1]
> for the plot.
>
> [1] The plot of alignment vs. performance
> https://git.collabora.com/cgit/user/pq/pixman-benchmarking.git/plain/octave/figures/fig-src-8888-8888-L1.pdf
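
The padding experiment keeps the total padding N fixed, so only the function start moves. A minimal sketch of that address arithmetic in C; `padded_func_start` and `padded_region_end` are hypothetical names for illustration, not part of pixman or its benchmarking scripts:

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Model of the padding experiment: a region of fixed size holds x NOPs,
 * then the measured function, then n_total - x NOPs. The NOPs are never
 * executed; they only move the function. */

/* Address at which the measured function starts for a given x. */
static uintptr_t padded_func_start(uintptr_t region_start, size_t x)
{
    return region_start + x;
}

/* First address after the padded region, i.e. where the next symbol lands. */
static uintptr_t padded_region_end(uintptr_t region_start, size_t func_size,
                                   size_t n_total, size_t x)
{
    assert(x <= n_total);
    /* x NOPs before + function body + (n_total - x) NOPs after */
    return padded_func_start(region_start, x) + func_size + (n_total - x);
}
```

Whatever x is, `padded_region_end()` works out to `region_start + func_size + n_total`, which is why everything after the padded function stays in place.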

Could this depend on whether some "bad" instruction ends up next to, or split
by, a cache line boundary? That would produce a random-looking plot, though it
really is a plot of the location of the bad instructions in the measured
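
A sketch of the "split by a cache line boundary" test, assuming a 64-byte line (typical for x86, but an assumption here) and assuming the address and length of the "bad" instruction are known, which the thread does not establish. `straddles_line` is a hypothetical helper:

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Returns 1 if the len bytes starting at addr span two cache lines of
 * size line (which must be a power of two), 0 otherwise. */
static int straddles_line(uintptr_t addr, size_t len, size_t line)
{
    assert(line != 0 && (line & (line - 1)) == 0);
    assert(len > 0);
    uintptr_t mask = ~(uintptr_t)(line - 1);
    uintptr_t first = addr & mask;           /* line holding the first byte */
    uintptr_t last = (addr + len - 1) & mask; /* line holding the last byte */
    return first != last;
}
```

Shifting the function start by x moves every instruction inside it by x, so `straddles_line(func_start + x + insn_offset, insn_len, 64)` flips between 0 and 1 as x varies. With several such instructions in one function, the combined fast/slow pattern could plausibly look random.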

If this really is a problem, then the ideal fix is for the compiler to
insert NOP instructions in order to move the bad instructions away from the
locations that make them bad. Yikes.
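
The NOP-insertion idea can be sketched the same way: given an instruction's address and length, compute how many padding bytes would move it to the start of the next cache line. This again assumes a 64-byte line and a known "bad" instruction; `nop_bytes_to_avoid_split` is a hypothetical name, not an existing GCC or binutils feature:

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Number of NOP bytes to insert before an instruction of len bytes at addr
 * so that it no longer crosses a boundary of a cache line of size line
 * (a power of two). Returns 0 if it already fits in one line. */
static size_t nop_bytes_to_avoid_split(uintptr_t addr, size_t len, size_t line)
{
    assert(line != 0 && (line & (line - 1)) == 0);
    assert(len > 0 && len <= line);
    size_t offset = (size_t)(addr & (line - 1)); /* position within the line */
    if (offset + len <= line)
        return 0;          /* already contained in a single line */
    return line - offset;  /* push the instruction to the next line start */
}
```

Inserting padding shifts every later instruction as well, so fixing one instruction this way can move another onto a boundary. GCC's existing `-falign-functions` and `-falign-loops` options align function entry points and loop heads to power-of-two boundaries rather than checking whether a particular instruction crosses a line.
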
Pixman mailing list
