Michael Hope <michael.h...@linaro.org> writes:
> I put a build harness around libav and gathered some profiling data.  See:
>  bzr branch lp:~linaro-toolchain-dev/+junk/libav-suite
>
> It includes a Makefile that builds a C only, h.264 only decoder and
> two Creative Commons licensed videos to use as input.

Thanks for putting this together.

> README.rst has the basic commands for running ffmpeg and initial perf
> results showing the hot functions.  Dave, 20 % of the time is spent in
> memcpy() so you might want to have a look.
>
> The vectoriser has no effect.  GCC 4.5 is ~17 % faster than 4.6.  I'll
> look into extracting and harnessing the functions themselves later
> this week.

I had a look at why auto-vectorisation wasn't having much effect.
It looks from your profile that most of the hot functions are
operating on 16x16 blocks of pixels with an unknown line stride.
So the C code looks like:

    for (i = 0; i < 16; i++)
      {
        x[0] = OP (x[0]);
        ...
        x[15] = OP (x[15]);
        x += stride;
      }

Because of the unknown stride, we're relying on SLP rather than
loop-based vectorisation to handle this kind of loop.  The problem
is that SLP is being run _as_ a loop optimisation.  At the moment,
the gimple data-ref analysis code assumes that, during a loop
optimisation, only simple induction variables are of interest,
so it treats all of the x[...] references above as unrepresentable.
If I move SLP outside the loop optimisations (just as a proof of concept),
then that problem goes away.
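
To make the loop shape concrete, here is a minimal compilable sketch.
It is not taken from libav: the name op_block, the definition of OP and
the 4-pixel width are assumptions chosen to keep the example short (the
real functions handle 16 pixels per row).  Within one iteration the
x[0]..x[3] accesses are adjacent, which is the kind of group SLP looks
for, while the step between iterations is only known at run time.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical stand-in for the per-pixel operation.  */
#define OP(v) ((uint8_t) ((v) + 1))

/* Apply OP to the first 4 pixels of each of 16 rows.  STRIDE is the
   distance in bytes between rows and is not known at compile time.  */
void
op_block (uint8_t *x, ptrdiff_t stride)
{
  int i;

  for (i = 0; i < 16; i++)
    {
      /* Straight-line accesses to adjacent addresses within a row.  */
      x[0] = OP (x[0]);
      x[1] = OP (x[1]);
      x[2] = OP (x[2]);
      x[3] = OP (x[3]);
      /* Runtime step between rows.  */
      x += stride;
    }
}
```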

I talked about this with Ira, who said that SLP had been placed
where it is because ivopts (a later loop optimisation) obfuscates
things too much.  As Ira said, we should probably look at (conditionally)
removing the assumption that only IVs are of interest during loop
optimisations.

Another problem is that SLP supports a much smaller range of
optimisations than the loop-based vectoriser.  There's no support
for promotion, demotion, or conditional expressions.  This affects
things like the weight_h264_pixels* functions, which contain
conditional moves.
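
As a rough illustration of those three features, here is a simplified
sketch of a weighting loop.  It is not the libav source: the names
weight_block and clip_uint8, the parameter list and the 4-pixel width
are all assumptions.  Each statement promotes a uint8_t pixel to int
for the multiply, clamps the result with a conditional expression, and
demotes it back to uint8_t for the store.

```c
#include <stddef.h>
#include <stdint.h>

/* Clamp V to the range 0..255 using conditional expressions.  */
static inline uint8_t
clip_uint8 (int v)
{
  return v < 0 ? 0 : v > 255 ? 255 : (uint8_t) v;
}

/* Scale the first 4 pixels of each of 16 rows.  The arithmetic is done
   in int (promotion) and the clamped result is stored back as uint8_t
   (demotion).  */
void
weight_block (uint8_t *x, ptrdiff_t stride, int weight, int offset,
              int log2_denom)
{
  int i;

  for (i = 0; i < 16; i++)
    {
      x[0] = clip_uint8 ((x[0] * weight + offset) >> log2_denom);
      x[1] = clip_uint8 ((x[1] * weight + offset) >> log2_denom);
      x[2] = clip_uint8 ((x[2] * weight + offset) >> log2_denom);
      x[3] = clip_uint8 ((x[3] * weight + offset) >> log2_denom);
      x += stride;
    }
}
```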

So, maybe some nice areas for future work.

Richard

_______________________________________________
linaro-toolchain mailing list
linaro-toolchain@lists.linaro.org
http://lists.linaro.org/mailman/listinfo/linaro-toolchain
