Hi,

On Thu, May 05, 2011 at 03:47:08PM +0100, David Gilbert wrote:
> Hi Kiko,
> 
> On 5 May 2011 15:21, Christian Robottom Reis <k...@linaro.org> wrote:
> > Hey there,
> >
> >    I was asked today in the board meeting about the use of NEON
> > routines in the kernel; I said we had looked into this but hadn't done
> > it because a) it wasn't conclusively better and b) if better, it would
> > need to be done conditionally per-platform. But I wanted to double-check
> > that's actually true (and I'm copying Vijay to keep me honest). I have
> > some references:
> 
> Not quite:
>   a) Neon memcpy/memset is worse on A9 than non-neon versions (better
> on A8 typically)
>   b) In general I don't believe fpu or Neon code can be used
> internally to the kernel.
> 
> Dave
> >    http://lists.linaro.org/pipermail/linaro-toolchain/2011-January/000722.html
> >
> >    http://groups.google.com/group/beagleboard/browse_thread/thread/12c7bd415fbc0993/c54dde7b9d55cf99?pli=1
> >
> >    http://www.spinics.net/lists/arm-kernel/msg106503.html
> >
> >    http://dev.gentoo.org/~armin76/arm/memcpy-neon_result.txt
> >
> >    https://wiki.linaro.org/WorkingGroups/ToolChain/Benchmarks/InitialMemcpy?highlight=%28memcpy%29
> >
> >    https://wiki.linaro.org/WorkingGroups/ToolChain/StringRoutines?highlight=%28memcpy%29
> 
> There may be the potential still for non-neon optimised memcpy/memset
> for Cortex a9; however
> the kernel routines are pretty good.

One important thing to observe is that NEON is, first and foremost, a
computation engine.  It isn't specifically designed for speeding up bulk
memory copies, so this probably isn't the first thing we should focus on
if we want to make a case for using NEON in the kernel.

Conversely, targeting NEON use at computational tasks is likely to deliver
much more consistent gains.

Secondly, VFP/NEON context switch overheads will tend towards the worst case
if NEON is used for memcpy(), simply because memcpy is used very often.
Microbenchmarks of core memcpy performance don't inform us about such system-
level effects.  We'd need metrics for the cost and frequency of those context
switches to get a better idea of the impact.  Even so, the ideal tradeoff may
not be the same on all platforms.

Some fruitful next steps might therefore be to:

  * Create infrastructure to allow NEON/VFP to be used in kernel-space (other
    architectures provide an example of how this can be done).
  * Add instrumentation to gather metrics on the context switching behaviour
    and cost.
  * Port some no-brainer functionality (such as CRC32) to use NEON, then
    instrument and benchmark as appropriate.
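
For the CRC32 item, any NEON port needs a known-good scalar baseline to be
checked and timed against. The following is a minimal userspace sketch of such
a baseline, assuming the common reflected CRC-32 polynomial 0xEDB88320; the
names crc32_ref, crc32_fn and crc32_matches_ref are hypothetical and are not
the kernel's own routines.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Bitwise reference CRC-32 (reflected polynomial 0xEDB88320). Slow but simple. */
uint32_t crc32_ref(const void *data, size_t len)
{
    const uint8_t *p = data;
    uint32_t crc = 0xFFFFFFFFu;

    assert(data != NULL || len == 0);
    while (len--) {
        crc ^= *p++;
        for (int bit = 0; bit < 8; bit++)
            /* Shift right; XOR in the polynomial if the dropped bit was set. */
            crc = (crc >> 1) ^ (0xEDB88320u & (0u - (crc & 1u)));
    }
    return ~crc;
}

/* Signature a candidate (e.g. NEON) implementation would have to match. */
typedef uint32_t (*crc32_fn)(const void *data, size_t len);

/* Returns 1 if the candidate agrees with the reference on every prefix of buf. */
int crc32_matches_ref(crc32_fn candidate, const uint8_t *buf, size_t len)
{
    for (size_t n = 0; n <= len; n++)
        if (candidate(buf, n) != crc32_ref(buf, n))
            return 0;
    return 1;
}
```

Checking every prefix length is deliberate: vectorised code tends to go wrong
on short or odd-sized tails rather than on the bulk of the buffer.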

These will allow a properly quantified case to be presented to upstream: if
a clear benefit is demonstrated, I doubt that "taboos" will present too much
of an obstacle.

Needless to say, any benchmarking should be done on multiple platforms, at
least A8 and A9.

Once the above work is done, we have the option to add memcpy to the mix --
however, as discussed in this thread, this isn't a no-brainer everywhere and
has subtleties; so it's probably best kept orthogonal from the tasks above.

The above work is not currently planned for 11.11, so if we want any of it
to happen we will need to account for it in the planning.

> 
> > Incidentally, this ties into the question sent earlier this week which
> > had to do with Nico's work item in:
> >
> >    https://blueprints.launchpad.net/linux-linaro/+spec/other-kernel-thumb2
> >
> > Which IIRC Nico says probably isn't worth it, right?
> 
> I thought dmart had done a lot of that?

The NEON task was never really in my queue: its presence in the Thumb-2
blueprint seems a bit strange actually.  I believe there was no significant
work done on this in the 10.05 cycle.

Cheers
---Dave
