Hi,

We’ve seen in a couple of places that matrix operations and similar code are a
CPU bottleneck. Providing SSE/NEON-optimised versions of some of those
operations could help significantly.

On x86/x64, we already require SSE2, so we should be able to use the SSE
versions unconditionally. On ARM, we can make NEON a compile-time option with a
C implementation as the fallback.
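
To make the compile-time selection concrete, here is a rough sketch of what I
have in mind. The names (SKETCH_HAVE_SSE2, SKETCH_HAVE_NEON, sketchAdd4) are
made up for illustration and are not existing Qt API; the macro checks are one
plausible way to detect the instruction sets, not necessarily what we would
ship.

```cpp
// Pick an implementation at compile time: SSE2 on x86/x64, NEON on ARM when
// enabled, plain C otherwise. All SKETCH_* names are hypothetical.
#if defined(__SSE2__) || defined(_M_X64) || (defined(_M_IX86_FP) && _M_IX86_FP >= 2)
#  include <emmintrin.h>
#  define SKETCH_HAVE_SSE2 1
#elif defined(__ARM_NEON) || defined(__ARM_NEON__)
#  include <arm_neon.h>
#  define SKETCH_HAVE_NEON 1
#endif

// Hypothetical helper: out[i] = a[i] + b[i] for i = 0..3.
inline void sketchAdd4(const float *a, const float *b, float *out)
{
#if defined(SKETCH_HAVE_SSE2)
    // Unaligned loads/stores so callers need no special alignment.
    _mm_storeu_ps(out, _mm_add_ps(_mm_loadu_ps(a), _mm_loadu_ps(b)));
#elif defined(SKETCH_HAVE_NEON)
    vst1q_f32(out, vaddq_f32(vld1q_f32(a), vld1q_f32(b)));
#else
    // C fallback when no SIMD instruction set is available.
    for (int i = 0; i < 4; ++i)
        out[i] = a[i] + b[i];
#endif
}
```

All three branches compute the same result, so callers would not need to know
which one was compiled in.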

One problem is that we only get the full benefit if we can offer these
operations inline. That would basically imply making our qsimd_p.h header public
and including it from qvectornd.h and qmatrixnxn.h (so that we can implement
the operations using the SSE/NEON intrinsics). If we do that, we could e.g.
implement QVector4D holding a __m128 value (and the NEON equivalent on ARM).
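
As a very rough sketch of a vector class that stores a SIMD register directly:
SketchVector4D and its members are hypothetical and only illustrate the idea,
they are not the real QVector4D. Note how the member type comes from the
intrinsics header, which is why that header would have to be reachable from the
public one.

```cpp
#if defined(__SSE2__) || defined(_M_X64) || (defined(_M_IX86_FP) && _M_IX86_FP >= 2)
#  include <emmintrin.h>
#  define SKETCH_HAVE_SSE2 1
#elif defined(__ARM_NEON) || defined(__ARM_NEON__)
#  include <arm_neon.h>
#  define SKETCH_HAVE_NEON 1
#endif

// Hypothetical stand-in for a SIMD-backed QVector4D.
class SketchVector4D
{
public:
    SketchVector4D(float x, float y, float z, float w)
    {
        const float tmp[4] = { x, y, z, w };
#if defined(SKETCH_HAVE_SSE2)
        v = _mm_loadu_ps(tmp);
#elif defined(SKETCH_HAVE_NEON)
        v = vld1q_f32(tmp);
#else
        for (int i = 0; i < 4; ++i)
            v[i] = tmp[i];
#endif
    }

    // Component access (0 = x, 1 = y, 2 = z, 3 = w) via a temporary array.
    float at(int i) const
    {
        float tmp[4];
#if defined(SKETCH_HAVE_SSE2)
        _mm_storeu_ps(tmp, v);
#elif defined(SKETCH_HAVE_NEON)
        vst1q_f32(tmp, v);
#else
        for (int k = 0; k < 4; ++k)
            tmp[k] = v[k];
#endif
        return tmp[i];
    }

    // Inline so the compiler can keep both operands in registers.
    friend SketchVector4D operator+(const SketchVector4D &a, const SketchVector4D &b)
    {
        SketchVector4D r(a);
#if defined(SKETCH_HAVE_SSE2)
        r.v = _mm_add_ps(a.v, b.v);
#elif defined(SKETCH_HAVE_NEON)
        r.v = vaddq_f32(a.v, b.v);
#else
        for (int i = 0; i < 4; ++i)
            r.v[i] = a.v[i] + b.v[i];
#endif
        return r;
    }

private:
#if defined(SKETCH_HAVE_SSE2)
    __m128 v;        // x in the lowest lane, w in the highest
#elif defined(SKETCH_HAVE_NEON)
    float32x4_t v;
#else
    float v[4];      // C fallback storage
#endif
};
```

One consequence worth keeping in mind: a class laid out like this gets the
size and alignment of the register type, so it would not be a drop-in
replacement for a class holding four plain floats.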

I personally don’t think including qsimd.h (and implicitly immintrin.h) from 
our public headers would be a problem, but I’d be happy to hear arguments 
for/against it.

As a side note: SSE 4.1 offers some nice additional instructions that would 
simplify some of the operations. Should we keep the minimum requirement for SSE 
at version 2, or can we raise it to 4.1?
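
To illustrate the kind of simplification: a 4-component dot product is one
case where SSE 4.1 helps, since _mm_dp_ps does in one intrinsic what needs a
multiply plus two shuffle/add steps with SSE2. The sketch below is only an
example I picked for illustration; sketchDot4 is a hypothetical name, and the
__SSE4_1__ check assumes a GCC/Clang style compiler with the matching flag.

```cpp
#if defined(__SSE4_1__)
#  include <smmintrin.h>
#  define SKETCH_DOT_SSE41 1
#elif defined(__SSE2__) || defined(_M_X64) || (defined(_M_IX86_FP) && _M_IX86_FP >= 2)
#  include <emmintrin.h>
#  define SKETCH_DOT_SSE2 1
#endif

// Hypothetical helper: a[0]*b[0] + a[1]*b[1] + a[2]*b[2] + a[3]*b[3].
inline float sketchDot4(const float *a, const float *b)
{
#if defined(SKETCH_DOT_SSE41)
    // Mask 0xF1: multiply all four lanes (high nibble), store the sum in
    // lane 0 only (low nibble).
    return _mm_cvtss_f32(_mm_dp_ps(_mm_loadu_ps(a), _mm_loadu_ps(b), 0xF1));
#elif defined(SKETCH_DOT_SSE2)
    // p = (p0, p1, p2, p3), the lane-wise products.
    const __m128 p = _mm_mul_ps(_mm_loadu_ps(a), _mm_loadu_ps(b));
    // Swap neighbouring lanes and add: lane 0 becomes p0 + p1, lane 2 p2 + p3.
    __m128 s = _mm_add_ps(p, _mm_shuffle_ps(p, p, _MM_SHUFFLE(2, 3, 0, 1)));
    // Swap the two halves and add: lane 0 becomes p0 + p1 + p2 + p3.
    s = _mm_add_ps(s, _mm_shuffle_ps(s, s, _MM_SHUFFLE(1, 0, 3, 2)));
    return _mm_cvtss_f32(s);
#else
    // C fallback for targets without SSE.
    float sum = 0.0f;
    for (int i = 0; i < 4; ++i)
        sum += a[i] * b[i];
    return sum;
#endif
}
```

The branches may add the products in a different order, so results can differ
in the last bits for general floating point input.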

Cheers,
Lars

_______________________________________________
Development mailing list
[email protected]
https://lists.qt-project.org/listinfo/development
