Daniël Mantione wrote:
On Sat, 7 Oct 2006, Florian Klaempfl wrote:
Vincent Snijders wrote:
I started to add Vector Pascal-like support; currently only i386/x86_64 are
supported (no generic support). The whole (currently implemented)
functionality is demonstrated by the following example. Please let me know
whether it yields benchmark speedups.
To get a large speedup, I think that instead of pairing doubles you should
process the pixels in parallel. In this benchmark a row is 3000 pixels wide,
so make an array of 3000 doubles and perform the operations on whole arrays.
With proper compiler optimization, it should be possible to achieve speeds
close to 2 flops per clock cycle.
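As a rough illustration of the row-at-a-time idea, here is a minimal sketch in Python. It assumes the benchmark is a Mandelbrot-style escape-time loop, which the thread does not state explicitly; the names row_iterations, MAX_ITER and LIMIT_SQ and the escape limit of 4.0 are hypothetical. The loop over k is the part a vectorizing compiler would be expected to turn into SIMD operations.

```python
# Sketch only: assumes a Mandelbrot-style escape-time kernel; all names are hypothetical.
MAX_ITER = 50    # upper bound on inner-loop iterations mentioned in the thread
LIMIT_SQ = 4.0   # assumed escape radius squared


def row_iterations(cr, ci):
    """Iterate a whole row at once.

    cr is a list with the real part of each pixel, ci the imaginary part
    shared by the row. Returns the number of iterations done per pixel.
    """
    n = len(cr)
    zr = [0.0] * n
    zi = [0.0] * n
    count = [0] * n
    active = [True] * n          # mask: pixels that have not escaped yet
    for _ in range(MAX_ITER):
        if not any(active):      # the whole row can stop only when every pixel is done
            break
        for k in range(n):       # element-wise work, the candidate for SIMD
            if active[k]:
                tr = zr[k] * zr[k] - zi[k] * zi[k] + cr[k]
                zi[k] = 2.0 * zr[k] * zi[k] + ci
                zr[k] = tr
                count[k] += 1
                if zr[k] * zr[k] + zi[k] * zi[k] > LIMIT_SQ:
                    active[k] = False
    return count
```

Calling row_iterations with 3000 real parts and one imaginary part would process a full row; a real implementation would keep the arrays as doubles and leave the masking to the compiler or to explicit vector code.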
The 'problem' in this benchmark is that the number of iterations of the
inner loop isn't fixed, but can vary between 1 and 50. If you pair two
doubles, the chance that you can break out of the loop for all elements of the
vector before iteration 50 is greater than when you combine 3000 elements.
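To make the early-exit argument concrete, here is a small cost model in Python. It is only a sketch under a simplifying assumption: a group of elements keeps iterating until its slowest element is done, and every step costs the same for all lanes whether or not they are still needed. The function name lane_steps and the sample iteration counts are hypothetical.

```python
def lane_steps(counts, width):
    """Total lane-iterations executed when elements are grouped `width` at a time.

    counts[i] is the number of iterations element i really needs. Each group
    runs until its slowest element finishes, so it costs max(group) steps
    for every lane in the group.
    """
    total = 0
    for start in range(0, len(counts), width):
        group = counts[start:start + width]
        total += max(group) * len(group)
    return total


# Hypothetical row: most pixels finish after 1 iteration, one needs all 50.
sample = [1, 1, 50, 1, 1, 1]
useful = sum(sample)               # work a scalar loop would do
paired = lane_steps(sample, 2)     # groups of two doubles
whole_row = lane_steps(sample, 6)  # the whole row as one vector
```

In this model the paired version wastes work only in the group containing the slow pixel, while the whole-row version makes every pixel pay for it.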
Vincent
_______________________________________________
fpc-devel maillist - fpc-devel@lists.freepascal.org
http://lists.freepascal.org/mailman/listinfo/fpc-devel