bearophile wrote:
Andrei Alexandrescu:
You already have a loop at the end that takes care of the stray
elements. Why not move it to the beginning to take care of the stray
elements _and_ unaligned elements in one shot?
Unfortunately things aren't that simple: you need both a pre-loop and a post-loop.
That asm block can only process aligned values, in groups of 8 floats. So if an
unaligned array of N items starts before and ends after the run of processable
blocks, you need to process both the head and the tail separately.
a floats: ** **** **** **** **** ***
Loop blocks: xxxxxxxxx xxxxxxxxx xxxxxxxxx xxxxxxxxx
16b aligned: ____ ____ ____ ____ ____ ____ ____ ____
If you still don't understand, I can create an example.
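For reference, here is a minimal sketch of the head/body/tail split described above. It is written in plain C rather than as the actual asm routine. The names scale, head_count and scale_block8 are hypothetical; scale_block8 is only a scalar stand-in for the asm block. It assumes 4-byte floats and an array that is at least float-aligned.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical stand-in for the asm block: processes exactly 8 floats.
   In the real routine p would have to be 16-byte aligned. */
static void scale_block8(float *p, float k) {
    for (int i = 0; i < 8; i++)
        p[i] *= k;
}

/* Number of leading floats to handle one at a time so that p + result
   is 16-byte aligned, capped at n.
   Assumes sizeof(float) == 4 and that p is at least 4-byte aligned. */
static size_t head_count(const float *p, size_t n) {
    size_t misalign = (size_t)((uintptr_t)p % 16);
    size_t head = misalign ? (16 - misalign) / sizeof(float) : 0;
    return head < n ? head : n;
}

/* Multiplies a[0..n) by k in three phases: head, aligned body, tail. */
void scale(float *a, size_t n, float k) {
    size_t i = 0;
    size_t head = head_count(a, n);

    for (; i < head; i++)            /* pre-loop: unaligned head */
        a[i] *= k;

    for (; i + 8 <= n; i += 8)       /* body: aligned groups of 8 floats */
        scale_block8(a + i, k);

    for (; i < n; i++)               /* post-loop: fewer than 8 stray floats */
        a[i] *= k;
}
```

The pre-loop runs at most 3 times and the post-loop at most 7 times, so nearly all of the work still goes through the block routine for any reasonably large N.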
Bye,
bearophile
Oh I see. Yah, it looks like both before and after loops are needed.
Liquid fuel rocket, ion engine, and parachute.
Andrei