On Sunday, 23 March 2014 at 08:22:32 UTC, Vladimir Panteleev wrote:
I'm not familiar enough with vector instruction sets of current CPUs to answer this confidently. E.g. if there exists an integer vector multiply-and-add operation, then that could be used for fast software alpha blending. That operation's restrictions would dictate the optimal memory layout of the image. E.g. if the operation requires that the bytes to multiply and add are contiguous in memory, then it follows that the image should be represented with each channel as a separate sub-image.

There is the PMADDWD instruction, which can be used for 8-bit blending once the channels are widened to 16-bit words. I don't think it requires a particular memory layout from the implementation, and blending would probably be dominated by memory accesses anyway.
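
To illustrate, here is a rough scalar sketch in C of what one 32-bit lane of PMADDWD computes (two signed 16-bit products added together) and how that maps onto a blend of one 8-bit channel. The function names, the rounding, and the single constant alpha per span are my own assumptions for the example, not anything from an existing library. The source/destination pairing happens in registers, which is why the in-memory layout should not matter much.

```c
#include <stddef.h>
#include <stdint.h>

/* Scalar model of one 32-bit lane of PMADDWD: multiply two pairs of
   signed 16-bit words and add the two products. */
static int32_t pmaddwd_lane(int16_t a0, int16_t a1, int16_t b0, int16_t b1)
{
    return (int32_t)a0 * b0 + (int32_t)a1 * b1;
}

/* Blend one 8-bit channel (hypothetical helper):
   (src * alpha + dst * (255 - alpha)) / 255, rounded to nearest.
   The (src, dst) pair is multiplied by the (alpha, 255 - alpha) pair,
   which is exactly the shape of a single PMADDWD lane. */
static uint8_t blend_channel(uint8_t src, uint8_t dst, uint8_t alpha)
{
    int32_t sum = pmaddwd_lane(src, dst, alpha, (int16_t)(255 - alpha));
    return (uint8_t)((sum + 127) / 255);
}

/* Blend n channel bytes with one constant alpha (hypothetical helper).
   Works the same on interleaved or planar data, since every byte is
   treated independently. */
static void blend_span(uint8_t *dst, const uint8_t *src, size_t n, uint8_t alpha)
{
    for (size_t i = 0; i < n; i++)
        dst[i] = blend_channel(src[i], dst[i], alpha);
}
```

In a real SIMD version the bytes would be widened and interleaved with unpack instructions before the multiply-add, so the loop above would process several channels per instruction, but the loads and stores stay the same.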
