However, since I have to write most of the functions in assembly, I have to write every function multiple times in a hard-to-read format. After spending some time using vectors, I came up with some suggestions:
- 32-bit and 64-bit vectors have to be supported at the level of loading from memory. The former is very useful in computer graphics.
- There are a lot of low-level operations in SSEn that are either also present in e.g. NEON, or can be emulated through a simple function, like unpacking (which I often use for integer promotion).
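
To illustrate the 32-bit load and the unpack-for-promotion points, here is a rough sketch in C with SSE2 intrinsics (not D, and not code from my engine). The helper name promote_pixel_u8_to_u16 is made up for this example, and it assumes a pixel is four 8-bit channels packed into 32 bits. The scalar branch shows how the same operation can be emulated by a simple function on targets without SSE2.

```c
#include <stdint.h>
#include <string.h>
#if defined(__SSE2__)
#include <emmintrin.h>
#endif

/* Hypothetical helper: widen one 32-bit pixel (4 x u8) to 4 x u16. */
static void promote_pixel_u8_to_u16(const uint8_t *src, uint16_t dst[4])
{
#if defined(__SSE2__)
    int32_t packed;
    memcpy(&packed, src, sizeof packed);            /* alignment-safe 32-bit load */
    __m128i v = _mm_cvtsi32_si128(packed);          /* pixel in the low 32 bits, rest zeroed */
    v = _mm_unpacklo_epi8(v, _mm_setzero_si128());  /* interleave with zeros: u8 -> u16 */
    _mm_storel_epi64((__m128i *)dst, v);            /* store the low 64 bits (4 x u16) */
#else
    /* Scalar emulation of the unpack for targets without SSE2. */
    for (int i = 0; i < 4; ++i)
        dst[i] = src[i];
#endif
}
```

The point is that the load is only 32 bits wide and the store only 64 bits wide, so a vector API that only exposes full 128-bit loads and stores forces a detour through scalar code or assembly for this kind of per-pixel work.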
I'm using vector operations in my graphics engine for rendering,
since low-level raster operations on the GPU are well hidden under
layers of APIs. That said, I'm planning to port the blitter
algorithms to DCompute once it becomes more mature, and to
create the CPUblit library for general use (it will contain
blitter and alpha-blending functions as well as basic drawing
ones).
