Don wrote:
Well, sort of.
It's impossible to align stack-allocated structs with any alignment greater than the alignment of the stack itself (which is 4 bytes). Anything larger than that and you HAVE to use the heap or alloca().
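
For illustration, the manual workaround usually amounts to over-allocating an ordinary buffer and rounding its address up to the wanted boundary. The following is only a sketch of that idea in C; align_up and sum4_via_aligned_slot are made-up names, and the 16-byte boundary is just an example value.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Round p up to the next multiple of alignment (must be a power of two). */
static void *align_up(void *p, size_t alignment)
{
    assert(alignment != 0 && (alignment & (alignment - 1)) == 0);
    uintptr_t mask = (uintptr_t)alignment - 1;
    return (void *)(((uintptr_t)p + mask) & ~mask);
}

/* Hypothetical helper: copies four floats through a 16-byte-aligned slot
   carved out of an ordinary stack buffer and returns their sum. */
static float sum4_via_aligned_slot(const float src[4])
{
    /* Over-allocate by alignment - 1 bytes so an aligned slot always fits. */
    unsigned char raw[4 * sizeof(float) + 15];
    unsigned char *slot = align_up(raw, 16);
    float tmp[4];

    assert(((uintptr_t)slot & 15) == 0);  /* slot starts on a 16-byte boundary */
    memcpy(slot, src, sizeof tmp);        /* store into the aligned slot */
    memcpy(tmp, slot, sizeof tmp);        /* read back without aliasing issues */
    return tmp[0] + tmp[1] + tmp[2] + tmp[3];
}
```

The cost is up to alignment - 1 wasted bytes per buffer. As far as I know, a compiler that supports larger alignments on locals does something comparable, either like this or by realigning the stack pointer on function entry, but I have not checked any particular implementation.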


So how do other compilers supporting that alignment syntax do it?

Nothing on x86 benefits from more than 16-byte alignment, AFAIK, and it's never mandatory to use more than 8-byte alignment. I don't know much about recent GPUs, though. Do they really require 16-byte alignment or more?


I'm not sure exactly how this works or why they require alignment. I couldn't find anything about it in the description of clEnqueueWriteBuffer, which is where data gets written into GPU memory.


The specification for the OpenCL C language itself only states:

A data item declared to be a data type in memory is always aligned to the size of the data type in bytes. For example, a float4 variable will be aligned to a 16-byte boundary, a char2 variable will be aligned to a 2-byte boundary.

A built-in data type that is not a power of two bytes in size must be aligned to the next larger power of two. This rule applies to built-in types only, not structs or unions.
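
Read literally, the two quoted rules boil down to "round the size up to a power of two". A small C sketch of that reading follows; builtin_alignment is a made-up helper, not something from the OpenCL headers.

```c
#include <assert.h>
#include <stddef.h>

/* Alignment the quoted rules imply for a built-in type of `size` bytes:
   the size itself if it is a power of two, otherwise the next larger
   power of two. */
static size_t builtin_alignment(size_t size)
{
    assert(size > 0);
    size_t align = 1;
    while (align < size)
        align <<= 1;
    return align;
}
```

With this reading, a 16-byte float4 gets 16, a 2-byte char2 gets 2, and a hypothetical 12-byte built-in type would also get 16.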



They also strangely state:

The components of vector data types with 1 ... 4 components can be addressed as <vector_data_type>.xyzw.

float4 c, a, b;

c.xyzw = (float4)(1.0f, 2.0f, 3.0f, 4.0f);
c.z = 1.0f;         // is a float
c.xy = (float2)(3.0f, 4.0f); // is a float2



So I wonder why they used arrays in the headers and not structs to be consistent with this.
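
To show what I mean, here is a rough C11 sketch of a host-side type that offers both the array and the .x/.y/.z/.w view. The name my_float4 and its layout are my own invention, not what the headers actually contain.

```c
#include <assert.h>
#include <stdalign.h>

/* Hypothetical host-side vector type, not taken from any OpenCL header.
   The anonymous struct (C11) shares its storage with the array. */
typedef union my_float4 {
    alignas(16) float s[4];        /* array view, 16-byte aligned */
    struct { float x, y, z, w; };  /* named-component view */
} my_float4;

static_assert(sizeof(my_float4) == 16, "four packed floats");
static_assert(alignof(my_float4) == 16, "same alignment the spec gives float4");

/* Sums the components through the named view. */
static float my_float4_sum(my_float4 v)
{
    return v.x + v.y + v.z + v.w;
}
```

Anonymous structs only became standard C with C11, and arbitrary swizzles like c.xy cannot be expressed this way at all, so that may be part of why plain arrays were chosen. That is a guess on my part.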
