On Wednesday, 4 November 2015 at 01:14:31 UTC, Nicholas Wilson wrote:
Note that there are two different alignments:
  - to control padding between instances on the stack (arrays)
  - to control padding between members of a struct

align(64) // between instances (arrays)
struct foo
{
    align(16) short baz;  // between members
    align(1)  float quux;
}

Your 2.5x speedup is due to aligned vs. unaligned loads and stores, which has a really big effect for SIMD-type work. Basically, misaligned accesses are really slow. IIRC there was a blog post (or paper?) about someone on a microcontroller spending a vast amount of time in ONE misaligned integer assignment, because it caused traps and got the kernel involved. It's not quite as bad on x86, but still worth watching out for.
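(If anyone wants to poke at this locally, here is a minimal sketch of checking alignment in D. `isAligned` is a made-up helper, not a Phobos function, and exactly how the compiler honors `align(64)` for stack variables can vary.)

```d
// Sketch: checking what alignment a pointer actually has.
// `isAligned` is a hypothetical helper, not from the standard library.
bool isAligned(size_t alignment)(const(void)* p)
{
    return (cast(size_t) p & (alignment - 1)) == 0;
}

align(64) struct foo
{
    align(16) short baz;
    align(1)  float quux;
}

void main()
{
    import std.stdio : writeln;
    foo f;
    writeln(foo.alignof);           // declared alignment between instances
    writeln(isAligned!16(&f.baz));  // did this member land on 16 bytes?
}
```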

As to a less hacky solution, I'm not sure there is one.

Thanks for the reply. I did some more checking around and found that it was not really an alignment problem; it was caused by using the default init value of my type.

My starting type:
align(64) struct Phys
{
   float x, y, z, w;
   //More stuff.
} //Was 64 bytes in size at the time.

The above worked fine; it was fast and all. But after a while I wanted the data in a different format, so I started decoding positions and the other variables into separate arrays.

Something like this:
align(16) struct Pos { float x, y, z, w; }
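For context, the split looked roughly like this (all the names besides Pos and Phys are made up for illustration):

```d
// Array-of-structs: everything interleaved in one array.
align(64) struct Phys { float x, y, z, w; /* more stuff */ }
Phys[] objects;

// Struct-of-arrays: each decoded field gets its own array.
align(16) struct Pos { float x, y, z, w; }
Pos[] positions;     // decoded positions
float[] velocities;  // other decoded variables, one array each
```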

Counter to my limited knowledge of how CPUs work, this was much slower. Doing the same thing lots of times, touching less memory, with fewer branches, should in theory at least not be slower, right? So after I ruled out bottlenecks in the parser, I assumed there was an alignment problem, and I did my Aligner hack. This made the code run faster, so I assumed that was the cause... Naive! (There was a typo in the code I submitted to begin with: I used a = Align!(T).init and not a.value = T.init.)

The performance problem was actually caused by the line t = T.init, whether it was aligned or not. I solved the problem by changing the struct to look like this:
align(16) struct Pos
{
    float x = float.nan;
    float y = float.nan;
    float z = float.nan;
    float w = float.nan;
}

Basically, T.init gets explicit values. But this should be the same Pos.init as the default Pos.init, since float members default to float.nan anyway. So I really fail to understand how this could fix the problem. I guessed the compiler generates slightly different code if I do it this way, and that this slightly different code avoids some bottleneck in the CPU. But when I took a look at the assembly of the function, I could not find any difference in the generated code...
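One quick sanity check is that the two definitions really do produce a bit-identical Pos.init. A sketch that compares the raw bytes of two default-initialized instances:

```d
align(16) struct PosA { float x, y, z, w; }  // implicit init (float.nan)
align(16) struct PosB                        // explicit init
{
    float x = float.nan;
    float y = float.nan;
    float z = float.nan;
    float w = float.nan;
}

void main()
{
    import std.stdio : writeln;
    PosA a;  // default-initialized to PosA.init
    PosB b;  // default-initialized to PosB.init
    auto rawA = (cast(const(ubyte)*) &a)[0 .. PosA.sizeof];
    auto rawB = (cast(const(ubyte)*) &b)[0 .. PosB.sizeof];
    writeln(rawA == rawB);  // both should be four float.nan patterns
}
```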

I don't really know where to go from here to figure out the underlying cause. Does anyone have any suggestions?
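In case it helps anyone reproduce it, this is the kind of minimal loop I'd time next, once with the implicit-init Pos and once with the explicit-init one (a sketch using std.datetime.stopwatch on newer compilers; the count is arbitrary):

```d
import std.datetime.stopwatch : StopWatch, AutoStart;
import std.stdio : writeln;

align(16) struct Pos { float x, y, z, w; }

void main()
{
    enum N = 10_000_000;
    auto buf = new Pos[](N);

    auto sw = StopWatch(AutoStart.yes);
    foreach (ref t; buf)
        t = Pos.init;  // the suspect assignment
    sw.stop();
    writeln(sw.peek.total!"msecs", " ms");
}
```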
