--- Comment #2 from Witold Baryluk <> 2010-02-04 15:54:00 PST ---
(In reply to comment #1)
> > Rationale for this is that modern processors have SSE instructions which
> > could perform up to 4 mathematical operations in parallel (like sin, cos,
> > exp, log, pow).
> Not really. The x87 has built-in functions for trig and exponential
> functions, but SSE doesn't.
> It's pretty hard to make them more efficient than calling a loop on each
> element individually. If you only need approximate values, it's possible to
> get a modest speedup, but if you need full accuracy, it's tough.
> Essentially because you can't have any branch instructions in the
> calculation, and working around this quickly chews up the 4-at-a-time
> benefit.

OK, you are right. But there are CPUs with transcendental functions,
like AltiVec and Cell. Larrabee was also supposed to have them.

About approximate values, you are right. But such approximate functions
(e.g. sin) can't be used, because they will either not be precise enough,
or be too slow, or not fully conform to IEEE 754.

It would be better to have the possibility to write custom functions, like in
my example. But my example wasn't giving any performance benefits.

One can write approximated_sin(float x).
It is an interesting question how to provide a vectorized version of it
for array operations.

An implicit approximated_sin(float[] x) is not automatically vectorizable.

Also, evaluating approximated_sin 4 ways in parallel, with x automatically
converted to float[4], will not work well: even though the function is pure
nothrow, it can still, e.g., execute conditional instructions and
variable-length loops.

Normally such problems are resolved using masking in SSE registers, but that
has to be done manually by the programmer.
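
To make the masking idea concrete, here is a minimal sketch in plain D. It
only emulates what SSE would do, with scalar code on a float[4]: both sides of
the conditional are computed for every lane, and a per-lane mask picks the
result. The names select and absLanes are made up for this illustration; real
SSE code would use a compare instruction followed by and/andnot/or.

```d
// Sketch only: emulates SSE-style masking on a float[4] with scalar code.
// `select` and `absLanes` are hypothetical names, not an existing API.

// Per-lane blend: take a[i] where the mask is all ones, b[i] where it is zero.
float[4] select(const uint[4] mask, const float[4] a, const float[4] b)
{
    float[4] r;
    foreach (i; 0 .. 4)
    {
        // Reinterpret the float bits as uint so they can be and-ed with the mask.
        uint ua = *cast(const(uint)*) &a[i];
        uint ub = *cast(const(uint)*) &b[i];
        uint ur = (ua & mask[i]) | (ub & ~mask[i]);
        r[i] = *cast(float*) &ur;
    }
    return r;
}

// Masked version of `x < 0 ? -x : x`, applied to four lanes at once.
float[4] absLanes(const float[4] x)
{
    uint[4] mask;
    float[4] neg;
    foreach (i; 0 .. 4)
    {
        // This is what a packed compare instruction would produce per lane.
        mask[i] = x[i] < 0 ? 0xFFFF_FFFF : 0;
        // The "then" side is computed for every lane, needed or not.
        neg[i] = -x[i];
    }
    return select(mask, neg, x);
}
```

Note the cost: every lane pays for both sides of the conditional, which is
exactly how the 4-at-a-time benefit gets eaten up.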

Maybe approximated_sin(N)(float[N] x)?

where mostly N=4.
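
A rough sketch of what such a template could look like in D. The body is an
assumption for illustration only: a truncated Taylor series that is accurate
only for small |x|, written without conditionals so that every lane does the
same work. It is not a real SSE implementation.

```d
// Sketch: hypothetical approximated_sin taking N lanes at once.
// Approximation: x - x^3/6 + x^5/120 (only accurate for small |x|).
float[N] approximated_sin(size_t N)(float[N] x) pure nothrow
{
    float[N] r;
    foreach (i; 0 .. N)
    {
        immutable x2 = x[i] * x[i];
        // Nested form of the polynomial; no branches, same work in every lane.
        r[i] = x[i] * (1.0f - x2 / 6.0f * (1.0f - x2 / 20.0f));
    }
    return r;
}
```

With a float[4] argument N is deduced from the type, so approximated_sin(v)
and approximated_sin!4(v) are the same call; approximated_sin!1 gives the
scalar case.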

Problems (and possible solutions) remain:
 - portability - approximated_sin!(4) will use platform-specific ways of using
SSE (preferably via intrinsics, so that registers are not allocated by hand).
 - alignment - the compiler will call the non-vectorized function
(approximated_sin!(1)? or just approximated_sin) at the boundaries of the
arrays, so that the rest of the array operations and function calls are
properly aligned.
 - conditionals - allowed, but should not be used.
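
The alignment point could be handled roughly as in the sketch below. This is
only my guess at what the compiler would generate: applySin is a made-up name,
the body of approximated_sin is a stand-in (truncated Taylor series), and only
the alignment of the source array is considered.

```d
// Stand-in for the hypothetical approximated_sin!N discussed above.
float[N] approximated_sin(size_t N)(float[N] x) pure nothrow
{
    float[N] r;
    foreach (i; 0 .. N)
    {
        immutable x2 = x[i] * x[i];
        r[i] = x[i] * (1.0f - x2 / 6.0f * (1.0f - x2 / 20.0f));
    }
    return r;
}

// Sketch: scalar calls at the array boundaries, 4-wide calls in the middle.
void applySin(float[] dst, const(float)[] src)
{
    assert(dst.length == src.length);

    // Scalar path, forwarding to the N = 1 instance.
    float scalar(float v)
    {
        float[1] one = [v];
        return approximated_sin!1(one)[0];
    }

    size_t i = 0;

    // Head: one element at a time until src[i] sits on a 16-byte boundary.
    while (i < src.length && (cast(size_t) &src[i]) % 16 != 0)
    {
        dst[i] = scalar(src[i]);
        ++i;
    }

    // Body: aligned blocks of four elements.
    for (; i + 4 <= src.length; i += 4)
    {
        float[4] block = src[i .. i + 4];
        float[4] res = approximated_sin!4(block);
        dst[i .. i + 4] = res[];
    }

    // Tail: whatever is left after the last full block.
    for (; i < src.length; ++i)
        dst[i] = scalar(src[i]);
}
```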

> You'd do this for syntax sugar, not for performance.
Yes. Array operations are a nice syntax, and leave potential for a speed-up.
Even as just syntax sugar, it is already worthwhile to extend these
expressions (they currently don't use SSE anyway).

Seeing these problems now, I see that the SSE argument isn't so simple. But it
is still useful to extend arrayops. The SSE issues need to be addressed and
maybe, after discussion, solved in the future.

It is also possible to parse the expression at compile time (using a real
parser, or with the help of the compiler/templates/types), and emit a mixin
with the proper code. But this raises other questions:
 - why does library code need to do the same thing as the compiler?
 - why then do we need array operations at all?
 - mixins aren't as transparent to the user as "macros" could be (which we
don't have yet).
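
A very small sketch of the mixin approach in D, to show the flavour. The
helper elementwiseLoop is hypothetical and does no real parsing: it just
assumes the expression is already written in terms of the loop index i, and
builds the loop source as a string at compile time.

```d
// Sketch: build the source of an element-wise loop as a compile-time string.
// `elementwiseLoop` is a hypothetical helper, not a library function.
string elementwiseLoop(string dst, string expr)
{
    return "foreach (i; 0 .. " ~ dst ~ ".length) { "
         ~ dst ~ "[i] = " ~ expr ~ "; }";
}

float[] scaledSquares(const(float)[] a)
{
    auto b = new float[a.length];
    // Expands to: foreach (i; 0 .. b.length) { b[i] = 2.0f * a[i] * a[i]; }
    mixin(elementwiseLoop("b", "2.0f * a[i] * a[i]"));
    return b;
}
```

The user has to write mixin(...) and pass names as strings, which is the
transparency problem from the last point above.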
