You are right that I do it at compile time only. Delegating the choice to
runtime could really complicate things, especially with a vectorized data
layout, and I have no need to maintain binary compatibility, since I have
POWER, Xeon, and NVIDIA GPU chips to worry about anyway. To minimize the
performance impact of the SIMD type switch, it would have to happen at a very
high level, at which point it is almost equivalent to compiling
architecture-specific binaries anyway.
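To illustrate why the choice ends up baked into the data layout, here is a
minimal C++ sketch of compile-time-only selection. It assumes the lane count
is derived from the compiler's predefined target macros (`__AVX512F__`,
`__AVX2__`, `__AVX__`, `__VSX__`, `__ARM_NEON`, `__SSE2__`); the names
`kLanes`, `Pack`, `to_packed` and `sum` are hypothetical and not my actual
code. Because the block width of the packed array is a template parameter,
switching it at runtime would mean carrying several layouts through the code.

```cpp
#include <cstddef>
#include <vector>

// Hypothetical: pick a lane count for double from the compiler's target
// macros. One binary is built per architecture, so there is no runtime dispatch.
#if defined(__AVX512F__)
constexpr std::size_t kLanes = 8;   // 512-bit registers
#elif defined(__AVX2__) || defined(__AVX__)
constexpr std::size_t kLanes = 4;   // 256-bit registers
#elif defined(__VSX__) || defined(__ARM_NEON) || defined(__SSE2__)
constexpr std::size_t kLanes = 2;   // 128-bit registers
#else
constexpr std::size_t kLanes = 1;   // scalar fallback
#endif

// Hypothetical fixed-width block; real code would wrap intrinsics here.
template <std::size_t W>
struct Pack {
    double v[W];
};

// Store data in a vectorized (blocked) layout whose block width W is fixed
// at compile time. The last block is zero-padded.
template <std::size_t W>
std::vector<Pack<W>> to_packed(const std::vector<double>& x) {
    std::vector<Pack<W>> out((x.size() + W - 1) / W, Pack<W>{});
    for (std::size_t i = 0; i < x.size(); ++i) out[i / W].v[i % W] = x[i];
    return out;
}

// Lane-wise accumulation over the blocks, then a final horizontal reduction.
template <std::size_t W>
double sum(const std::vector<Pack<W>>& p) {
    double acc[W] = {};
    for (const auto& blk : p)
        for (std::size_t l = 0; l < W; ++l) acc[l] += blk.v[l];
    double s = 0.0;
    for (std::size_t l = 0; l < W; ++l) s += acc[l];
    return s;
}
```

Usage: `sum(to_packed<kLanes>(data))` compiles to whichever width the target
flags (e.g. `-mavx2`) select. Supporting a runtime switch would require
instantiating and dispatching every such layout-dependent routine, which is
why the switch only pays off at a very high level.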
