> We can safely assume that axpy is vectorized
I think it would depend on how closely the BLAS implementations follow the letter of the law when it comes to floating-point semantics. Arch Robison has written and spoken (recently at JuliaCon, but also elsewhere) about how the nonassociativity of floating-point operations means that computations which must be reordered in order to vectorize cannot be vectorized without compromising strict IEEE-754-compliant rounding behavior.
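To make the reordering point concrete, here is a minimal Python sketch. It uses a plain summation (a reduction) as the example rather than any real BLAS kernel. The helpers `sequential_sum` and `lane_sum` are hypothetical. The 2-lane split is only a simplified model of how a vectorized loop might keep independent partial sums and combine them at the end. It is not a description of what any particular BLAS library actually does.

```python
def sequential_sum(xs):
    """Strict left-to-right summation, the order the source code specifies."""
    total = 0.0
    for x in xs:
        total += x
    return total


def lane_sum(xs, lanes=2):
    """Hypothetical model of a SIMD reduction: lane k accumulates every
    `lanes`-th element, then the lane totals are added together at the end.
    This changes the order of the additions."""
    partial = [0.0] * lanes
    for i, x in enumerate(xs):
        partial[i % lanes] += x
    total = 0.0
    for p in partial:
        total += p
    return total


# Floating-point addition is not associative.
print((0.1 + 0.2) + 0.3 == 0.1 + (0.2 + 0.3))  # False

# Reordering a reduction can change the rounded result.
xs = [1e16, 1.0, -1e16, 1.0]
print(sequential_sum(xs))  # 1.0 (the first 1.0 is absorbed into 1e16)
print(lane_sum(xs, 2))     # 2.0 (the large terms cancel within one lane)
```

Neither answer is "wrong" in isolation. The issue is that the vectorized order gives a different bit pattern from the order the source specifies, and that is exactly what strict IEEE-754 reproducibility forbids.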
