That's true, and so it isn't possible in Julia either, unless you use some command to inline assembly. However, in C it might be possible to get within a factor of 2 of BLAS speed. That might be sufficient if you want to implement something slightly different from matrix multiplication (as may be the case here), where reformulating the problem as a BLAS matrix multiplication would create extra overhead.
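To make the "slightly different from matrix multiplication" point concrete, here is a rough sketch of a fused kernel for X' diag(w) Y in plain C. It applies the weights inside the inner product, so no scaled copy of X or Y is ever formed. The function name `xt_diagw_y`, the argument order, and the column-major layout (as in Julia and BLAS) are my own assumptions for illustration. The loop is deliberately unblocked and unvectorized, so it only shows the fusion idea and is not a tuned kernel.

```c
#include <stddef.h>

/* Hypothetical fused kernel: C = X' * diag(w) * Y.
 * Assumed layout: all matrices column-major.
 *   X is n-by-p, Y is n-by-q, w has length n, C is p-by-q.
 * The weight is applied inside the dot product, so no temporary
 * diag(w)*Y (or diag(w)*X) is allocated. */
void xt_diagw_y(size_t n, size_t p, size_t q,
                const double *X, const double *w,
                const double *Y, double *C)
{
    for (size_t j = 0; j < q; ++j) {
        const double *yj = Y + j * n;      /* column j of Y */
        for (size_t i = 0; i < p; ++i) {
            const double *xi = X + i * n;  /* column i of X */
            double acc = 0.0;
            for (size_t k = 0; k < n; ++k)
                acc += xi[k] * w[k] * yj[k];
            C[i + j * p] = acc;            /* C(i, j) */
        }
    }
}
```

The BLAS route would instead scale the rows of Y (or X) by w into a temporary and then call a matrix multiply, which is the extra overhead mentioned above. Whether a hand-written loop like this gets anywhere near BLAS speed would depend on adding blocking and on what the compiler vectorizes, so it needs benchmarking.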
