> > I'd like to approach this speed at the end.
>
> I don't think it is possible in Julia right now without using dirty tricks
> such as passing pointers around. You'd like to get the speed from BLAS by
> operating on panels of your matrix, but you'd like to avoid the copying and
> reallocation of arrays. If you devectorised your code completely, there'd
> be no allocation, but you wouldn't have the fast multiplications in BLAS.
> You can get some of the way by using one of the ArrayViews packages, but
> the garbage collector will charge a fee for each view you produce so LAPACK
> speed is not attainable.
>
> Kind regards
>
> Andreas Noack

Indeed, achieving BLAS speed is rather difficult, for the good reasons you mention above. Nevertheless, I'd like to match (or beat :)) the speed of my MGS routine written in Octave. It takes about 4 seconds in Octave and about 6.6 seconds in Julia. In Octave I could write it in a very straightforward fashion (the Octave code is in the very first post of this thread).

Apparently most of the computing time of my Julia MGS routine is consumed by line 14. When I comment out line 14, the computing time (and the memory allocation too) drops sharply, from 6.6 s to 0.37 s. This indicates that line 13 is already fast in its vectorized form, which is very nice since it is a very compact piece of code.

I'll try to devectorize line 14 (or try to find a better semi-vectorized representation). Any tips here are appreciated.

Jan
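
For illustration, here is a minimal sketch of what a fully devectorized modified Gram-Schmidt (MGS) loop might look like. It is not the routine from this thread: the function name `mgs_devec` is hypothetical, it assumes that the expensive line is the update of the trailing columns, and it is written for current Julia (1.x) rather than the version discussed here. The point is only that explicit loops avoid allocating a temporary array for every column update.

```julia
using LinearAlgebra  # only needed for `norm` in the usage check below

# Hypothetical devectorized modified Gram-Schmidt: returns Q, R with A ≈ Q * R.
# Assumes A has full column rank; no pivoting or reorthogonalization is done.
function mgs_devec(A::Matrix{Float64})
    m, n = size(A)
    Q = copy(A)          # the only large allocation; columns are overwritten in place
    R = zeros(n, n)
    @inbounds for j in 1:n
        # Normalize column j.
        s = 0.0
        for i in 1:m
            s += Q[i, j]^2
        end
        R[j, j] = sqrt(s)
        for i in 1:m
            Q[i, j] /= R[j, j]
        end
        # Orthogonalize every trailing column against column j.
        # Each inner loop walks down a column, matching Julia's column-major layout.
        for k in (j + 1):n
            r = 0.0
            for i in 1:m
                r += Q[i, j] * Q[i, k]
            end
            R[j, k] = r
            for i in 1:m
                Q[i, k] -= r * Q[i, j]
            end
        end
    end
    return Q, R
end
```

A quick sanity check would be `A = rand(500, 500); Q, R = mgs_devec(A); norm(Q * R - A)`, which should come out close to machine precision. Whether this actually beats a semi-vectorized version depends on the Julia version and the matrix size, so it is worth timing both.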
