I missed that you were already using the ArrayViews package. Sorry for
that.

The way you have written the rank-one update on line 14 is very expensive.
Unfortunately, not all of BLAS is available through our generic mat-vec
interface right now, but the whole of BLAS is wrapped in the BLAS module,
so you can use the ger! function from there, i.e.

    function mgs2(X)  # mgs speed project
        nobs, nvars = size(X)
        R = eye(nvars)

        for l = 1:nvars-1
            v_i = view(X, :, l+1:nvars)                         # --- line 10
            s = view(X, :, l)

            R[l, l+1:nvars] = (v_i'*s)/sumabs2(s)               # --- line 13
            BLAS.ger!(-1.0, s, vec(view(R, l, l+1:nvars)), v_i) # --- line 14
        end

        return X, R
    end

which avoids a lot of allocation. I don't know how Octave handles such
expressions, so I can't say why it is faster there without calling into
BLAS directly.
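To make the ger! call concrete, here is a minimal standalone sketch (not from the original thread) of what BLAS.ger! does: it performs the in-place rank-one update A ← α·x·yᵀ + A without allocating a temporary for the outer product. On recent Julia versions the function lives in the LinearAlgebra standard library; the matrix and vectors below are made up for illustration.

```julia
using LinearAlgebra

# BLAS.ger! overwrites A with alpha*x*y' + A, in place.
A = ones(3, 3)
x = [1.0, 2.0, 3.0]
y = [1.0, 0.0, -1.0]

BLAS.ger!(2.0, x, y, A)

A == ones(3, 3) + 2 .* (x * y')  # true: A was updated in place
```

In mgs2 the call uses α = -1.0, so the panel v_i is decremented by the outer product of s with the corresponding row slice of R, with no intermediate array.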

Kind regards

Andreas Noack

2014-10-23 10:02 GMT-04:00 Tim Holy <[email protected]>:

> I believe I mentioned A_mul_B! and friends. For the subtraction, check out
> NumericExtensions, specifically the subtract! function.
>
> --Tim
>
> On Thursday, October 23, 2014 06:43:39 AM Ján Dolinský wrote:
> > > I'd like to approach this speed at the end.
> > >
> > >
> > > I don't think it is possible in Julia right now without using dirty
> > > tricks such as passing pointers around. You'd like to get the speed
> > > from BLAS by operating on panels of your matrix, but you'd like to
> > > avoid the copying and reallocation of arrays. If you devectorised
> > > your code completely, there'd be no allocation, but you wouldn't have
> > > the fast multiplications in BLAS. You can get some of the way by
> > > using one of the ArrayViews packages, but the garbage collector will
> > > charge a fee for each view you produce, so LAPACK speed is not
> > > attainable.
> > >
> > > Kind regards
> > >
> > > Andreas Noack
> >
> > Indeed, achieving BLAS speed is rather difficult for the good reasons
> > you mentioned above. Nevertheless, I'd like to match (or beat :)) the
> > speed of my MGS routine written in Octave. It takes about 4 seconds in
> > Octave and about 6.6 seconds in Julia. In Octave I could write it in a
> > very straightforward fashion (the Octave code is available in the very
> > first post of this thread).
> >
> > Apparently most of the computing time of my Julia MGS routine is
> > consumed by line 14. I tried commenting out line 14; the computing
> > time (and memory allocation too) then dropped sharply from 6.6 s to
> > 0.37 s.
> >
> > This indicates that line 13 is already fast in its vectorized form,
> > which is very nice since it is a very compact piece of code.
> >
> > I'll try to devectorize line 14 (or find a better semi-vectorized
> > representation). Any tips here are appreciated.
> >
> > Jan
>
>
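As an aside on the devectorization question raised in the quoted message: the rank-one update on line 14 can also be written as an explicit double loop, which allocates nothing at all. This is only a sketch, with a hypothetical helper name, assuming X and R shaped as in mgs2 above:

```julia
# Illustration only: the update X[:, l+1:nvars] -= s * R[l, l+1:nvars]'
# (where s = X[:, l]) written as explicit loops, with no temporaries.
function rank1_update!(X, R, l)
    nobs, nvars = size(X)
    for j = l+1:nvars
        r = R[l, j]
        for i = 1:nobs
            X[i, j] -= X[i, l] * r
        end
    end
    return X
end
```

Whether this beats the BLAS.ger! call depends on the matrix sizes; BLAS is typically faster on large panels, while the plain loop avoids any view or wrapper overhead on small ones.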
