Unless I am completely mistaken, you could also rearrange the terms in the matrix multiply to do the multiplication completely column-wise for both (there's no reason you must compute r[i] all at once, and you already have a perfect location for storing a running sum if needed: the output array itself).
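To make that concrete, here is a minimal sketch of the column-wise rearrangement (the function name `colwise_matvec` is just illustrative, not an actual Base function):

```julia
# Column-wise matrix-vector product: instead of r[i] = dot(A[i, :], x),
# accumulate each column of A into r. The output array itself holds the
# partial sums, so no extra temporary is needed.
function colwise_matvec(A, x)
    m, n = size(A)
    r = zeros(eltype(A), m)
    for j in 1:n              # walk A column by column (its storage order)
        xj = x[j]
        for i in 1:m
            r[i] += A[i, j] * xj   # contiguous, cache-friendly access to A
        end
    end
    return r
end

A = [1.0 2.0; 3.0 4.0]
x = [5.0, 6.0]
colwise_matvec(A, x) == A * x   # same answer, different order of operations
```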
However, that would ignore the effect of the CPU's memory caches on this operation. Pictorially, for the 2x2 A*x case, we can compute using any combination of the following options:

|A B|   |X|   |A*X|   |B*Y|
|C D| * |Y| = |C*X| + |D*Y|

            = |A*X+B*Y|
              |C*X+D*Y|

#= or a crazy option: =#
            = |A*X+B*Y| + |  0|
              |    D*Y|   |C*X|

All of these are simple term rearrangements (unless I made a typo), but they are intended to show various orders of operations which achieve the same answer. If I have understood correctly, an optimized BLAS actually ends up "blocking" the matrix and doing the matrix multiply in small chunks (rather than strictly column-wise or row-wise) to achieve maximal efficiency. These sorts of tricks are why a BLAS implementation is highly specific to a CPU, and why you end up with a massive OpenBLAS shared library when it is compiled with dynamic architecture support (the Julia default), as compared to without that support.

Looking at the technical side, to figure out the optimal combination you have to understand the impact of very low-level hardware details, such as the size of your CPU's cache line (usually 64 bytes on modern hardware). When your computer requests a byte of memory, what it actually gets is an entire cache line. Accessing the data in the right order is faster simply because it uses all of the data in that cache line.

On Mon, Jul 14, 2014 at 6:56 PM, John Myles White <[email protected]> wrote:

> Basic linear algebra is done by BLAS and LAPACK, so I don't think these
> issues cause performance problems.
>
> -- John
>
> On Jul 14, 2014, at 6:55 PM, Andrei Zh <[email protected]> wrote:
>
> Linear transformation is a super-frequent operation in scientific
> computations, so I'd like to know how to perform it in the most efficient
> way. Often a linear transformation is written as:
>
>     r = A * x
>
> where A is the transformation matrix and x is a data vector.
> I don't know
> how exactly matrix-by-vector multiplication is implemented in Julia, but I
> believe for vectors it reduces to something like this (please correct me if
> this is algorithmically wrong):
>
>     r = zeros(size(x))
>     for i = 1:length(x)
>         r[i] = dot(A[i, :], x)
>     end
>
> The important part here is "A[i, :]", that is, the transformation matrix is
> accessed row by row. Since matrices in Julia are stored column-wise,
> row-wise access is pretty inefficient.
>
> I thought about a trick with multiplication order. Instead of
> calculating "A * x" I can store a transposed version of A, say, B = A', and
> compute the transformation as follows:
>
>     r = (x' * B)'
>
> B here is accessed column by column, so it should work well.
>
> But what if instead of a single vector "x" we have a data matrix "X"? In
> this case both "A * X" and "X * A" are inefficient, since one of these
> matrices will be accessed row by row. In NumPy I would just set C order
> (row-wise) of elements for the first matrix and Fortran order (column-wise)
> for the second one, but AFAIK there's no such option in Julia (is it?). So
> what's the most efficient way to perform the linear transformation defined
> by matrix A on a data matrix X?
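For the data-matrix question above, here is a rough, hand-rolled illustration of the blocking idea. To be clear, the block size and loop order here are guesses for illustration, not what OpenBLAS actually does; the point is only that the product can be computed in cache-sized chunks while the innermost loop still runs down a column of both A and R:

```julia
# Naive cache-blocked R = A * X. An optimized BLAS would choose the block
# size `bs` (and the loop nesting) to match the CPU's actual cache sizes;
# bs here is an arbitrary illustrative value.
function blocked_matmul(A, X; bs = 8)
    m, k = size(A)
    k2, n = size(X)
    k == k2 || error("dimension mismatch")
    R = zeros(eltype(A), m, n)
    for jj in 1:bs:n, pp in 1:bs:k, ii in 1:bs:m   # iterate over blocks
        for j in jj:min(jj + bs - 1, n), p in pp:min(pp + bs - 1, k)
            xpj = X[p, j]
            for i in ii:min(ii + bs - 1, m)
                R[i, j] += A[i, p] * xpj   # innermost loop walks down a column
            end
        end
    end
    return R
end

A = reshape(collect(1.0:12.0), 3, 4)
X = reshape(collect(1.0:8.0), 4, 2)
blocked_matmul(A, X; bs = 2) == A * X   # same result as the built-in product
```

In practice, of course, the answer to the original question is simply to call `A * X` and let BLAS do this tuning for you.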
