You are welcome. Good to hear that it worked.
2014-02-10 22:35 GMT+01:00 Madeleine Udell <[email protected]>:

> fantastic, thank you. I now see Base.libblas_name = "libmkl_rt", and am
> able to run the following test successfully:
>
> transa = 'N'::Base.LinAlg.BlasChar # multiply by A, not A'
> matdescra = "GXXF" # G = general, X = ignored, F = one-based indexing
> m, n = 50, 100
> A = sprand(m, n, .01)
> y = zeros(m)
> x = rand(n)
> alpha = 1.0
> beta = 1.0
>
> Base.LinAlg.SparseBLAS.cscmv!(transa, alpha, matdescra, A, x, beta, y)
> y_builtin = A*x
>
> julia> y == y_builtin
> true
>
>
> On Mon, Feb 10, 2014 at 12:08 PM, Andreas Noack Jensen
> <[email protected]> wrote:
>
>> Hi Madeleine
>>
>> When compiling Julia with MKL you'll have to do make cleanall as well
>> as rebuild ARPACK and SuiteSparse, i.e. make -C deps distclean-arpack
>> and make -C deps distclean-suitesparse. It is also easier to create a
>> Make.user where you set USE_MKL=1 and MKLROOT to the location of your
>> MKL library files, e.g. /opt/intel/mkl. The arguments are explained here
>>
>> http://software.intel.com/sites/products/documentation/hpc/mkl/mklman/GUID-C2EE93F0-B573-4538-A994-202CB3ADFFC2.htm#GUID-C2EE93F0-B573-4538-A994-202CB3ADFFC2
>>
>> but briefly: transa determines whether the operation is Ax or A'x, and
>> matdescra carries information about the structure of the matrix, e.g.
>> whether it is triangular. When you have succeeded in compiling Julia
>> with MKL, the libblas variable should just be Base.libblas_name.
>>
>>
>> 2014-02-10 20:37 GMT+01:00 Madeleine Udell <[email protected]>:
>>
>>> I'm having trouble using MKL in Julia. I changed Make.inc so that
>>> USE_MKL = 1, but when I make and run julia, I find that
>>> Base.libblas_name = "libopenblas". Is this expected? I would have
>>> thought it would be e.g. "libmkl_core".
>>>
>>> Andreas, I found your wrappers for MKL in this PR
>>> <https://github.com/JuliaLang/julia/pull/4525/files>. I've not used
>>> MKL before, so I don't understand the call signature of those
>>> functions well enough to call MKL directly. Any chance you could
>>> explain what transa::BlasChar and matdescra::ASCIIString are in the
>>> following function, and which MKL library is expected in the libblas
>>> variable? I see many .so files in the lib/intel64 directory of my MKL
>>> installation, and I'm not sure which one to point Julia to.
>>>
>>> function cscmv!(transa::BlasChar, α::$T, matdescra::ASCIIString,
>>>                 A::SparseMatrixCSC{$T, BlasInt}, x::StridedVector{$T},
>>>                 β::$T, y::StridedVector{$T})
>>>     length(x) == A.n || throw(DimensionMismatch("Matrix with $(A.n) columns multiplied with vector of length $(length(x))"))
>>>     length(y) == A.m || throw(DimensionMismatch("Vector of length $(A.m) added to vector of length $(length(y))"))
>>>     ccall(($(string(mv)), libblas), Void,
>>>           (Ptr{Uint8}, Ptr{BlasInt}, Ptr{BlasInt}, Ptr{$T},
>>>            Ptr{Uint8}, Ptr{$T}, Ptr{BlasInt}, Ptr{BlasInt},
>>>            Ptr{BlasInt}, Ptr{$T}, Ptr{$T}, Ptr{$T}),
>>>           &transa, &A.m, &A.n, &α,
>>>           matdescra, A.nzval, A.rowval, pointer(A.colptr, 1),
>>>           pointer(A.colptr, 2), x, &β, y)
>>>     return y
>>> end
>>>
>>> Thanks for your help!
>>> Madeleine
>>>
>>>
>>> On Wed, Feb 5, 2014 at 1:49 PM, Andreas Noack Jensen
>>> <[email protected]> wrote:
>>>
>>>> A*b will not call MKL when A is sparse. There has been some
>>>> discussion of making an MKL package that overwrites
>>>> A_mul_B(Matrix,Vector) with the MKL versions, and I actually wrote
>>>> wrappers for the sparse MKL subroutines in the fall for the same
>>>> reason.
>>>>
>>>>
>>>> 2014-02-05 Madeleine Udell <[email protected]>:
>>>>
>>>>> Miles, you're right that writing sparse matrix vector products in
>>>>> native Julia probably won't be the best idea given Julia's model of
>>>>> parallelism. That's why I'm interested in calling an outside
>>>>> library like PETSc.
>>>>>
>>>>> I see it's possible to link Julia with MKL. I haven't tried this
>>>>> yet, but if I do, will A*b (where A is sparse) call MKL to perform
>>>>> the matrix vector product?
>>>>>
>>>>>
>>>>> On Wed, Feb 5, 2014 at 11:43 AM, Miles Lubin <[email protected]> wrote:
>>>>>
>>>>>> Memory access is typically a significant bottleneck in sparse
>>>>>> mat-vec, so unfortunately I'm skeptical that one could achieve
>>>>>> good performance using Julia's current distributed memory approach
>>>>>> on a multicore machine. This really calls for something like
>>>>>> OpenMP.
>>>>>>
>>>>>>
>>>>>> On Wednesday, February 5, 2014 11:42:00 AM UTC-5, Madeleine Udell wrote:
>>>>>>>
>>>>>>> I'm developing an iterative optimization algorithm in Julia along
>>>>>>> the lines of other contributions to the IterativeSolvers project
>>>>>>> <https://github.com/JuliaLang/IterativeSolvers.jl> or its Krylov
>>>>>>> subspace module
>>>>>>> <https://github.com/JuliaLang/IterativeSolvers.jl/blob/master/src/krylov.jl>,
>>>>>>> whose only computationally intensive step is computing A*b or
>>>>>>> A'*b. I would like to parallelize the method by using a parallel
>>>>>>> sparse matrix vector multiply. Is there a standard backend
>>>>>>> matrix-vector multiply that's recommended in Julia if I'm
>>>>>>> targeting a shared memory computer with a large number of
>>>>>>> processors? Similarly, is there a recommended backend for
>>>>>>> targeting a cluster? My matrices can easily reach 10 million rows
>>>>>>> by 1 million columns, with sparsity anywhere from .01% to
>>>>>>> problems that are nearly diagonal.
>>>>>>>
>>>>>>> I've seen many posts
>>>>>>> <https://github.com/JuliaLang/julia/issues/2645> talking about
>>>>>>> integrating PETSc as a backend for this purpose, but it looks
>>>>>>> like the project
>>>>>>> <https://github.com/petsc/petsc/blob/master/bin/julia/PETSc.jl>
>>>>>>> has stalled - the last commits I see are a year ago. I'm also
>>>>>>> interested in other backends, e.g. Spark
>>>>>>> <http://spark.incubator.apache.org/>, SciDB <http://scidb.org/>,
>>>>>>> etc.
>>>>>>>
>>>>>>> I'm more interested in solving sparse problems, but as a side
>>>>>>> note, the built-in BLAS acceleration from changing the number of
>>>>>>> threads with blas_set_num_threads works OK for dense problems
>>>>>>> using a moderate number of processors. I wonder why the number of
>>>>>>> threads isn't set higher than one by default, for example to as
>>>>>>> many as nprocs() cores?
>>>>>
>>>>> --
>>>>> Madeleine Udell
>>>>> PhD Candidate in Computational and Mathematical Engineering
>>>>> Stanford University
>>>>> www.stanford.edu/~udell

--
Kind regards

Andreas Noack Jensen
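
For reference, the Make.user that Andreas describes amounts to a two-line
file at the top of the Julia source tree. A minimal sketch, assuming MKL is
installed under /opt/intel/mkl as in the thread (adjust MKLROOT to the
local install):

USE_MKL = 1
MKLROOT = /opt/intel/mkl

After creating it, the bundled ARPACK and SuiteSparse builds have to be
discarded and Julia rebuilt: make -C deps distclean-arpack, make -C deps
distclean-suitesparse, make cleanall, then make.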
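
A note on the alpha and beta arguments in Madeleine's test: per the Intel
manual page Andreas links, the MKL mv routines compute
y := alpha*op(A)*x + beta*y, so beta = 1.0 only reproduced A*x because y
started out as zeros. A sketch of the accumulating behavior, under the same
assumptions as her test (an MKL-enabled build exposing
Base.LinAlg.SparseBLAS):

transa = 'N'::Base.LinAlg.BlasChar   # multiply by A, not A'
matdescra = "GXXF"                   # G = general, X = ignored, F = one-based indexing
m, n = 50, 100
A = sprand(m, n, .01)
x = rand(n)
y = ones(m)                          # y now has nonzero initial contents
y0 = copy(y)

# y is overwritten in place with 2.0*A*x + 1.0*y
Base.LinAlg.SparseBLAS.cscmv!(transa, 2.0, matdescra, A, x, 1.0, y)

norm(y - (2.0*(A*x) + y0)) < 1e-12   # expected to hold, up to roundoff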
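
On the blas_set_num_threads side note at the end of the thread: the thread
count can be changed at runtime, so the effect is easy to measure directly.
A rough sketch; timings are machine-dependent, and CPU_CORES is assumed to
be the 0.2-era name for the core count:

A = rand(4000, 4000)
x = rand(4000)
A*x                              # warm-up run so compilation isn't timed

blas_set_num_threads(1)
@time for i = 1:100; A*x; end    # single-threaded baseline

blas_set_num_threads(CPU_CORES)  # one BLAS thread per core
@time for i = 1:100; A*x; end    # typically faster for large dense A

As Miles points out, mat-vec is memory-bound, so the speedup usually
saturates well below the core count.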
