I'm having trouble using MKL in Julia. I changed Make.inc so that
USE_MKL = 1, but when I make and run Julia, I find that Base.libblas_name =
"libopenblas". Is this expected? I would have thought it would be e.g.
"libmkl_core".

Andreas, I found your wrappers for MKL in this PR
<https://github.com/JuliaLang/julia/pull/4525/files>. I've not used MKL
before, so I don't understand the call signature of those functions well
enough to call MKL directly. Any chance you could explain what
transa::BlasChar and matdescra::ASCIIString are in the following function,
and which MKL library is expected in the libblas variable? I see many .so
files in the lib/intel64 directory of my MKL installation, and I'm not sure
which one to point Julia to.

function cscmv!(transa::BlasChar, α::$T, matdescra::ASCIIString,
                A::SparseMatrixCSC{$T, BlasInt}, x::StridedVector{$T},
                β::$T, y::StridedVector{$T})
    # Computes y := α*op(A)*x + β*y (op controlled by transa) by calling
    # MKL's sparse BLAS CSC mat-vec routine.
    length(x) == A.n || throw(DimensionMismatch("Matrix with $(A.n) columns multiplied with vector of length $(length(x))"))
    length(y) == A.m || throw(DimensionMismatch("Vector of length $(A.m) added to vector of length $(length(y))"))
    ccall(($(string(mv)), libblas), Void,
          (Ptr{Uint8}, Ptr{BlasInt}, Ptr{BlasInt}, Ptr{$T},
           Ptr{Uint8}, Ptr{$T}, Ptr{BlasInt}, Ptr{BlasInt},
           Ptr{BlasInt}, Ptr{$T}, Ptr{$T}, Ptr{$T}),
          &transa, &A.m, &A.n, &α,
          matdescra, A.nzval, A.rowval, pointer(A.colptr, 1),
          pointer(A.colptr, 2), x, &β, y)
    return y
end
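
For concreteness, here is my untested guess at how it would be called for a
double-precision multiply (the 'N' and "G**F" arguments are guesses from the
MKL docs: no transpose, and a general matrix with one-based indexing, with
the middle two descriptor characters ignored):

    A = sprand(100, 50, 0.01)   # SparseMatrixCSC{Float64,Int}
    x = rand(50)
    y = zeros(100)
    cscmv!('N', 1.0, "G**F", A, x, 0.0, y)   # y := 1.0*A*x + 0.0*y

Is that roughly right?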

Thanks for your help!
Madeleine


On Wed, Feb 5, 2014 at 1:49 PM, Andreas Noack Jensen <[email protected]> wrote:

> A*b will not call MKL when A is sparse. There has been some discussion
> about making an MKL package that overrides A_mul_B(Matrix,Vector) with the
> MKL versions, and I actually wrote wrappers for the sparse MKL subroutines
> in the fall for the same reason.
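>
> Roughly, such a package would do something like the following (just a
> sketch, untested as written; the "G**F" descriptor and the use of BlasInt
> are from memory rather than checked against the wrappers):
>
>     import Base: *
>     import Base.LinAlg.BlasInt
>
>     # Route sparse mat-vec through the MKL wrapper (cscmv! above)
>     # instead of Julia's native kernel.
>     (*)(A::SparseMatrixCSC{Float64,BlasInt}, x::StridedVector{Float64}) =
>         cscmv!('N', 1.0, "G**F", A, x, 0.0, zeros(A.m))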
>
>
> 2014-02-05 Madeleine Udell <[email protected]>:
>
>> Miles, you're right that writing sparse matrix-vector products in native
>> Julia probably won't be the best idea, given Julia's model of parallelism.
>> That's why I'm interested in calling an outside library like PETSc.
>>
>> I see it's possible to link Julia with MKL. I haven't tried this yet, but
>> if I do, will A*b (where A is sparse) call MKL to perform the
>> matrix-vector product?
>>
>>
>> On Wed, Feb 5, 2014 at 11:43 AM, Miles Lubin <[email protected]> wrote:
>>
>>> Memory access is typically a significant bottleneck in sparse mat-vec,
>>> so unfortunately I'm skeptical that one could achieve good performance
>>> using Julia's current distributed memory approach on a multicore machine.
>>> This really calls for something like OpenMP.
>>>
>>>
>>> On Wednesday, February 5, 2014 11:42:00 AM UTC-5, Madeleine Udell wrote:
>>>>
>>>> I'm developing an iterative optimization algorithm in Julia along the
>>>> lines of other contributions to the Iterative Solvers project
>>>> <https://github.com/JuliaLang/IterativeSolvers.jl> or the Krylov Subspace
>>>> module
>>>> <https://github.com/JuliaLang/IterativeSolvers.jl/blob/master/src/krylov.jl>,
>>>> whose only computationally intensive step is computing A*b or A'*b. I
>>>> would like to parallelize the method by using a parallel sparse
>>>> matrix-vector multiply. Is there a standard backend matrix-vector
>>>> multiply that's recommended in Julia if I'm targeting a shared-memory
>>>> computer with a large number of processors? Similarly, is there a
>>>> recommended backend for targeting a cluster? My matrices can easily
>>>> reach 10 million rows by 1 million columns, with sparsity anywhere from
>>>> .01% up to problems that are nearly diagonal.
>>>>
>>>> I've seen many posts <https://github.com/JuliaLang/julia/issues/2645>
>>>> talking about integrating PETSc as a backend for this purpose, but it
>>>> looks like the project
>>>> <https://github.com/petsc/petsc/blob/master/bin/julia/PETSc.jl> has
>>>> stalled; the last commits I see are a year old. I'm also interested in
>>>> other backends, e.g. Spark <http://spark.incubator.apache.org/>, SciDB
>>>> <http://scidb.org/>, etc.
>>>>
>>>> I'm more interested in solving sparse problems, but as a side note, the
>>>> built-in BLAS acceleration (setting the number of threads with
>>>> `blas_set_num_threads`) works OK for dense problems on a moderate number
>>>> of processors. I wonder why the number of threads isn't set higher than
>>>> one by default, for example, to as many as nprocs() cores?
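>>>>
>>>> For the dense experiments I just set the thread count by hand before
>>>> timing, e.g.:
>>>>
>>>>     blas_set_num_threads(8)    # use 8 BLAS threads
>>>>     A = rand(5000, 5000); b = rand(5000)
>>>>     @time A*b                  # dense mat-vec now runs multithreaded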
>>>>
>>>
>>
>>
>>
>
>
>
> --
> Kind regards,
>
> Andreas Noack Jensen
>



-- 
Madeleine Udell
PhD Candidate in Computational and Mathematical Engineering
Stanford University
www.stanford.edu/~udell
