Re: [julia-users] Re: Parallel sparse matrix vector multiplication

Madeleine Udell Mon, 10 Feb 2014 21:35:35 -0800

Jake, thanks for the
reference<http://software.intel.com/en-us/forums/topic/294954>;
I have 32 hyperthreaded cores, so there's definitely something else going
on to limit me to 6 in addition to not exploiting hyperthreading.


Perhaps I need to call something like omp_set_num_threads()? But there
doesn't seem to be a function by this name in the libmkl_rt library.

julia> ccall((:omp_set_num_threads,Base.libblas_name),Ptr{Void},(Uint8,),32)
ERROR: ccall: could not find function omp_set_num_threads in library
libmkl_rt
 in anonymous at no file


On Mon, Feb 10, 2014 at 4:05 PM, Jake Bolewski <[email protected]>wrote:

> Are those hyper-threaded "cores"?  If so, check Intel MKL's documentation
> on hyper-threading.
>
> -Best
> Jake
>
> On Monday, February 10, 2014 6:38:50 PM UTC-5, Madeleine Udell wrote:
>
>> It looks like only 6 threads are being used when I use mkl from julia.
>> If I do blas_set_num_threads(n), then using top, I see julia is
>> running at min(n,6)*100% cpu. Any idea why this would be, or how to
>> get mkl to use more threads? I'm not sure if this is an issue in julia
>> or with my installation of mkl.
>>
>> On Mon, Feb 10, 2014 at 2:09 PM, Andreas Noack Jensen
>> <[email protected]> wrote:
>> > You are welcome. Good to hear that it worked.
>> >
>> >
>> > 2014-02-10 22:35 GMT+01:00 Madeleine Udell <[email protected]>:
>> >
>> >> fantastic, thank you. I now see Base.libblas_name="libmkl_rt", and am
>> able
>> >> to run the following test successfully:
>> >>
>> >> transa = 'N'::Base.LinAlg.BlasChar # multiply by A, not A'
>> >> matdescra = "GXXF" # G = general, X = ignored, F = one-based indexing
>> >> m,n = 50,100
>> >> A = sprand(m,n,.01)
>> >> y = zeros(m)
>> >> x = rand(n)
>> >> alpha = 1.0
>> >> beta = 1.0
>> >>
>> >> Base.LinAlg.SparseBLAS.cscmv!(transa,alpha,matdescra,A,x,beta,y)
>> >> y_builtin = A*x
>> >>
>> >> julia> y==y_builtin
>> >> true
>> >>
>> >>
>> >> On Mon, Feb 10, 2014 at 12:08 PM, Andreas Noack Jensen
>> >> <[email protected]> wrote:
>> >>>
>> >>> Hi Madeleine
>> >>>
>> >>> When compiling julia with MKL you'll have to do make cleanall as well
>> as
>> >>> rebuild ARPACK and Suitesparse, i.e. make -C distclean-arpack and
>> make -C
>> >>> distclean-suitesparse. It is also easier to create a Make.user there
>> you set
>> >>> USE_MKL=1 and MKLROOT to the location of your MKL library files, e.g.
>> >>> /opt/intel/mkl. The arguments are explained here
>> >>>
>> >>>
>> >>> http://software.intel.com/sites/products/documentation/
>> hpc/mkl/mklman/GUID-C2EE93F0-B573-4538-A994-202CB3ADFFC2.
>> htm#GUID-C2EE93F0-B573-4538-A994-202CB3ADFFC2
>> >>>
>> >>> but transa determines if the operation is Ax or A'x and matdescra has
>> >>> information about the structure of the matrix, e.g. if it is
>> triangular.
>> >>> When you have succeeded in compiling julia with MKL, the libblas
>> variable
>> >>> should just be Base.libblas_name.
>> >>>
>> >>>
>> >>> 2014-02-10 20:37 GMT+01:00 Madeleine Udell <[email protected]>:
>> >>>
>> >>>> I'm having trouble using MKL in julia. I changed Make.inc so that
>> >>>> USE_MKL = 1,
>> >>>> but when I make and run julia, I find that Base.libblas_name =
>> >>>> "libopenblas". Is this expected? I would have thought it would be eg
>> >>>> "libmkl_core".
>> >>>>
>> >>>> Andreas, I found your wrappers for MKL in this PR. I've not used MKL
>> >>>> before, so I don't understand the call signature of those functions
>> in order
>> >>>> to call MKL directly. Any chance you could explain what are
>> transa::BlasChar
>> >>>> and matdescra::ASCIIString in the following function, and which mkl
>> library
>> >>>> is expected in the libblas variable? I see many .so files in the
>> lib/intel64
>> >>>> directory of my mkl installation, and I'm not sure which one to
>> point julia
>> >>>> to.
>> >>>>
>> >>>> function cscmv!(transa::BlasChar, α::$T, matdescra::ASCIIString,
>> >>>> A::SparseMatrixCSC{$T, BlasInt}, x::StridedVector{$T}, β::$T,
>> >>>> y::StridedVector{$T})
>> >>>> length(x) == A.n || throw(DimensionMismatch("Matrix with $(A.n)
>> columns
>> >>>> multiplied with vector of length $(length(x))"))
>> >>>> length(y) == A.m || throw(DimensionMismatch("Vector of length
>> $(A.m)
>> >>>> added to vector of length $(length(y))")) #
>> >>>> ccall(($(string(mv)), libblas), Void,
>> >>>> (Ptr{Uint8}, Ptr{BlasInt}, Ptr{BlasInt}, Ptr{$T},
>> >>>> Ptr{Uint8}, Ptr{$T}, Ptr{BlasInt}, Ptr{BlasInt},
>> >>>> Ptr{BlasInt}, Ptr{$T}, Ptr{$T}, Ptr{$T}),
>> >>>> &transa, &A.m, &A.n, &α,
>> >>>> matdescra, A.nzval, A.rowval, pointer(A.colptr, 1),
>> >>>> pointer(A.colptr, 2), x, &β, y)
>> >>>> return y
>> >>>> end
>> >>>>
>> >>>> Thanks for your help!
>> >>>> Madeleine
>> >>>>
>> >>>>
>> >>>> On Wed, Feb 5, 2014 at 1:49 PM, Andreas Noack Jensen
>> >>>> <[email protected]> wrote:
>> >>>>>
>> >>>>> A*b will not call MKL when A is sparse. There has been some writing
>> >>>>> about making a MKL package that overwrites A_mul_B(Matrix,Vector)
>> with the
>> >>>>> MKL versions and I actually wrote wrappers for the sparse MKL
>> subroutines in
>> >>>>> the fall for the same reason.
>> >>>>>
>> >>>>>
>> >>>>> 2014-02-05 Madeleine Udell <[email protected]>:
>> >>>>>
>> >>>>>> Miles, you're right that writing sparse matrix vector products in
>> >>>>>> native Julia probably won't be the best idea given Julia's model
>> of
>> >>>>>> parallelism. That's why I'm interested in calling an outside
>> library like
>> >>>>>> PETSc.
>> >>>>>>
>> >>>>>> I see it's possible to link Julia with MKL. I haven't tried this
>> yet,
>> >>>>>> but if I do, will A*b (where A is sparse) call MKL to perform the
>> matrix
>> >>>>>> vector product?
>> >>>>>>
>> >>>>>>
>> >>>>>> On Wed, Feb 5, 2014 at 11:43 AM, Miles Lubin <[email protected]>
>> >>>>>> wrote:
>> >>>>>>>
>> >>>>>>> Memory access is typically a significant bottleneck in sparse
>> >>>>>>> mat-vec, so unfortunately I'm skeptical that one could achieve
>> good
>> >>>>>>> performance using Julia's current distributed memory approach on
>> a multicore
>> >>>>>>> machine. This really calls for something like OpenMP.
>> >>>>>>>
>> >>>>>>>
>> >>>>>>> On Wednesday, February 5, 2014 11:42:00 AM UTC-5, Madeleine Udell
>> >>>>>>> wrote:
>> >>>>>>>>
>> >>>>>>>> I'm developing an iterative optimization algorithm in Julia
>> along
>> >>>>>>>> the lines of other contributions to the Iterative Solvers
>> project or Krylov
>> >>>>>>>> Subspace module whose only computationally intensive step is
>> computing A*b
>> >>>>>>>> or A'*b. I would like to parallelize the method by using a
>> parallel sparse
>> >>>>>>>> matrix vector multiply. Is there a standard backend
>> matrix-vector multiply
>> >>>>>>>> that's recommended in Julia if I'm targeting a shared memory
>> computer with a
>> >>>>>>>> large number of processors? Similarly, is there a recommended
>> backend for
>> >>>>>>>> targeting a cluster? My matrices can easily reach 10 million
>> rows by 1
>> >>>>>>>> million columns, with sparsity anywhere from .01% to problems
>> that are
>> >>>>>>>> nearly diagonal.
>> >>>>>>>>
>> >>>>>>>> I've seen many posts talking about integrating PETSc as a
>> backend
>> >>>>>>>> for this purpose, but it looks like the project has stalled -
>> the last
>> >>>>>>>> commits I see are a year ago. I'm also interested in other
>> backends, eg
>> >>>>>>>> Spark, SciDB, etc.
>> >>>>>>>>
>> >>>>>>>> I'm more interested in solving sparse problems, but as a side
>> note,
>> >>>>>>>> the built-in BLAS acceleration by changing the number of threads
>> >>>>>>>> `blas_set_num_threads` works ok for dense problems using a
>> moderate number
>> >>>>>>>> of processors. I wonder why the number of threads isn't set
>> higher than one
>> >>>>>>>> by default, for example, using as many as nprocs() cores?
>> >>>>>>
>> >>>>>>
>> >>>>>>
>> >>>>>>
>> >>>>>> --
>> >>>>>> Madeleine Udell
>> >>>>>> PhD Candidate in Computational and Mathematical Engineering
>> >>>>>> Stanford University
>> >>>>>> www.stanford.edu/~udell
>> >>>>>
>> >>>>>
>> >>>>>
>> >>>>>
>> >>>>> --
>> >>>>> Med venlig hilsen
>> >>>>>
>> >>>>> Andreas Noack Jensen
>> >>>>
>> >>>>
>> >>>>
>> >>>>
>> >>>> --
>> >>>> Madeleine Udell
>> >>>> PhD Candidate in Computational and Mathematical Engineering
>> >>>> Stanford University
>> >>>> www.stanford.edu/~udell
>> >>>
>> >>>
>> >>>
>> >>>
>> >>> --
>> >>> Med venlig hilsen
>> >>>
>> >>> Andreas Noack Jensen
>> >>
>> >>
>> >>
>> >>
>> >> --
>> >> Madeleine Udell
>> >> PhD Candidate in Computational and Mathematical Engineering
>> >> Stanford University
>> >> www.stanford.edu/~udell
>> >
>> >
>> >
>> >
>> > --
>> > Med venlig hilsen
>> >
>> > Andreas Noack Jensen
>>
>>
>>
>> --
>> Madeleine Udell
>> PhD Candidate in Computational and Mathematical Engineering
>> Stanford University
>> www.stanford.edu/~udell
>>
>


-- 
Madeleine Udell
PhD Candidate in Computational and Mathematical Engineering
Stanford University
www.stanford.edu/~udell

Re: [julia-users] Re: Parallel sparse matrix vector multiplication

Reply via email to