Are those hyper-threaded "cores"? If so, check Intel MKL's documentation on hyper-threading.
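[Editor's note: if they are hyper-threaded, the 6-thread ceiling is easy to check from the REPL. A minimal sketch, assuming a 2014-era Julia where `CPU_CORES` reports logical CPUs and `blas_set_num_threads` is exported, as in this thread; on later Julia versions these became `Sys.CPU_THREADS` and `BLAS.set_num_threads`:]

```julia
# Sketch: MKL often defaults to one thread per physical core, so on a
# 6-core machine with hyper-threading, CPU_CORES reports 12 logical CPUs
# while BLAS tops out at 6 threads -- matching the min(n, 6)*100% CPU
# observed in top.
println("logical CPUs: ", CPU_CORES)
blas_set_num_threads(CPU_CORES)   # request one thread per logical CPU
A = rand(4000, 4000); b = rand(4000)
@time A*b                         # watch per-core load in top while this runs
```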
-Best,
Jake

On Monday, February 10, 2014 6:38:50 PM UTC-5, Madeleine Udell wrote:

> It looks like only 6 threads are being used when I use MKL from Julia.
> If I do blas_set_num_threads(n), then using top, I see julia running at
> min(n, 6) * 100% CPU. Any idea why this would be, or how to get MKL to
> use more threads? I'm not sure if this is an issue in Julia or with my
> installation of MKL.

On Mon, Feb 10, 2014 at 2:09 PM, Andreas Noack Jensen wrote:

> You are welcome. Good to hear that it worked.

2014-02-10 22:35 GMT+01:00, Madeleine Udell wrote:

> Fantastic, thank you. I now see Base.libblas_name = "libmkl_rt", and am
> able to run the following test successfully:
>
>     transa = 'N'::Base.LinAlg.BlasChar # multiply by A, not A'
>     matdescra = "GXXF" # G = general, X = ignored, F = one-based indexing
>     m, n = 50, 100
>     A = sprand(m, n, .01)
>     y = zeros(m)
>     x = rand(n)
>     alpha = 1.0
>     beta = 1.0
>
>     Base.LinAlg.SparseBLAS.cscmv!(transa, alpha, matdescra, A, x, beta, y)
>     y_builtin = A*x
>
>     julia> y == y_builtin
>     true

On Mon, Feb 10, 2014 at 12:08 PM, Andreas Noack Jensen wrote:

> Hi Madeleine,
>
> When compiling Julia with MKL you'll have to do make cleanall as well
> as rebuild ARPACK and SuiteSparse, i.e. make -C deps distclean-arpack
> and make -C deps distclean-suitesparse. It is also easier to create a
> Make.user where you set USE_MKL=1 and MKLROOT to the location of your
> MKL library files, e.g. /opt/intel/mkl.
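[Editor's note: the rebuild recipe Andreas describes just above can be condensed into one shell session. This is a sketch, not a verified build script: it assumes a 2014-era Julia source tree, and MKLROOT must match your own installation.]

```shell
# Hypothetical MKL rebuild session for a Julia source checkout.
cd julia

# Put the MKL switches in Make.user rather than editing Make.inc:
cat > Make.user <<'EOF'
USE_MKL = 1
MKLROOT = /opt/intel/mkl
EOF

make cleanall                       # discard objects built against OpenBLAS
make -C deps distclean-arpack       # ARPACK and SuiteSparse link BLAS directly,
make -C deps distclean-suitesparse  # so they must be rebuilt too
make                                # Base.libblas_name should now be "libmkl_rt"
```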
> The arguments are explained here
>
> http://software.intel.com/sites/products/documentation/hpc/mkl/mklman/GUID-C2EE93F0-B573-4538-A994-202CB3ADFFC2.htm#GUID-C2EE93F0-B573-4538-A994-202CB3ADFFC2
>
> but transa determines whether the operation is Ax or A'x, and matdescra
> carries information about the structure of the matrix, e.g. whether it
> is triangular. When you have succeeded in compiling Julia with MKL, the
> libblas variable should just be Base.libblas_name.

2014-02-10 20:37 GMT+01:00, Madeleine Udell wrote:

> I'm having trouble using MKL in Julia. I changed Make.inc so that
> USE_MKL = 1, but when I make and run julia, I find that
> Base.libblas_name = "libopenblas". Is this expected? I would have
> thought it would be e.g. "libmkl_core".
>
> Andreas, I found your wrappers for MKL in this PR. I've not used MKL
> before, so I don't understand the call signature of those functions
> well enough to call MKL directly. Any chance you could explain what
> transa::BlasChar and matdescra::ASCIIString are in the following
> function, and which MKL library is expected in the libblas variable? I
> see many .so files in the lib/intel64 directory of my MKL installation,
> and I'm not sure which one to point Julia to.
>     function cscmv!(transa::BlasChar, α::$T, matdescra::ASCIIString,
>                     A::SparseMatrixCSC{$T, BlasInt}, x::StridedVector{$T},
>                     β::$T, y::StridedVector{$T})
>         length(x) == A.n || throw(DimensionMismatch("Matrix with $(A.n) columns multiplied with vector of length $(length(x))"))
>         length(y) == A.m || throw(DimensionMismatch("Vector of length $(A.m) added to vector of length $(length(y))"))
>         ccall(($(string(mv)), libblas), Void,
>               (Ptr{Uint8}, Ptr{BlasInt}, Ptr{BlasInt}, Ptr{$T},
>                Ptr{Uint8}, Ptr{$T}, Ptr{BlasInt}, Ptr{BlasInt},
>                Ptr{BlasInt}, Ptr{$T}, Ptr{$T}, Ptr{$T}),
>               &transa, &A.m, &A.n, &α,
>               matdescra, A.nzval, A.rowval, pointer(A.colptr, 1),
>               pointer(A.colptr, 2), x, &β, y)
>         return y
>     end
>
> Thanks for your help!
> Madeleine

On Wed, Feb 5, 2014 at 1:49 PM, Andreas Noack Jensen wrote:

> A*b will not call MKL when A is sparse. There has been some writing
> about making an MKL package that overwrites A_mul_B(Matrix, Vector)
> with the MKL versions, and I actually wrote wrappers for the sparse MKL
> subroutines in the fall for the same reason.

2014-02-05, Madeleine Udell wrote:

> Miles, you're right that writing sparse matrix-vector products in
> native Julia probably won't be the best idea given Julia's model of
> parallelism. That's why I'm interested in calling an outside library
> like PETSc.
>
> I see it's possible to link Julia with MKL. I haven't tried this yet,
> but if I do, will A*b (where A is sparse) call MKL to perform the
> matrix-vector product?
On Wed, Feb 5, 2014 at 11:43 AM, Miles Lubin wrote:

> Memory access is typically a significant bottleneck in sparse mat-vec,
> so unfortunately I'm skeptical that one could achieve good performance
> using Julia's current distributed-memory approach on a multicore
> machine. This really calls for something like OpenMP.

On Wednesday, February 5, 2014 11:42:00 AM UTC-5, Madeleine Udell wrote:

> I'm developing an iterative optimization algorithm in Julia along the
> lines of other contributions to the Iterative Solvers project or the
> Krylov Subspace module, whose only computationally intensive step is
> computing A*b or A'*b. I would like to parallelize the method by using
> a parallel sparse matrix-vector multiply. Is there a standard backend
> matrix-vector multiply that's recommended in Julia if I'm targeting a
> shared-memory computer with a large number of processors? Similarly,
> is there a recommended backend for targeting a cluster? My matrices
> can easily reach 10 million rows by 1 million columns, with sparsity
> anywhere from .01% to problems that are nearly diagonal.
>
> I've seen many posts talking about integrating PETSc as a backend for
> this purpose, but it looks like the project has stalled - the last
> commits I see are a year ago. I'm also interested in other backends,
> e.g. Spark, SciDB, etc.
>
> I'm more interested in solving sparse problems, but as a side note,
> the built-in BLAS acceleration by changing the number of threads with
> blas_set_num_threads works ok for dense problems using a moderate
> number of processors.
> I wonder why the number of threads isn't set higher than one by
> default, for example using as many as nprocs() cores?

--
Madeleine Udell
PhD Candidate in Computational and Mathematical Engineering
Stanford University
www.stanford.edu/~udell

--
Kind regards,

Andreas Noack Jensen
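[Editor's note: on the default-thread-count question, the effect of BLAS threading on dense problems can be measured directly. A minimal timing sketch, again assuming the 2014-era names used in this thread (`blas_set_num_threads`, `CPU_CORES`); nothing here changes sparse A*b, which bypasses BLAS entirely:]

```julia
# Sketch: time a dense mat-vec at 1 thread and at one thread per
# logical CPU to see how much BLAS threading helps on this machine.
A = rand(5000, 5000)
b = rand(5000)
A*b                               # warm-up run

blas_set_num_threads(1)
t1 = @elapsed A*b                 # single-threaded baseline

blas_set_num_threads(CPU_CORES)
tn = @elapsed A*b                 # all logical CPUs

println("1 thread: $t1 s, $CPU_CORES threads: $tn s")
```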
