Hey Madeleine,

First, I would check that your global environment variables
<http://software.intel.com/sites/products/documentation/hpc/mkl/mkl_userguide_lnx/GUID-0DE6A77B-00E0-4ED6-9CAE-52FCF49E5623.htm>
are set up correctly.
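A quick way to check what the Julia process actually sees (a minimal
sketch; MKL_NUM_THREADS, OMP_NUM_THREADS, and MKL_DYNAMIC are the
variables MKL typically consults):

# Print the MKL-related environment variables as the Julia process sees
# them. An unset variable just means MKL falls back to its defaults.
for v in ["MKL_NUM_THREADS", "OMP_NUM_THREADS", "MKL_DYNAMIC"]
    println(v, " => ", get(ENV, v, "<unset>"))
end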
If you want to set the number of threads programmatically:

               _
   _       _ _(_)_     |  A fresh approach to technical computing
  (_)     | (_) (_)    |  Documentation: http://docs.julialang.org
   _ _   _| |_  __ _   |  Type "help()" to list help topics
  | | | | | | |/ _` |  |
  | | |_| | | | (_| |  |  Version 0.3.0-prerelease+1470 (2014-02-08 16:23 UTC)
 _/ |\__'_|_|_|\__'_|  |  Commit 596d5c4* (3 days old master)
|__/                   |  x86_64-unknown-linux-gnu

julia> Base.blas_vendor()
:mkl

julia> Base.blas_set_num_threads(12)

You can see the relevant code here:
https://github.com/JuliaLang/julia/blob/8ac5a7f7fff1c54a768c7bc9ae85cf053d310f42/base/util.jl#L293
It's always worth a quick search through the code base to figure out what
is going on behind the scenes.
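To confirm the setting actually takes effect, here is a minimal timing
sketch (the sizes and thread counts below are arbitrary). On a machine
where MKL honors the request, the dense multiply should get faster as the
thread count rises:

# Time a dense matrix multiply at a few thread settings.
n = 2000
A = rand(n, n); B = rand(n, n)
A*B  # warm up so the timings below measure only BLAS
for t in [1, 6, 12]
    Base.blas_set_num_threads(t)
    println("threads = $t: ", @elapsed(A*B), " seconds")
end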
Hope that helps!
Jake

On Tuesday, February 11, 2014 12:34:56 AM UTC-5, Madeleine Udell wrote:

Jake, thanks for the reference
<http://software.intel.com/en-us/forums/topic/294954>; I have 32
hyperthreaded cores, so there's definitely something else going on to
limit me to 6, in addition to not exploiting hyperthreading.

Perhaps I need to call something like omp_set_num_threads()? But there
doesn't seem to be a function by this name in the libmkl_rt library.

julia> ccall((:omp_set_num_threads, Base.libblas_name), Ptr{Void}, (Uint8,), 32)
ERROR: ccall: could not find function omp_set_num_threads in library libmkl_rt
 in anonymous at no file

On Mon, Feb 10, 2014 at 4:05 PM, Jake Bolewski <[email protected]> wrote:

Are those hyper-threaded "cores"? If so, check Intel MKL's documentation
on hyper-threading.

-Best
Jake

On Monday, February 10, 2014 6:38:50 PM UTC-5, Madeleine Udell wrote:

It looks like only 6 threads are being used when I use MKL from Julia.
If I do blas_set_num_threads(n), then, watching top, I see Julia running
at min(n,6)*100% CPU. Any idea why this would be, or how to get MKL to
use more threads? I'm not sure if this is an issue in Julia or with my
installation of MKL.

On Mon, Feb 10, 2014 at 2:09 PM, Andreas Noack Jensen <[email protected]> wrote:

You are welcome. Good to hear that it worked.

2014-02-10 22:35 GMT+01:00 Madeleine Udell <[email protected]>:

Fantastic, thank you. I now see Base.libblas_name = "libmkl_rt", and am
able to run the following test successfully:

transa = 'N'::Base.LinAlg.BlasChar  # multiply by A, not A'
matdescra = "GXXF"  # G = general, X = ignored, F = one-based indexing
m, n = 50, 100
A = sprand(m, n, .01)
y = zeros(m)
x = rand(n)
alpha = 1.0
beta = 1.0

Base.LinAlg.SparseBLAS.cscmv!(transa, alpha, matdescra, A, x, beta, y)
y_builtin = A*x

julia> y == y_builtin
true

On Mon, Feb 10, 2014 at 12:08 PM, Andreas Noack Jensen <[email protected]> wrote:

Hi Madeleine

When compiling Julia with MKL, you'll have to do make cleanall and also
rebuild ARPACK and SuiteSparse, i.e. make -C deps distclean-arpack and
make -C deps distclean-suitesparse. It is also easier to create a
Make.user where you set USE_MKL=1 and MKLROOT to the location of your
MKL library files, e.g. /opt/intel/mkl. The arguments are explained here:

http://software.intel.com/sites/products/documentation/hpc/mkl/mklman/GUID-C2EE93F0-B573-4538-A994-202CB3ADFFC2.htm#GUID-C2EE93F0-B573-4538-A994-202CB3ADFFC2

As for the wrappers: transa determines whether the operation is Ax or
A'x, and matdescra carries information about the structure of the
matrix, e.g. whether it is triangular. When you have succeeded in
compiling Julia with MKL, the libblas variable should just be
Base.libblas_name.

2014-02-10 20:37 GMT+01:00 Madeleine Udell <[email protected]>:

I'm having trouble using MKL in Julia. I changed Make.inc so that
USE_MKL = 1, but when I make and run Julia, I find that
Base.libblas_name = "libopenblas". Is this expected? I would have
thought it would be e.g. "libmkl_core".

Andreas, I found your wrappers for MKL in this PR. I've not used MKL
before, so I don't understand the call signature of those functions well
enough to call MKL directly. Any chance you could explain what
transa::BlasChar and matdescra::ASCIIString are in the following
function, and which MKL library is expected in the libblas variable? I
see many .so files in the lib/intel64 directory of my MKL installation,
and I'm not sure which one to point Julia to.

function cscmv!(transa::BlasChar, α::$T, matdescra::ASCIIString,
                A::SparseMatrixCSC{$T, BlasInt}, x::StridedVector{$T},
                β::$T, y::StridedVector{$T})
    length(x) == A.n || throw(DimensionMismatch("Matrix with $(A.n) columns multiplied with vector of length $(length(x))"))
    length(y) == A.m || throw(DimensionMismatch("Vector of length $(A.m) added to vector of length $(length(y))"))
    ccall(($(string(mv)), libblas), Void,
          (Ptr{Uint8}, Ptr{BlasInt}, Ptr{BlasInt}, Ptr{$T},
           Ptr{Uint8}, Ptr{$T}, Ptr{BlasInt}, Ptr{BlasInt},
           Ptr{BlasInt}, Ptr{$T}, Ptr{$T}, Ptr{$T}),
          &transa, &A.m, &A.n, &α,
          matdescra, A.nzval, A.rowval, pointer(A.colptr, 1),
          pointer(A.colptr, 2), x, &β, y)
    return y
end

Thanks for your help!
Madeleine

On Wed, Feb 5, 2014 at 1:49 PM, Andreas Noack Jensen <[email protected]> wrote:

A*b will not call MKL when A is sparse. There has been some writing
about making an MKL package that overwrites A_mul_B(Matrix, Vector) with
the MKL versions, and I actually wrote wrappers for the sparse MKL
subroutines in the fall for the same reason.
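(For concreteness, a minimal sketch of that idea, built on the cscmv!
wrapper quoted above. The helper name mkl_spmv is made up for
illustration, and the sketch assumes a Julia built with USE_MKL=1 so
that the SparseBLAS wrappers are available:)

# Hypothetical helper, not from the PR: compute A*x through MKL's
# sparse BLAS. Assumes Base.libblas_name == "libmkl_rt".
function mkl_spmv(A::SparseMatrixCSC{Float64}, x::Vector{Float64})
    y = zeros(A.m)
    # 'N': apply A rather than A'; "GXXF": general matrix with one-based
    # (Fortran) indexing, the same descriptor as in the test above.
    Base.LinAlg.SparseBLAS.cscmv!('N', 1.0, "GXXF", A, x, 0.0, y)
    return y
end

A package could then overwrite the generic A*x method for sparse A along
these lines, which is the idea Andreas describes.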
2014-02-05 Madeleine Udell <[email protected]>:

Miles, you're right that writing sparse matrix-vector products in native
Julia probably won't be the best idea given Julia's model of
parallelism. That's why I'm interested in calling an outside library
like PETSc.

I see it's possible to link Julia with MKL. I haven't tried this yet,
but if I do, will A*b (where A is sparse) call MKL to perform the
matrix-vector product?

On Wed, Feb 5, 2014 at 11:43 AM, Miles Lubin <[email protected]> wrote:

Memory access is typically a significant bottleneck in sparse mat-vec,
so unfortunately I'm skeptical that one could achieve good performance
using Julia's current distributed memory approach on a multicore
machine. This really calls for something like OpenMP.

On Wednesday, February 5, 2014 11:42:00 AM UTC-5, Madeleine Udell wrote:

I'm developing an iterative optimization algorithm in Julia along the
lines of other contributions to the Iterative Solvers project or the
Krylov Subspace module, whose only computationally intensive step is
computing A*b or A'*b. I would like to parallelize the method by using a
parallel sparse matrix-vector multiply. Is there a standard backend
matrix-vector multiply that's recommended in Julia if I'm targeting a
shared-memory computer with a large number of processors? Similarly, is
there a recommended backend for targeting a cluster? My matrices can
easily reach 10 million rows by 1 million columns, with sparsity
anywhere from .01% to problems that are nearly diagonal.

I've seen many posts talking about integrating PETSc as a backend for
this purpose, but it looks like the project has stalled; the last
commits I see are from a year ago. I'm also interested in other
backends, e.g. Spark, SciDB, etc.

I'm more interested in solving sparse problems, but as a side note, the
built-in BLAS acceleration from changing the number of threads with
`blas_set_num_threads` works OK for dense problems using a moderate
number of processors. I wonder why the number of threads isn't set
higher than one by default, for example, using as many as nprocs()
cores?

-- 
Madeleine Udell
PhD Candidate in Computational and Mathematical Engineering
Stanford University
www.stanford.edu/~udell
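(A closing note on Miles' memory-access point: below is a plain-Julia
serial reference for what a CSC mat-vec does, equivalent to y += A*x,
i.e. cscmv! with transa = 'N' and alpha = beta = 1. The function name
cscmv_ref! is made up for illustration. The scattered writes through
A.rowval are why the kernel is bandwidth-bound, and why parallelizing
over columns would need synchronization on y:)

# Serial reference kernel: y += A*x for A in CSC format.
function cscmv_ref!(y::Vector{Float64}, A::SparseMatrixCSC{Float64,Int},
                    x::Vector{Float64})
    for j in 1:A.n                               # walk the columns of A
        xj = x[j]
        for k in A.colptr[j]:(A.colptr[j+1] - 1) # nonzeros of column j
            y[A.rowval[k]] += A.nzval[k] * xj    # scattered write into y
        end
    end
    return y
end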
