Nice!
On Thu, Feb 13, 2014 at 6:49 PM, Madeleine Udell <[email protected]> wrote:

Thanks for all the advice, everyone. I've just finished a parallel sparse matrix-vector multiplication library written in pure Julia for shared-memory machines, using Amit Murthy's SharedArrays. You can find it at https://github.com/madeleineudell/ParallelSparseMatMul.jl

For a matrix with 1e9 nonzeros, its performance on 6 processors matches MKL's, and it retains linear parallel scaling up to the highest thread count I've tested (40).

(serial Julia)
julia> @time S'*x;
elapsed time: 17.63098703 seconds (8410364 bytes allocated)

(MKL, 6 threads)
julia> @time Base.LinAlg.SparseBLAS.cscmv!(transa,alpha,matdescra,A,x,beta,y);
elapsed time: 4.334168257 seconds (48 bytes allocated)

(ParallelSparseMatMul, 6 threads)
julia> @time y = A'*x;
elapsed time: 4.18557078 seconds (1858156 bytes allocated)


On Tue, Feb 11, 2014 at 1:44 PM, Madeleine Udell <[email protected]> wrote:

Thanks, Jake. It seems like I'm still not able to go above 6 threads, but the cause may be deep in my MKL install.

I've begun implementing a shared sparse matrix class in native Julia that seems to scale better; the code is attached to this email. How can I overload A'*b for my SharedBilinearOperator class, so that I can call At_mul_B on a stored copy of A' rather than computing A' each time I want to multiply by it? I.e., can I write something like

('*)(L::SharedBilinearOperator,x) = At_mul_B(L.A,x)

?
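In Julia 0.3 there is no standalone ('*) operator to overload: the parser lowers A'*x to Ac_mul_B(A, x) and A.'*x to At_mul_B(A, x), so the overload goes on those two functions. A minimal sketch of the idea; the field names (A plus a stored transpose AT) are assumptions for illustration, since the attached class isn't shown here:

import Base: Ac_mul_B, At_mul_B

type SharedBilinearOperator
    A::SparseMatrixCSC{Float64,Int}   # the original operator
    AT::SparseMatrixCSC{Float64,Int}  # stored copy of A', built once
end

SharedBilinearOperator(A::SparseMatrixCSC{Float64,Int}) = SharedBilinearOperator(A, A')

# For a real matrix, A'*x and A.'*x coincide, so route both to the cached A':
Ac_mul_B(L::SharedBilinearOperator, x::AbstractVector) = L.AT * x
At_mul_B(L::SharedBilinearOperator, x::AbstractVector) = L.AT * x

With these definitions, L'*x dispatches to Ac_mul_B(L, x) and multiplies by the stored transpose instead of recomputing A' on every call.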
On Tue, Feb 11, 2014 at 8:40 AM, Jake Bolewski <[email protected]> wrote:

Hey Madeleine,

First I would check that your global environment variables are set up correctly (see http://software.intel.com/sites/products/documentation/hpc/mkl/mkl_userguide_lnx/GUID-0DE6A77B-00E0-4ED6-9CAE-52FCF49E5623.htm). If you want to set the number of threads programmatically:

               _
   _       _ _(_)_     |  A fresh approach to technical computing
  (_)     | (_) (_)    |  Documentation: http://docs.julialang.org
   _ _   _| |_  __ _   |  Type "help()" to list help topics
  | | | | | | |/ _` |  |
  | | |_| | | | (_| |  |  Version 0.3.0-prerelease+1470 (2014-02-08 16:23 UTC)
 _/ |\__'_|_|_|\__'_|  |  Commit 596d5c4* (3 days old master)
|__/                   |  x86_64-unknown-linux-gnu

julia> Base.blas_vendor()
:mkl

julia> Base.blas_set_num_threads(12)

You can see the relevant code here: https://github.com/JuliaLang/julia/blob/8ac5a7f7fff1c54a768c7bc9ae85cf053d310f42/base/util.jl#L293. It's always worth a quick search through the code base to figure out what is going on behind the scenes.

Hope that helps!
Jake

On Tuesday, February 11, 2014 12:34:56 AM UTC-5, Madeleine Udell wrote:

Jake, thanks for the reference (http://software.intel.com/en-us/forums/topic/294954); I have 32 hyperthreaded cores, so beyond not exploiting hyperthreading, something else is limiting me to 6.

Perhaps I need to call something like omp_set_num_threads()? But there doesn't seem to be a function by that name in the libmkl_rt library.

julia> ccall((:omp_set_num_threads,Base.libblas_name),Ptr{Void},(Uint8,),32)
ERROR: ccall: could not find function omp_set_num_threads in library libmkl_rt
 in anonymous at no file
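The util.jl source linked above shows why this fails: for the :mkl vendor, Base.blas_set_num_threads calls the service function that libmkl_rt itself exports, MKL_Set_Num_Threads; omp_set_num_threads lives in the OpenMP runtime, not in MKL's single dynamic library. A sketch of the equivalent direct call:

# Same effect as Base.blas_set_num_threads(32) on an MKL build of Julia.
# MKL also honors the MKL_NUM_THREADS / OMP_NUM_THREADS environment
# variables when they are set before Julia starts.
ccall((:MKL_Set_Num_Threads, Base.libblas_name), Void, (Cint,), 32)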
On Mon, Feb 10, 2014 at 4:05 PM, Jake Bolewski <[email protected]> wrote:

Are those hyper-threaded "cores"? If so, check Intel MKL's documentation on hyper-threading.

-Best
Jake

On Monday, February 10, 2014 6:38:50 PM UTC-5, Madeleine Udell wrote:

It looks like only 6 threads are being used when I use MKL from Julia. If I do blas_set_num_threads(n), then in top I see julia running at min(n,6)*100% CPU. Any idea why this would be, or how to get MKL to use more threads? I'm not sure whether this is an issue in Julia or with my installation of MKL.

On Mon, Feb 10, 2014 at 2:09 PM, Andreas Noack Jensen <[email protected]> wrote:

You are welcome. Good to hear that it worked.

2014-02-10 22:35 GMT+01:00 Madeleine Udell <[email protected]>:

Fantastic, thank you. I now see Base.libblas_name = "libmkl_rt", and am able to run the following test successfully:

transa = 'N'::Base.LinAlg.BlasChar # multiply by A, not A'
matdescra = "GXXF" # G = general, X = ignored, F = one-based indexing
m,n = 50,100
A = sprand(m,n,.01)
y = zeros(m)
x = rand(n)
alpha = 1.0
beta = 1.0

Base.LinAlg.SparseBLAS.cscmv!(transa,alpha,matdescra,A,x,beta,y)
y_builtin = A*x

julia> y==y_builtin
true

On Mon, Feb 10, 2014 at 12:08 PM, Andreas Noack Jensen <[email protected]> wrote:

Hi Madeleine,

When compiling Julia with MKL you'll have to do make cleanall as well as rebuild ARPACK and SuiteSparse, i.e. make -C deps distclean-arpack and make -C deps distclean-suitesparse. It is also easier to create a Make.user in which you set USE_MKL=1 and MKLROOT to the location of your MKL library files, e.g. /opt/intel/mkl. The arguments are explained here:

http://software.intel.com/sites/products/documentation/hpc/mkl/mklman/GUID-C2EE93F0-B573-4538-A994-202CB3ADFFC2.htm#GUID-C2EE93F0-B573-4538-A994-202CB3ADFFC2

but briefly, transa determines whether the operation is Ax or A'x, and matdescra carries information about the structure of the matrix, e.g. whether it is triangular. Once you have succeeded in compiling Julia with MKL, the libblas variable should just be Base.libblas_name.

Kind regards,
Andreas Noack Jensen
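Concretely, a minimal Make.user along the lines Andreas describes might look like this (the MKLROOT path is an example; point it at your own install):

# Make.user at the top of the Julia source tree
USE_MKL = 1
MKLROOT = /opt/intel/mkl

After creating it, run make cleanall and make -C deps distclean-arpack distclean-suitesparse, then rebuild, so that ARPACK and SuiteSparse are also built against MKL.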
2014-02-10 20:37 GMT+01:00 Madeleine Udell <[email protected]>:

I'm having trouble using MKL in Julia. I changed Make.inc so that USE_MKL = 1, but when I make and run Julia, I find that Base.libblas_name = "libopenblas". Is this expected? I would have thought it would be, e.g., "libmkl_core".

Andreas, I found your wrappers for MKL in this PR. I've not used MKL before, so I don't understand the call signature of those functions well enough to call MKL directly. Any chance you could explain what transa::BlasChar and matdescra::ASCIIString are in the following function, and which MKL library is expected in the libblas variable? I see many .so files in the lib/intel64 directory of my MKL installation, and I'm not sure which one to point Julia to.

function cscmv!(transa::BlasChar, α::$T, matdescra::ASCIIString,
                A::SparseMatrixCSC{$T, BlasInt}, x::StridedVector{$T}, β::$T,
                y::StridedVector{$T})
    length(x) == A.n || throw(DimensionMismatch("Matrix with $(A.n) columns multiplied with vector of length $(length(x))"))
    length(y) == A.m || throw(DimensionMismatch("Vector of length $(A.m) added to vector of length $(length(y))"))
    ccall(($(string(mv)), libblas), Void,
          (Ptr{Uint8}, Ptr{BlasInt}, Ptr{BlasInt}, Ptr{$T},
           Ptr{Uint8}, Ptr{$T}, Ptr{BlasInt}, Ptr{BlasInt},
           Ptr{BlasInt}, Ptr{$T}, Ptr{$T}, Ptr{$T}),
          &transa, &A.m, &A.n, &α,
          matdescra, A.nzval, A.rowval, pointer(A.colptr, 1),
          pointer(A.colptr, 2), x, &β, y)
    return y
end

Thanks for your help!
Madeleine

On Wed, Feb 5, 2014 at 1:49 PM, Andreas Noack Jensen <[email protected]> wrote:

A*b will not call MKL when A is sparse. There has been some discussion of making an MKL package that overwrites A_mul_B(Matrix,Vector) with the MKL versions, and I actually wrote wrappers for the sparse MKL subroutines in the fall for the same reason.

2014-02-05 Madeleine Udell <[email protected]>:

Miles, you're right that writing sparse matrix-vector products in native Julia probably won't be the best idea given Julia's model of parallelism. That's why I'm interested in calling an outside library like PETSc.

I see it's possible to link Julia with MKL. I haven't tried this yet, but if I do, will A*b (where A is sparse) call MKL to perform the matrix-vector product?

On Wed, Feb 5, 2014 at 11:43 AM, Miles Lubin <[email protected]> wrote:

Memory access is typically a significant bottleneck in sparse mat-vec, so unfortunately I'm skeptical that one could achieve good performance using Julia's current distributed-memory approach on a multicore machine. This really calls for something like OpenMP.

On Wednesday, February 5, 2014 11:42:00 AM UTC-5, Madeleine Udell wrote:

I'm developing an iterative optimization algorithm in Julia, along the lines of other contributions to the Iterative Solvers project or the Krylov Subspace module, whose only computationally intensive step is computing A*b or A'*b. I would like to parallelize the method using a parallel sparse matrix-vector multiply. Is there a standard backend matrix-vector multiply that's recommended in Julia if I'm targeting a shared-memory computer with a large number of processors? Similarly, is there a recommended backend for targeting a cluster? My matrices can easily reach 10 million rows by 1 million columns, with sparsity anywhere from .01% to problems that are nearly diagonal.

I've seen many posts talking about integrating PETSc as a backend for this purpose, but it looks like that project has stalled; the last commits I see are a year old. I'm also interested in other backends, e.g. Spark, SciDB, etc.

I'm more interested in solving sparse problems, but as a side note: the built-in BLAS acceleration obtained by changing the number of threads with blas_set_num_threads works fine for dense problems on a moderate number of processors. I wonder why the number of threads isn't set higher than one by default, for example to as many as nprocs() cores.

--
Madeleine Udell
PhD Candidate in Computational and Mathematical Engineering
Stanford University
www.stanford.edu/~udell
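For reference, here is a minimal sketch (not ParallelSparseMatMul.jl's actual implementation) of the shared-memory approach the thread converges on: y = A'*x parallelized over the columns of a CSC matrix, since each y[j] depends only on column j of A. The function name is made up for illustration, and x and y live in SharedArrays so every worker reads and writes the same memory.

function shared_At_mul_B!(y::SharedArray{Float64,1},
                          A::SparseMatrixCSC{Float64,Int},
                          x::SharedArray{Float64,1})
    length(x) == A.m || throw(DimensionMismatch("x must have length $(A.m)"))
    length(y) == A.n || throw(DimensionMismatch("y must have length $(A.n)"))
    # Columns of A are independent for A'x, so split them across workers.
    @sync @parallel for j in 1:A.n
        s = 0.0
        @inbounds for k in A.colptr[j]:(A.colptr[j+1]-1)
            s += A.nzval[k] * x[A.rowval[k]]   # dot(A[:,j], x)
        end
        y[j] = s
    end
    return y
end

# Usage (call addprocs(...) first so there are workers to split the loop over):
# A = sprand(1000, 2000, 0.01)
# x = SharedArray(Float64, 1000); x[:] = rand(1000)
# y = SharedArray(Float64, 2000)
# shared_At_mul_B!(y, A, x)

One caveat, and the reason a real implementation keeps A itself in shared memory: @parallel serializes A to every worker on each call, which quickly dominates the runtime for large matrices.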
