You are welcome. Good to hear that it worked.
2014-02-10 22:35 GMT+01:00 Madeleine Udell <[email protected]>:

> fantastic, thank you. I now see Base.libblas_name = "libmkl_rt", and am
> able to run the following test successfully:
>
> transa = 'N'::Base.LinAlg.BlasChar # multiply by A, not A'
> matdescra = "GXXF" # G = general, X = ignored, F = one-based indexing
> m, n = 50, 100
> A = sprand(m, n, .01)
> y = zeros(m)
> x = rand(n)
> alpha = 1.0
> beta = 1.0
>
> Base.LinAlg.SparseBLAS.cscmv!(transa, alpha, matdescra, A, x, beta, y)
> y_builtin = A*x
>
> julia> y == y_builtin
> true
>
>
> On Mon, Feb 10, 2014 at 12:08 PM, Andreas Noack Jensen
> <[email protected]> wrote:
>
>> Hi Madeleine
>>
>> When compiling Julia with MKL you'll have to do make cleanall as well
>> as rebuild ARPACK and SuiteSparse, i.e. make -C deps distclean-arpack
>> and make -C deps distclean-suitesparse. It is also easier to create a
>> Make.user where you set USE_MKL=1 and MKLROOT to the location of your
>> MKL library files, e.g. /opt/intel/mkl. The arguments are explained here
>>
>> http://software.intel.com/sites/products/documentation/hpc/mkl/mklman/GUID-C2EE93F0-B573-4538-A994-202CB3ADFFC2.htm#GUID-C2EE93F0-B573-4538-A994-202CB3ADFFC2
>>
>> but briefly: transa determines whether the operation is Ax or A'x, and
>> matdescra carries information about the structure of the matrix, e.g.
>> whether it is triangular. When you have succeeded in compiling Julia
>> with MKL, the libblas variable should just be Base.libblas_name.
>>
>>
>> 2014-02-10 20:37 GMT+01:00 Madeleine Udell <[email protected]>:
>>
>>> I'm having trouble using MKL in Julia. I changed Make.inc so that
>>> USE_MKL = 1, but when I make and run julia, I find that
>>> Base.libblas_name = "libopenblas". Is this expected? I would have
>>> thought it would be e.g. "libmkl_core".
>>>
>>> Andreas, I found your wrappers for MKL in this PR
>>> <https://github.com/JuliaLang/julia/pull/4525/files>. I've not used
>>> MKL before, so I don't understand the call signature of those
>>> functions well enough to call MKL directly. Any chance you could
>>> explain what transa::BlasChar and matdescra::ASCIIString are in the
>>> following function, and which MKL library is expected in the libblas
>>> variable? I see many .so files in the lib/intel64 directory of my MKL
>>> installation, and I'm not sure which one to point Julia to.
>>>
>>> function cscmv!(transa::BlasChar, α::$T, matdescra::ASCIIString,
>>>                 A::SparseMatrixCSC{$T, BlasInt}, x::StridedVector{$T},
>>>                 β::$T, y::StridedVector{$T})
>>>     length(x) == A.n || throw(DimensionMismatch("Matrix with $(A.n) columns multiplied with vector of length $(length(x))"))
>>>     length(y) == A.m || throw(DimensionMismatch("Vector of length $(A.m) added to vector of length $(length(y))"))
>>>     ccall(($(string(mv)), libblas), Void,
>>>           (Ptr{Uint8}, Ptr{BlasInt}, Ptr{BlasInt}, Ptr{$T},
>>>            Ptr{Uint8}, Ptr{$T}, Ptr{BlasInt}, Ptr{BlasInt},
>>>            Ptr{BlasInt}, Ptr{$T}, Ptr{$T}, Ptr{$T}),
>>>           &transa, &A.m, &A.n, &α,
>>>           matdescra, A.nzval, A.rowval, pointer(A.colptr, 1),
>>>           pointer(A.colptr, 2), x, &β, y)
>>>     return y
>>> end
>>>
>>> Thanks for your help!
>>> Madeleine
>>>
>>>
>>> On Wed, Feb 5, 2014 at 1:49 PM, Andreas Noack Jensen
>>> <[email protected]> wrote:
>>>
>>>> A*b will not call MKL when A is sparse. There has been some
>>>> discussion of making an MKL package that overwrites
>>>> A_mul_B(Matrix,Vector) with the MKL versions, and I actually wrote
>>>> wrappers for the sparse MKL subroutines in the fall for the same
>>>> reason.
>>>>
>>>>
>>>> 2014-02-05 Madeleine Udell <[email protected]>:
>>>>
>>>>> Miles, you're right that writing sparse matrix vector products in
>>>>> native Julia probably won't be the best idea given Julia's model of
>>>>> parallelism. That's why I'm interested in calling an outside
>>>>> library like PETSc.
>>>>>
>>>>> I see it's possible to link Julia with MKL. I haven't tried this
>>>>> yet, but if I do, will A*b (where A is sparse) call MKL to perform
>>>>> the matrix vector product?
>>>>>
>>>>>
>>>>> On Wed, Feb 5, 2014 at 11:43 AM, Miles Lubin <[email protected]> wrote:
>>>>>
>>>>>> Memory access is typically a significant bottleneck in sparse
>>>>>> mat-vec, so unfortunately I'm skeptical that one could achieve
>>>>>> good performance using Julia's current distributed memory approach
>>>>>> on a multicore machine. This really calls for something like
>>>>>> OpenMP.
>>>>>>
>>>>>>
>>>>>> On Wednesday, February 5, 2014 11:42:00 AM UTC-5, Madeleine Udell wrote:
>>>>>>>
>>>>>>> I'm developing an iterative optimization algorithm in Julia along
>>>>>>> the lines of other contributions to the IterativeSolvers project
>>>>>>> <https://github.com/JuliaLang/IterativeSolvers.jl> or its Krylov
>>>>>>> subspace module
>>>>>>> <https://github.com/JuliaLang/IterativeSolvers.jl/blob/master/src/krylov.jl>,
>>>>>>> whose only computationally intensive step is computing A*b or
>>>>>>> A'*b. I would like to parallelize the method by using a parallel
>>>>>>> sparse matrix vector multiply. Is there a standard backend
>>>>>>> matrix-vector multiply that's recommended in Julia if I'm
>>>>>>> targeting a shared memory computer with a large number of
>>>>>>> processors? Similarly, is there a recommended backend for
>>>>>>> targeting a cluster? My matrices can easily reach 10 million rows
>>>>>>> by 1 million columns, with sparsity anywhere from .01% to
>>>>>>> problems that are nearly diagonal.
>>>>>>>
>>>>>>> I've seen many posts
>>>>>>> <https://github.com/JuliaLang/julia/issues/2645> talking about
>>>>>>> integrating PETSc as a backend for this purpose, but it looks
>>>>>>> like the project
>>>>>>> <https://github.com/petsc/petsc/blob/master/bin/julia/PETSc.jl>
>>>>>>> has stalled - the last commits I see are a year ago. I'm also
>>>>>>> interested in other backends, e.g. Spark
>>>>>>> <http://spark.incubator.apache.org/>, SciDB <http://scidb.org/>,
>>>>>>> etc.
>>>>>>>
>>>>>>> I'm more interested in solving sparse problems, but as a side
>>>>>>> note, the built-in BLAS acceleration from changing the number of
>>>>>>> threads with blas_set_num_threads works OK for dense problems
>>>>>>> using a moderate number of processors. I wonder why the number of
>>>>>>> threads isn't set higher than one by default, for example to as
>>>>>>> many as nprocs() cores?
>>>>>
>>>>> --
>>>>> Madeleine Udell
>>>>> PhD Candidate in Computational and Mathematical Engineering
>>>>> Stanford University
>>>>> www.stanford.edu/~udell

--
Kind regards

Andreas Noack Jensen
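
For reference, the Make.user that Andreas describes amounts to a two-line
file at the top of the Julia source tree. A minimal sketch, assuming MKL is
installed under /opt/intel/mkl as in the thread (adjust MKLROOT to the
local install):

USE_MKL = 1
MKLROOT = /opt/intel/mkl

After creating it, the bundled ARPACK and SuiteSparse builds have to be
discarded and Julia rebuilt: make -C deps distclean-arpack, make -C deps
distclean-suitesparse, make cleanall, then make.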
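
A note on the alpha and beta arguments in Madeleine's test: per the Intel
manual page Andreas links, the MKL mv routines compute
y := alpha*op(A)*x + beta*y, so beta = 1.0 only reproduced A*x because y
started out as zeros. A sketch of the accumulating behavior, under the same
assumptions as her test (an MKL-enabled build exposing
Base.LinAlg.SparseBLAS):

transa = 'N'::Base.LinAlg.BlasChar   # multiply by A, not A'
matdescra = "GXXF"                   # G = general, X = ignored, F = one-based indexing
m, n = 50, 100
A = sprand(m, n, .01)
x = rand(n)
y = ones(m)                          # y now has nonzero initial contents
y0 = copy(y)

# y is overwritten in place with 2.0*A*x + 1.0*y
Base.LinAlg.SparseBLAS.cscmv!(transa, 2.0, matdescra, A, x, 1.0, y)

norm(y - (2.0*(A*x) + y0)) < 1e-12   # expected to hold, up to roundoff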
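
On the blas_set_num_threads side note at the end of the thread: the thread
count can be changed at runtime, so the effect is easy to measure directly.
A rough sketch; timings are machine-dependent, and CPU_CORES is assumed to
be the 0.2-era name for the core count:

A = rand(4000, 4000)
x = rand(4000)
A*x                              # warm-up run so compilation isn't timed

blas_set_num_threads(1)
@time for i = 1:100; A*x; end    # single-threaded baseline

blas_set_num_threads(CPU_CORES)  # one BLAS thread per core
@time for i = 1:100; A*x; end    # typically faster for large dense A

As Miles points out, mat-vec is memory-bound, so the speedup usually
saturates well below the core count.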
