A*b will not call MKL when A is sparse. There has been some discussion about making an MKL package that overwrites A_mul_B(Matrix, Vector) with the MKL versions, and I actually wrote wrappers for the sparse MKL subroutines last fall for the same reason.
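To make the overwriting idea concrete, here is a minimal sketch. The inner loop is a plain pure-Julia CSC mat-vec standing in for the actual MKL call (a real wrapper would `ccall` an MKL sparse BLAS routine such as `mkl_dcsrmv` instead of running the loop), and the standalone name `spmv` is mine for illustration; the package described above would instead replace the Base method (A_mul_B in the 2014 API) so that `A*b` itself dispatches to the MKL kernel.

```julia
using SparseArrays  # sparse types lived in Base in 2014; a stdlib module in later Julia

# Reference CSC matrix-vector product. An MKL-backed package would replace
# the loop body with a ccall into the MKL sparse BLAS and install this as
# the method behind A*b.
function spmv(A::SparseMatrixCSC{Float64,Int}, b::Vector{Float64})
    y = zeros(size(A, 1))
    for col in 1:size(A, 2)
        # columns are stored contiguously in CSC: nonzeros of column `col`
        # sit at positions colptr[col] up to colptr[col+1]-1
        for k in A.colptr[col]:(A.colptr[col+1] - 1)
            y[A.rowval[k]] += A.nzval[k] * b[col]
        end
    end
    return y
end
```

On any sparse A, `spmv(A, b)` agrees with `A*b`; the only point of such a wrapper is to swap this serial loop for a threaded MKL kernel without changing caller code.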
2014-02-05 Madeleine Udell <[email protected]>:

> Miles, you're right that writing sparse matrix-vector products in native
> Julia probably won't be the best idea given Julia's model of parallelism.
> That's why I'm interested in calling an outside library like PETSc.
>
> I see it's possible to link Julia with MKL. I haven't tried this yet, but
> if I do, will A*b (where A is sparse) call MKL to perform the matrix-vector
> product?
>
> On Wed, Feb 5, 2014 at 11:43 AM, Miles Lubin <[email protected]> wrote:
>
>> Memory access is typically a significant bottleneck in sparse mat-vec, so
>> unfortunately I'm skeptical that one could achieve good performance using
>> Julia's current distributed memory approach on a multicore machine. This
>> really calls for something like OpenMP.
>>
>> On Wednesday, February 5, 2014 11:42:00 AM UTC-5, Madeleine Udell wrote:
>>
>>> I'm developing an iterative optimization algorithm in Julia along the
>>> lines of other contributions to the Iterative Solvers project
>>> <https://github.com/JuliaLang/IterativeSolvers.jl> or the Krylov Subspace
>>> module
>>> <https://github.com/JuliaLang/IterativeSolvers.jl/blob/master/src/krylov.jl>,
>>> whose only computationally intensive step is computing A*b or A'*b. I
>>> would like to parallelize the method using a parallel sparse
>>> matrix-vector multiply. Is there a standard backend matrix-vector
>>> multiply that's recommended in Julia if I'm targeting a shared-memory
>>> computer with a large number of processors? Similarly, is there a
>>> recommended backend for targeting a cluster? My matrices can easily
>>> reach 10 million rows by 1 million columns, with sparsity anywhere from
>>> .01% to problems that are nearly diagonal.
>>> I've seen many posts <https://github.com/JuliaLang/julia/issues/2645>
>>> talking about integrating PETSc as a backend for this purpose, but it
>>> looks like the project
>>> <https://github.com/petsc/petsc/blob/master/bin/julia/PETSc.jl> has
>>> stalled - the last commits I see are from a year ago. I'm also interested
>>> in other backends, e.g. Spark <http://spark.incubator.apache.org/>,
>>> SciDB <http://scidb.org/>, etc.
>>>
>>> I'm more interested in solving sparse problems, but as a side note, the
>>> built-in BLAS acceleration controlled by `blas_set_num_threads` works ok
>>> for dense problems using a moderate number of processors. I wonder why
>>> the number of threads isn't set higher than one by default, for example
>>> using as many as nprocs() cores?

> --
> Madeleine Udell
> PhD Candidate in Computational and Mathematical Engineering
> Stanford University
> www.stanford.edu/~udell

--
Kind regards

Andreas Noack Jensen
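As a footnote to the `blas_set_num_threads` remark quoted above, a minimal sketch of setting the thread count explicitly before a dense product follows. The spelling below is the later `BLAS.set_num_threads` form (the function was exported as `blas_set_num_threads` in early 2014), and the matrix size is an arbitrary example.

```julia
using LinearAlgebra

# Julia leaves the BLAS thread count to the user rather than defaulting
# to all cores; raise it explicitly before heavy dense work.
BLAS.set_num_threads(4)  # spelled blas_set_num_threads in the 2014 API

A = rand(2000, 2000)
b = rand(2000)
y = A * b  # this dense product runs on the configured number of BLAS threads
```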
