I can try to answer the last part. blas_set_num_threads is set to one only when Julia is started in parallel mode, i.e. via the "-p" argument, or if you execute an addprocs. This is done since the blas library by default optimizes its thread count to the number of cores. If this is not done, say on a 4-core system, you started Julia with "-p 4" and in each Julia process, blas started 4 threads each, you end up with 16 compute threads competing for 4 cores which is inefficient.
On Wed, Feb 5, 2014 at 10:12 PM, Madeleine Udell <[email protected]>wrote: > I'm developing an iterative optimization algorithm in Julia along the > lines of other contributions to the Iterative Solvers > project<https://github.com/JuliaLang/IterativeSolvers.jl>or Krylov > Subspace > <https://github.com/JuliaLang/IterativeSolvers.jl/blob/master/src/krylov.jl>module > whose > only computationally intensive step is computing A*b or A'*b. I would like > to parallelize the method by using a parallel sparse matrix vector > multiply. Is there a standard backend matrix-vector multiply that's > recommended in Julia if I'm targeting a shared memory computer with a large > number of processors? Similarly, is there a recommended backend for > targeting a cluster? My matrices can easily reach 10 million rows by 1 > million columns, with sparsity anywhere from .01% to problems that are > nearly diagonal. > > I've seen many posts <https://github.com/JuliaLang/julia/issues/2645> talking > about integrating PETSc as a backend for this purpose, but it looks like > the project<https://github.com/petsc/petsc/blob/master/bin/julia/PETSc.jl>has > stalled - the last commits I see are a year ago. I'm also interested in > other backends, eg Spark <http://spark.incubator.apache.org/>, > SciDB<http://scidb.org/>, > etc. > > I'm more interested in solving sparse problems, but as a side note, the > built-in BLAS acceleration by changing the number of threads > `blas_set_num_threads` > works ok for dense problems using a moderate number of processors. I wonder > why the number of threads isn't set higher than one by default, for > example, using as many as nprocs() cores? >
