I can try to answer the last part. The BLAS thread count is set to one
(via blas_set_num_threads) only when Julia is started in parallel mode,
i.e. with the "-p" argument, or when you call addprocs. This is done
because the BLAS library by default sets its thread count to the number
of cores. Without it, if you started Julia with "-p 4" on a 4-core system
and BLAS started 4 threads in each Julia process, you would end up with
16 compute threads competing for 4 cores, which is inefficient.
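To make the arithmetic concrete, here is a small Python sketch. Python is used only for illustration, and the helper names are hypothetical. Each process runs its own BLAS thread pool, so the totals multiply. The `cores // procs` split is just one possible policy for capping the total. Julia itself, as described above, simply drops BLAS to one thread in parallel mode.

```python
def total_threads(n_procs, threads_per_proc):
    # Every process has its own BLAS thread pool, so the counts multiply.
    return n_procs * threads_per_proc


def blas_threads_per_process(n_cores, n_procs):
    # Hypothetical policy: share the cores evenly, but never go below one
    # thread. This is NOT Julia's logic; Julia just uses 1 in parallel mode.
    return max(1, n_cores // n_procs)


# The scenario above: 4 cores, "-p 4", and each process's BLAS
# defaulting to one thread per core.
cores, procs = 4, 4
print(total_threads(procs, cores))  # 16 threads competing for 4 cores
print(total_threads(procs, blas_threads_per_process(cores, procs)))  # 4
```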


On Wed, Feb 5, 2014 at 10:12 PM, Madeleine Udell wrote:

> I'm developing an iterative optimization algorithm in Julia along the
> lines of other contributions to the Iterative Solvers project
> <https://github.com/JuliaLang/IterativeSolvers.jl> or the Krylov Subspace
> module <https://github.com/JuliaLang/IterativeSolvers.jl/blob/master/src/krylov.jl>.
> The algorithm's only computationally intensive step is computing A*b or
> A'*b. I would like to parallelize the method by using a parallel sparse
> matrix-vector multiply. Is there a standard backend matrix-vector multiply
> that's recommended in Julia if I'm targeting a shared-memory computer with
> a large number of processors? Similarly, is there a recommended backend for
> targeting a cluster? My matrices can easily reach 10 million rows by 1
> million columns, with sparsity anywhere from 0.01% to problems that are
> nearly diagonal.
>
> I've seen many posts <https://github.com/JuliaLang/julia/issues/2645> about
> integrating PETSc as a backend for this purpose, but it looks like the
> project <https://github.com/petsc/petsc/blob/master/bin/julia/PETSc.jl> has
> stalled: the last commits I see are from a year ago. I'm also interested in
> other backends, e.g. Spark <http://spark.incubator.apache.org/>,
> SciDB <http://scidb.org/>, etc.
>
> I'm more interested in solving sparse problems, but as a side note, the
> built-in BLAS acceleration obtained by changing the number of threads with
> `blas_set_num_threads` works OK for dense problems using a moderate number
> of processors. I wonder why the number of threads isn't set higher than one
> by default, for example to use as many cores as nprocs()?
>
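On the sparse question quoted above: one common way to parallelize A*b for a matrix stored by rows (CSR, compressed sparse row) is to give each worker a contiguous block of rows. Each worker then writes a disjoint slice of the result. A'*b is harder, because every row block scatters into arbitrary output entries, so each worker keeps its own partial vector and the partials are summed at the end. Julia's own SparseMatrixCSC stores columns, so there the roles of A*b and A'*b are swapped. The minimal pure-Python sketch below shows only that decomposition. The function names are hypothetical, it does not stand in for any of the backends mentioned, and because of Python's GIL the threads illustrate the structure rather than give a real speedup.

```python
from concurrent.futures import ThreadPoolExecutor

# CSR layout: row i's nonzeros are data[indptr[i]:indptr[i+1]],
# stored in columns indices[indptr[i]:indptr[i+1]].


def split_rows(n_rows, n_workers):
    """Contiguous row blocks [start, end), roughly one per worker."""
    step = max(1, -(-n_rows // n_workers))  # ceiling division, at least 1
    return [(s, min(s + step, n_rows)) for s in range(0, n_rows, step)]


def matvec_block(indptr, indices, data, x, start, end):
    """Entries start..end-1 of A*x."""
    out = []
    for i in range(start, end):
        s = 0.0
        for k in range(indptr[i], indptr[i + 1]):
            s += data[k] * x[indices[k]]
        out.append(s)
    return out


def transpose_matvec_block(indptr, indices, data, b, n_cols, start, end):
    """Contribution of rows start..end-1 to A'*b, as a full-length vector."""
    partial = [0.0] * n_cols
    for i in range(start, end):
        for k in range(indptr[i], indptr[i + 1]):
            partial[indices[k]] += data[k] * b[i]
    return partial


def parallel_matvec(indptr, indices, data, x, n_workers=4):
    blocks = split_rows(len(indptr) - 1, n_workers)
    with ThreadPoolExecutor(max_workers=n_workers) as pool:
        parts = list(pool.map(
            lambda blk: matvec_block(indptr, indices, data, x, *blk), blocks))
    # Row blocks of the result are disjoint, so concatenation gives A*x.
    return [v for part in parts for v in part]


def parallel_transpose_matvec(indptr, indices, data, b, n_cols, n_workers=4):
    blocks = split_rows(len(indptr) - 1, n_workers)
    with ThreadPoolExecutor(max_workers=n_workers) as pool:
        parts = list(pool.map(
            lambda blk: transpose_matvec_block(indptr, indices, data, b,
                                               n_cols, *blk),
            blocks))
    # Workers touch overlapping columns, so their partial vectors are summed.
    total = [0.0] * n_cols
    for part in parts:
        for j, v in enumerate(part):
            total[j] += v
    return total


# A = [[1, 0, 2],
#      [0, 3, 0]]
indptr, indices, data = [0, 2, 3], [0, 2, 1], [1.0, 2.0, 3.0]
print(parallel_matvec(indptr, indices, data, [1.0, 1.0, 1.0]))          # [3.0, 3.0]
print(parallel_transpose_matvec(indptr, indices, data, [1.0, 2.0], 3))  # [1.0, 6.0, 2.0]
```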
