But speaking of writing parallel matrix-vector products in native Julia, this might be a great use case for shared arrays (although right now I think only dense shared arrays exist). Amit, can you comment on this?
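For anyone who wants to experiment with that idea, here is a minimal sketch of a dense mat-vec over a `SharedArray` in current Julia syntax (the even row split across workers is just one illustrative partitioning, not a library-provided scheme, and the sizes are arbitrary):

```julia
using Distributed
addprocs(4)                          # spawn 4 local worker processes
@everywhere using SharedArrays

# Dense A and b backed by shared memory, visible to all local workers.
m, n = 1000, 500
A = SharedArray{Float64}(m, n)
b = SharedArray{Float64}(n)
A .= randn(m, n)
b .= randn(n)

y = SharedArray{Float64}(m)

# Each worker computes the dot products for its share of the rows.
@sync @distributed for i in 1:m
    s = 0.0
    for j in 1:n
        s += A[i, j] * b[j]
    end
    y[i] = s
end

# Sanity check against the serial product.
@assert maximum(abs.(Array(y) .- Array(A) * Array(b))) < 1e-8
```

Since `SharedArray` only works across processes on one machine, this sidesteps the copying cost of `DArray`-style distributed memory, but it does not extend to a cluster.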
On Wed, Feb 5, 2014 at 1:41 PM, Madeleine Udell <[email protected]> wrote:
> Miles, you're right that writing sparse matrix-vector products in native
> Julia probably won't be the best idea given Julia's model of parallelism.
> That's why I'm interested in calling an outside library like PETSc.
>
> I see it's possible to link Julia with MKL. I haven't tried this yet, but if
> I do, will A*b (where A is sparse) call MKL to perform the matrix-vector
> product?
>
> On Wed, Feb 5, 2014 at 11:43 AM, Miles Lubin <[email protected]> wrote:
>>
>> Memory access is typically a significant bottleneck in sparse mat-vec, so
>> unfortunately I'm skeptical that one could achieve good performance using
>> Julia's current distributed-memory approach on a multicore machine. This
>> really calls for something like OpenMP.
>>
>> On Wednesday, February 5, 2014 11:42:00 AM UTC-5, Madeleine Udell wrote:
>>>
>>> I'm developing an iterative optimization algorithm in Julia along the
>>> lines of other contributions to the Iterative Solvers project or Krylov
>>> Subspace module, whose only computationally intensive step is computing A*b
>>> or A'*b. I would like to parallelize the method by using a parallel sparse
>>> matrix-vector multiply. Is there a standard backend matrix-vector multiply
>>> that's recommended in Julia if I'm targeting a shared-memory computer with a
>>> large number of processors? Similarly, is there a recommended backend for
>>> targeting a cluster? My matrices can easily reach 10 million rows by 1
>>> million columns, with sparsity anywhere from 0.01% to problems that are
>>> nearly diagonal.
>>>
>>> I've seen many posts talking about integrating PETSc as a backend for
>>> this purpose, but it looks like the project has stalled -- the last commits I
>>> see are from a year ago. I'm also interested in other backends, e.g. Spark,
>>> SciDB, etc.
>>>
>>> I'm more interested in solving sparse problems, but as a side note, the
>>> built-in BLAS acceleration obtained by changing the number of threads with
>>> `blas_set_num_threads` works OK for dense problems using a moderate number
>>> of processors. I wonder why the number of threads isn't set higher than one
>>> by default -- for example, using as many as nprocs() cores?
>
> --
> Madeleine Udell
> PhD Candidate in Computational and Mathematical Engineering
> Stanford University
> www.stanford.edu/~udell
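On the BLAS-threads side note: in recent Julia the function has moved to `LinearAlgebra.BLAS.set_num_threads`, and the stdlib `SparseArrays` handles the sparse `A*b` path with Julia's own CSC kernel rather than BLAS. A quick sketch of both knobs (timings are machine-dependent; the matrix sizes and thread counts here are arbitrary choices for illustration):

```julia
using LinearAlgebra, SparseArrays

# Sparse case: A*b dispatches to SparseArrays' native CSC mat-vec kernel.
S = sprandn(10_000, 10_000, 1e-3)
x = randn(10_000)
y = S * x

# Dense case: A*b goes through BLAS, whose thread count is adjustable.
A = randn(2000, 2000)
b = randn(2000)
BLAS.set_num_threads(1)
t1 = @elapsed A * b
BLAS.set_num_threads(4)
t4 = @elapsed A * b
println("dense mat-vec: $t1 s (1 thread) vs $t4 s (4 threads)")
```

Note that dense mat-vec (gemv) is itself memory-bandwidth bound, so the speedup from extra BLAS threads is often far from linear; the larger wins from threading show up in mat-mat products.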
