But speaking of writing parallel matrix-vector products in native
Julia, this might be a great use case for shared arrays (although
right now I think only dense shared arrays exist). Amit, can you
comment on this?
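For the dense case, a minimal sketch of what a SharedArray-based mat-vec might look like (API names as of current Julia; the worker count and row-wise chunking are illustrative assumptions, not a tuned implementation):

```julia
# Sketch: dense y = A*b with the result in a SharedArray.
# Assumes worker processes have been started with addprocs();
# A and b are captured by the closure and copied to each worker,
# while y lives in shared memory visible to all local processes.
addprocs(2)

A = rand(1000, 200)
b = rand(200)
y = SharedArray(Float64, size(A, 1))

@sync @parallel for i in 1:size(A, 1)
    # each worker fills its own chunk of rows of the result
    y[i] = dot(vec(A[i, :]), b)
end
```

A sparse analogue would want the matrix itself in shared memory as well, which is exactly the missing piece noted above.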

On Wed, Feb 5, 2014 at 1:41 PM, Madeleine Udell
<[email protected]> wrote:
> Miles, you're right that writing sparse matrix-vector products in native
> Julia probably won't be the best idea, given Julia's model of parallelism.
> That's why I'm interested in calling an outside library like PETSc.
>
> I see it's possible to link Julia with MKL. I haven't tried this yet, but if
> I do, will A*b (where A is sparse) call MKL to perform the matrix vector
> product?
>
>
> On Wed, Feb 5, 2014 at 11:43 AM, Miles Lubin <[email protected]> wrote:
>>
>> Memory access is typically a significant bottleneck in sparse mat-vec, so
>> unfortunately I'm skeptical that one could achieve good performance using
>> Julia's current distributed memory approach on a multicore machine. This
>> really calls for something like OpenMP.
>>
>>
>> On Wednesday, February 5, 2014 11:42:00 AM UTC-5, Madeleine Udell wrote:
>>>
>>> I'm developing an iterative optimization algorithm in Julia, along the
>>> lines of other contributions to the Iterative Solvers project or the
>>> Krylov Subspace module, whose only computationally intensive steps are
>>> computing A*b and A'*b. I would like to parallelize the method using a
>>> parallel sparse matrix-vector multiply. Is there a standard backend
>>> matrix-vector multiply recommended in Julia if I'm targeting a
>>> shared-memory computer with a large number of processors? Similarly, is
>>> there a recommended backend for targeting a cluster? My matrices can
>>> easily reach 10 million rows by 1 million columns, with sparsity anywhere
>>> from 0.01% up to problems that are nearly diagonal.
>>>
>>> I've seen many posts about integrating PETSc as a backend for this
>>> purpose, but it looks like that project has stalled: the last commits I
>>> see are from a year ago. I'm also interested in other backends, e.g.,
>>> Spark, SciDB, etc.
>>>
>>> I'm mainly interested in solving sparse problems, but as a side note,
>>> the built-in BLAS acceleration (setting the number of threads with
>>> `blas_set_num_threads`) works reasonably well for dense problems on a
>>> moderate number of processors. I wonder why the number of threads isn't
>>> set higher than one by default, for example, to as many as nprocs()
>>> cores?
>
>
>
>
> --
> Madeleine Udell
> PhD Candidate in Computational and Mathematical Engineering
> Stanford University
> www.stanford.edu/~udell


