It looks like only 6 threads are being used when I call MKL from julia.
If I do blas_set_num_threads(n), then watching top, I see julia running
at min(n,6)*100% CPU. Any idea why this would be, or how to get MKL to
use more threads? I'm not sure whether this is an issue in julia or
with my installation of MKL.
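
For what it's worth, here's what I'm using to query and set the thread
count directly (a sketch: mkl_get_max_threads and mkl_set_num_threads
are documented MKL service functions, and I'm assuming the libmkl_rt
single dynamic library is the one loaded):

# ask MKL how many threads it will use at most
nthreads = ccall((:mkl_get_max_threads, "libmkl_rt"), Cint, ())

# request 12 threads from MKL directly (12 is an arbitrary example)
ccall((:mkl_set_num_threads, "libmkl_rt"), Void, (Cint,), 12)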

On Mon, Feb 10, 2014 at 2:09 PM, Andreas Noack Jensen
<[email protected]> wrote:
> You are welcome. Good to hear that it worked.
>
>
> 2014-02-10 22:35 GMT+01:00 Madeleine Udell <[email protected]>:
>
>> Fantastic, thank you! I now see Base.libblas_name = "libmkl_rt", and am
>> able to run the following test successfully:
>>
>> transa = 'N'::Base.LinAlg.BlasChar # multiply by A, not A'
>> matdescra = "GXXF" # G = general, X = ignored, F = one-based indexing
>> m,n = 50,100
>> A = sprand(m,n,.01) # random sparse matrix with density .01
>> y = zeros(m)        # output vector
>> x = rand(n)         # input vector
>> alpha = 1.0         # scales A*x
>> beta = 1.0          # scales y before accumulating
>>
>> Base.LinAlg.SparseBLAS.cscmv!(transa,alpha,matdescra,A,x,beta,y)
>> y_builtin = A*x
>>
>> julia> y==y_builtin
>> true
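>>
>> As a crude timing check (just a sketch; note I pass beta = 0.0 here so
>> y is overwritten rather than accumulated on each call):
>>
>> @time for i = 1:10^4; Base.LinAlg.SparseBLAS.cscmv!('N', 1.0, "GXXF", A, x, 0.0, y); end
>> @time for i = 1:10^4; A*x; end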
>>
>>
>> On Mon, Feb 10, 2014 at 12:08 PM, Andreas Noack Jensen
>> <[email protected]> wrote:
>>>
>>> Hi Madeleine
>>>
>>> When compiling julia with MKL you'll have to do make cleanall as well as
>>> rebuild ARPACK and SuiteSparse, i.e. make -C deps distclean-arpack and
>>> make -C deps distclean-suitesparse. It is also easier to create a
>>> Make.user where you set USE_MKL=1 and MKLROOT to the location of your
>>> MKL library files, e.g. /opt/intel/mkl. The arguments are explained here
>>>
>>>
>>> http://software.intel.com/sites/products/documentation/hpc/mkl/mklman/GUID-C2EE93F0-B573-4538-A994-202CB3ADFFC2.htm#GUID-C2EE93F0-B573-4538-A994-202CB3ADFFC2
>>>
>>> but briefly: transa determines whether the operation is Ax or A'x, and
>>> matdescra carries information about the structure of the matrix, e.g.
>>> whether it is triangular. When you have succeeded in compiling julia
>>> with MKL, the libblas variable should just be Base.libblas_name.
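>>>
>>> Roughly, the four characters of matdescra mean the following (see the
>>> manual page above for the full tables):
>>>
>>> # matdescra[1]: structure - 'G'eneral, 'S'ymmetric, 'H'ermitian,
>>> #               'T'riangular, 'A'ntisymmetric or 'D'iagonal
>>> # matdescra[2]: 'U'pper or 'L'ower triangle (ignored when [1] is 'G')
>>> # matdescra[3]: 'N'on-unit or 'U'nit diagonal (ignored when [1] is 'G')
>>> # matdescra[4]: 'F' one-based (Fortran) or 'C' zero-based indexing
>>> matdescra = "GXXF"  # general matrix, one-based indexing, fields 2-3 ignored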
>>>
>>>
>>> 2014-02-10 20:37 GMT+01:00 Madeleine Udell <[email protected]>:
>>>
>>>> I'm having trouble using MKL in julia. I changed Make.inc so that
>>>> USE_MKL = 1, but when I make and run julia, I find that
>>>> Base.libblas_name = "libopenblas". Is this expected? I would have
>>>> thought it would be e.g. "libmkl_core".
>>>>
>>>> Andreas, I found your wrappers for MKL in this PR. I've not used MKL
>>>> before, so I don't understand the call signatures well enough to call
>>>> MKL directly. Any chance you could explain what transa::BlasChar and
>>>> matdescra::ASCIIString are in the following function, and which MKL
>>>> library is expected in the libblas variable? I see many .so files in
>>>> the lib/intel64 directory of my MKL installation, and I'm not sure
>>>> which one to point julia to.
>>>>
>>>> function cscmv!(transa::BlasChar, α::$T, matdescra::ASCIIString,
>>>>                 A::SparseMatrixCSC{$T, BlasInt}, x::StridedVector{$T},
>>>>                 β::$T, y::StridedVector{$T})
>>>>     length(x) == A.n || throw(DimensionMismatch("Matrix with $(A.n) columns multiplied with vector of length $(length(x))"))
>>>>     length(y) == A.m || throw(DimensionMismatch("Vector of length $(A.m) added to vector of length $(length(y))"))
>>>>     ccall(($(string(mv)), libblas), Void,
>>>>           (Ptr{Uint8}, Ptr{BlasInt}, Ptr{BlasInt}, Ptr{$T},
>>>>            Ptr{Uint8}, Ptr{$T}, Ptr{BlasInt}, Ptr{BlasInt},
>>>>            Ptr{BlasInt}, Ptr{$T}, Ptr{$T}, Ptr{$T}),
>>>>           &transa, &A.m, &A.n, &α,
>>>>           matdescra, A.nzval, A.rowval, pointer(A.colptr, 1),
>>>>           pointer(A.colptr, 2), x, &β, y)
>>>>     return y
>>>> end
>>>>
>>>> Thanks for your help!
>>>> Madeleine
>>>>
>>>>
>>>> On Wed, Feb 5, 2014 at 1:49 PM, Andreas Noack Jensen
>>>> <[email protected]> wrote:
>>>>>
>>>>> A*b will not call MKL when A is sparse. There has been some discussion
>>>>> about making an MKL package that overrides A_mul_B(Matrix,Vector) with
>>>>> the MKL versions, and I actually wrote wrappers for the sparse MKL
>>>>> subroutines last fall for the same reason.
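>>>>>
>>>>> The idea would be something along these lines (a sketch; mkl_spmv is
>>>>> just a hypothetical name, and a real package would hook this into the
>>>>> A*b dispatch instead):
>>>>>
>>>>> # route sparse mat-vec through the MKL wrapper instead of the native kernel
>>>>> function mkl_spmv(A::SparseMatrixCSC{Float64, Base.LinAlg.BlasInt},
>>>>>                   x::StridedVector{Float64})
>>>>>     y = zeros(A.m)  # output buffer
>>>>>     # computes y := 1.0*A*x + 0.0*y, i.e. plain A*x
>>>>>     Base.LinAlg.SparseBLAS.cscmv!('N', 1.0, "GXXF", A, x, 0.0, y)
>>>>> end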
>>>>>
>>>>>
>>>>> 2014-02-05 Madeleine Udell <[email protected]>:
>>>>>
>>>>>> Miles, you're right that writing sparse matrix-vector products in
>>>>>> native Julia probably isn't the best idea given Julia's model of
>>>>>> parallelism. That's why I'm interested in calling an outside library
>>>>>> like PETSc.
>>>>>>
>>>>>> I see it's possible to link Julia with MKL. I haven't tried this yet,
>>>>>> but if I do, will A*b (where A is sparse) call MKL to perform the
>>>>>> matrix-vector product?
>>>>>>
>>>>>>
>>>>>> On Wed, Feb 5, 2014 at 11:43 AM, Miles Lubin <[email protected]>
>>>>>> wrote:
>>>>>>>
>>>>>>> Memory access is typically a significant bottleneck in sparse
>>>>>>> mat-vec, so unfortunately I'm skeptical that one could achieve good
>>>>>>> performance with Julia's current distributed-memory approach on a
>>>>>>> multicore machine. This really calls for something like OpenMP.
>>>>>>>
>>>>>>>
>>>>>>> On Wednesday, February 5, 2014 11:42:00 AM UTC-5, Madeleine Udell
>>>>>>> wrote:
>>>>>>>>
>>>>>>>> I'm developing an iterative optimization algorithm in Julia, along
>>>>>>>> the lines of other contributions to the Iterative Solvers project
>>>>>>>> or the Krylov Subspace module, whose only computationally intensive
>>>>>>>> step is computing A*b or A'*b. I would like to parallelize the
>>>>>>>> method using a parallel sparse matrix-vector multiply. Is there a
>>>>>>>> standard backend matrix-vector multiply that's recommended in Julia
>>>>>>>> if I'm targeting a shared-memory computer with a large number of
>>>>>>>> processors? Similarly, is there a recommended backend for targeting
>>>>>>>> a cluster? My matrices can easily reach 10 million rows by 1 million
>>>>>>>> columns, with sparsity anywhere from .01% to problems that are
>>>>>>>> nearly diagonal.
>>>>>>>>
>>>>>>>> I've seen many posts about integrating PETSc as a backend for this
>>>>>>>> purpose, but it looks like the project has stalled: the last commits
>>>>>>>> I see are from a year ago. I'm also interested in other backends,
>>>>>>>> e.g. Spark, SciDB, etc.
>>>>>>>>
>>>>>>>> I'm more interested in solving sparse problems, but as a side note,
>>>>>>>> the built-in BLAS acceleration from raising the number of threads
>>>>>>>> with `blas_set_num_threads` works OK for dense problems on a
>>>>>>>> moderate number of processors. I wonder why the number of threads
>>>>>>>> isn't set higher than one by default, for example to as many as
>>>>>>>> nprocs() cores.
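>>>>>>>>
>>>>>>>> For the dense case, the effect of the thread count is easy to see
>>>>>>>> with something like this (a rough sketch; CPU_CORES is the core
>>>>>>>> count julia detects):
>>>>>>>>
>>>>>>>> A = rand(2000, 2000)
>>>>>>>> blas_set_num_threads(1); @time A*A          # single-threaded baseline
>>>>>>>> blas_set_num_threads(CPU_CORES); @time A*A  # one BLAS thread per core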
>>>>>>
>>>>>>
>>>>>>
>>>>>>
>>>>>> --
>>>>>> Madeleine Udell
>>>>>> PhD Candidate in Computational and Mathematical Engineering
>>>>>> Stanford University
>>>>>> www.stanford.edu/~udell
>>>>>
>>>>>
>>>>>
>>>>>
>>>>> --
>>>>> Kind regards
>>>>>
>>>>> Andreas Noack Jensen
>>>>
>>>>
>>>>
>>>>
>>>> --
>>>> Madeleine Udell
>>>> PhD Candidate in Computational and Mathematical Engineering
>>>> Stanford University
>>>> www.stanford.edu/~udell
>>>
>>>
>>>
>>>
>>> --
>>> Kind regards
>>>
>>> Andreas Noack Jensen
>>
>>
>>
>>
>> --
>> Madeleine Udell
>> PhD Candidate in Computational and Mathematical Engineering
>> Stanford University
>> www.stanford.edu/~udell
>
>
>
>
> --
> Kind regards
>
> Andreas Noack Jensen



-- 
Madeleine Udell
PhD Candidate in Computational and Mathematical Engineering
Stanford University
www.stanford.edu/~udell
