Are those hyper-threaded "cores"? If so, check Intel MKL's documentation on hyper-threading.
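[Editor's note: if they are hyper-threaded, the 6-thread ceiling is easy to check from the REPL. A minimal sketch, assuming a 2014-era Julia where `CPU_CORES` reports logical CPUs and `blas_set_num_threads` is exported, as in this thread; on later Julia versions these became `Sys.CPU_THREADS` and `BLAS.set_num_threads`:]

```julia
# Sketch: MKL often defaults to one thread per physical core, so on a
# 6-core machine with hyper-threading, CPU_CORES reports 12 logical CPUs
# while BLAS tops out at 6 threads -- matching the min(n, 6)*100% CPU
# observed in top.
println("logical CPUs: ", CPU_CORES)
blas_set_num_threads(CPU_CORES)   # request one thread per logical CPU
A = rand(4000, 4000); b = rand(4000)
@time A*b                         # watch per-core load in top while this runs
```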
-Best,
Jake

On Monday, February 10, 2014 6:38:50 PM UTC-5, Madeleine Udell wrote:

> It looks like only 6 threads are being used when I use MKL from Julia.
> If I do blas_set_num_threads(n), then using top, I see julia running at
> min(n, 6) * 100% CPU. Any idea why this would be, or how to get MKL to
> use more threads? I'm not sure if this is an issue in Julia or with my
> installation of MKL.

On Mon, Feb 10, 2014 at 2:09 PM, Andreas Noack Jensen wrote:

> You are welcome. Good to hear that it worked.

2014-02-10 22:35 GMT+01:00, Madeleine Udell wrote:

> Fantastic, thank you. I now see Base.libblas_name = "libmkl_rt", and am
> able to run the following test successfully:
>
>     transa = 'N'::Base.LinAlg.BlasChar # multiply by A, not A'
>     matdescra = "GXXF" # G = general, X = ignored, F = one-based indexing
>     m, n = 50, 100
>     A = sprand(m, n, .01)
>     y = zeros(m)
>     x = rand(n)
>     alpha = 1.0
>     beta = 1.0
>
>     Base.LinAlg.SparseBLAS.cscmv!(transa, alpha, matdescra, A, x, beta, y)
>     y_builtin = A*x
>
>     julia> y == y_builtin
>     true

On Mon, Feb 10, 2014 at 12:08 PM, Andreas Noack Jensen wrote:

> Hi Madeleine,
>
> When compiling Julia with MKL you'll have to do make cleanall as well
> as rebuild ARPACK and SuiteSparse, i.e. make -C deps distclean-arpack
> and make -C deps distclean-suitesparse. It is also easier to create a
> Make.user where you set USE_MKL=1 and MKLROOT to the location of your
> MKL library files, e.g. /opt/intel/mkl.
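[Editor's note: the rebuild recipe Andreas describes just above can be condensed into one shell session. This is a sketch, not a verified build script: it assumes a 2014-era Julia source tree, and MKLROOT must match your own installation.]

```shell
# Hypothetical MKL rebuild session for a Julia source checkout.
cd julia

# Put the MKL switches in Make.user rather than editing Make.inc:
cat > Make.user <<'EOF'
USE_MKL = 1
MKLROOT = /opt/intel/mkl
EOF

make cleanall                       # discard objects built against OpenBLAS
make -C deps distclean-arpack       # ARPACK and SuiteSparse link BLAS directly,
make -C deps distclean-suitesparse  # so they must be rebuilt too
make                                # Base.libblas_name should now be "libmkl_rt"
```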
> The arguments are explained here
>
> http://software.intel.com/sites/products/documentation/hpc/mkl/mklman/GUID-C2EE93F0-B573-4538-A994-202CB3ADFFC2.htm#GUID-C2EE93F0-B573-4538-A994-202CB3ADFFC2
>
> but transa determines whether the operation is Ax or A'x, and matdescra
> carries information about the structure of the matrix, e.g. whether it
> is triangular. When you have succeeded in compiling Julia with MKL, the
> libblas variable should just be Base.libblas_name.

2014-02-10 20:37 GMT+01:00, Madeleine Udell wrote:

> I'm having trouble using MKL in Julia. I changed Make.inc so that
> USE_MKL = 1, but when I make and run julia, I find that
> Base.libblas_name = "libopenblas". Is this expected? I would have
> thought it would be e.g. "libmkl_core".
>
> Andreas, I found your wrappers for MKL in this PR. I've not used MKL
> before, so I don't understand the call signature of those functions
> well enough to call MKL directly. Any chance you could explain what
> transa::BlasChar and matdescra::ASCIIString are in the following
> function, and which MKL library is expected in the libblas variable? I
> see many .so files in the lib/intel64 directory of my MKL installation,
> and I'm not sure which one to point Julia to.
>     function cscmv!(transa::BlasChar, α::$T, matdescra::ASCIIString,
>                     A::SparseMatrixCSC{$T, BlasInt}, x::StridedVector{$T},
>                     β::$T, y::StridedVector{$T})
>         length(x) == A.n || throw(DimensionMismatch("Matrix with $(A.n) columns multiplied with vector of length $(length(x))"))
>         length(y) == A.m || throw(DimensionMismatch("Vector of length $(A.m) added to vector of length $(length(y))"))
>         ccall(($(string(mv)), libblas), Void,
>               (Ptr{Uint8}, Ptr{BlasInt}, Ptr{BlasInt}, Ptr{$T},
>                Ptr{Uint8}, Ptr{$T}, Ptr{BlasInt}, Ptr{BlasInt},
>                Ptr{BlasInt}, Ptr{$T}, Ptr{$T}, Ptr{$T}),
>               &transa, &A.m, &A.n, &α,
>               matdescra, A.nzval, A.rowval, pointer(A.colptr, 1),
>               pointer(A.colptr, 2), x, &β, y)
>         return y
>     end
>
> Thanks for your help!
> Madeleine

On Wed, Feb 5, 2014 at 1:49 PM, Andreas Noack Jensen wrote:

> A*b will not call MKL when A is sparse. There has been some writing
> about making an MKL package that overwrites A_mul_B(Matrix, Vector)
> with the MKL versions, and I actually wrote wrappers for the sparse MKL
> subroutines in the fall for the same reason.

2014-02-05, Madeleine Udell wrote:

> Miles, you're right that writing sparse matrix-vector products in
> native Julia probably won't be the best idea given Julia's model of
> parallelism. That's why I'm interested in calling an outside library
> like PETSc.
>
> I see it's possible to link Julia with MKL. I haven't tried this yet,
> but if I do, will A*b (where A is sparse) call MKL to perform the
> matrix-vector product?
On Wed, Feb 5, 2014 at 11:43 AM, Miles Lubin wrote:

> Memory access is typically a significant bottleneck in sparse mat-vec,
> so unfortunately I'm skeptical that one could achieve good performance
> using Julia's current distributed-memory approach on a multicore
> machine. This really calls for something like OpenMP.

On Wednesday, February 5, 2014 11:42:00 AM UTC-5, Madeleine Udell wrote:

> I'm developing an iterative optimization algorithm in Julia along the
> lines of other contributions to the Iterative Solvers project or the
> Krylov Subspace module, whose only computationally intensive step is
> computing A*b or A'*b. I would like to parallelize the method by using
> a parallel sparse matrix-vector multiply. Is there a standard backend
> matrix-vector multiply that's recommended in Julia if I'm targeting a
> shared-memory computer with a large number of processors? Similarly,
> is there a recommended backend for targeting a cluster? My matrices
> can easily reach 10 million rows by 1 million columns, with sparsity
> anywhere from .01% to problems that are nearly diagonal.
>
> I've seen many posts talking about integrating PETSc as a backend for
> this purpose, but it looks like the project has stalled - the last
> commits I see are a year ago. I'm also interested in other backends,
> e.g. Spark, SciDB, etc.
>
> I'm more interested in solving sparse problems, but as a side note,
> the built-in BLAS acceleration by changing the number of threads with
> blas_set_num_threads works ok for dense problems using a moderate
> number of processors.
> I wonder why the number of threads isn't set higher than one by
> default, for example using as many as nprocs() cores?

--
Madeleine Udell
PhD Candidate in Computational and Mathematical Engineering
Stanford University
www.stanford.edu/~udell

--
Kind regards,

Andreas Noack Jensen
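[Editor's note: on the default-thread-count question, the effect of BLAS threading on dense problems can be measured directly. A minimal timing sketch, again assuming the 2014-era names used in this thread (`blas_set_num_threads`, `CPU_CORES`); nothing here changes sparse A*b, which bypasses BLAS entirely:]

```julia
# Sketch: time a dense mat-vec at 1 thread and at one thread per
# logical CPU to see how much BLAS threading helps on this machine.
A = rand(5000, 5000)
b = rand(5000)
A*b                               # warm-up run

blas_set_num_threads(1)
t1 = @elapsed A*b                 # single-threaded baseline

blas_set_num_threads(CPU_CORES)
tn = @elapsed A*b                 # all logical CPUs

println("1 thread: $t1 s, $CPU_CORES threads: $tn s")
```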
