That is right. In parallel mode, having each process use multiple threads can cause a significant slowdown if (number of processes) * (threads per process) exceeds the number of available cores.
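A minimal sketch of that arithmetic, in case it helps. `is_oversubscribed` is a hypothetical helper written for this illustration, not a Julia function, and the core count is passed in by hand rather than queried from the system. The numbers are only examples.

```julia
# Hypothetical helper (not part of Julia): true when the total number of
# threads across all processes exceeds the number of available cores.
function is_oversubscribed(nprocs, threads_per_proc, ncores)
    return nprocs * threads_per_proc > ncores
end

# Illustrative numbers: 8 processes with 2 threads each on a 2-core machine.
is_oversubscribed(8, 2, 2)   # true: 16 threads compete for 2 cores

# 2 processes with 1 thread each on the same machine.
is_oversubscribed(2, 1, 2)   # false: 2 threads fit on 2 cores
```

With one thread per process, the total is simply the process count, which is the easiest case to keep within the core count.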
-viral

On Thursday, July 10, 2014 5:20:51 AM UTC-7, Thomas Covert wrote:
>
> Thanks for the update. Just out of curiosity, why does Julia call a
> single-threaded LAPACK routine when in parallel processing mode? Is it the
> case that it's impossible to have several processes calling multi-threaded
> LAPACK routines, or just that it's less efficient than having several
> processes calling single-threaded LAPACK routines?
>
> -Thom
>
> On Thursday, July 10, 2014 3:08:24 AM UTC-5, Andreas Noack wrote:
>>
>> I think the problem is in the single-threaded version of dpotri in
>> OpenBLAS. When you add processes to Julia, OpenBLAS is called
>> single-threaded, and therefore you see the problem when using addprocs. I
>> could reproduce the error by calling blas_set_num_threads(1). I have filed
>> issues at the Julia and OpenBLAS GitHub sites.
>>
>> 2014-07-10 3:51 GMT+02:00 Thomas Covert <[email protected]>:
>>
>>> Here are two additional pieces of the puzzle. tl;dr is that the
>>> parallel version and serial version generate different Cholesky factors,
>>> and that conditional on those computed factors, a "serial" call to
>>> cholfact(inv(C)) works fine on both computed factors, while a "parallel"
>>> call doesn't work on either.
>>>
>>> 1) If I fix the random seed to be the same across runs, the non-parallel
>>> version and the parallel version generate slightly different values of C.
>>> The maximum absolute difference between them is on the order of 10e-15,
>>> but almost all values in the upper left triangle are different from each
>>> other.
>>>
>>> 2) Taking the above computations of C and calling CS the version
>>> computed in the absence of addprocs() and CP the version computed with
>>> addprocs(), I get another difference. If I have saved these matrices, open
>>> a fresh instance of julia (no addprocs()), and read them in, both
>>> cholfact(inv(CS)) and cholfact(inv(CP)) work fine. If I do a fresh open,
>>> then addprocs(), then read them in, NEITHER cholfact(inv(CS)) nor
>>> cholfact(inv(CP)) works, and they both throw the same PosDefException
>>> number.
>>>
>>> On Wednesday, July 9, 2014 5:50:07 PM UTC-5, Thomas Covert wrote:
>>>>
>>>> I have found cholfact to behave differently (erroneously?) under
>>>> parallel processing contexts than under standard settings. What I mean by
>>>> "parallel processing" is simply having previously called addprocs(). Here
>>>> is some example code that I am running on my mid-2009 MacBook Pro using a
>>>> somewhat recent brew of @staticfloat's homebrew distribution:
>>>>
>>>> addprocs(8)
>>>> N = 1000
>>>> x = 10 * randn(N)
>>>> X = zeros(N,N)
>>>>
>>>> for i = 1:N
>>>>     for j = 1:N
>>>>         X[i,j] = exp(-.5 * (x[i]-x[j])^2)
>>>>     end
>>>> end
>>>>
>>>> X = X + diagm(.5 * ones(N))
>>>>
>>>> C = cholfact(X)
>>>> iC = inv(C)
>>>> CiC = cholfact(iC)
>>>>
>>>> I believe this code generates an X which is positive definite by
>>>> construction.
>>>>
>>>> If I run this code as-is, I get the following error (or something
>>>> similar; the PosDefException number sometimes changes):
>>>>
>>>> ERROR: PosDefException(12)
>>>>  in cholfact! at linalg/factorization.jl:36
>>>>  in cholfact at linalg/factorization.jl:39
>>>> while loading /Users/tcovert/path_to_code.jl, in expression starting
>>>> on line 16
>>>>
>>>> However, if I comment out the "addprocs(8)" line, everything works
>>>> fine. Also, for smaller values of N the problem goes away (N=100,200 is
>>>> fine, N=400 is not). Here is my versioninfo() if that helps:
>>>>
>>>> julia> versioninfo()
>>>> Julia Version 0.3.0-prerelease+3868
>>>> Commit e7a9a7d* (2014-06-24 19:39 UTC)
>>>> Platform Info:
>>>>   System: Darwin (x86_64-apple-darwin13.2.0)
>>>>   CPU: Intel(R) Core(TM)2 Duo CPU P8700 @ 2.53GHz
>>>>   WORD_SIZE: 64
>>>>   BLAS: libopenblas (USE64BITINT NO_AFFINITY)
>>>>   LAPACK: libopenblas
>>>>   LIBM: libopenlibm
>>>>
>>
>> --
>> Kind regards
>>
>> Andreas Noack Jensen
>>
