That is right. In parallel mode, having each process spawn multiple threads 
can cause a slowdown if the number of processes times the number of threads 
per process exceeds the number of available cores.
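
A minimal sketch of that arithmetic, using the setup from the report quoted 
below (a dual-core Core 2 Duo, and addprocs(8) giving 9 processes counting 
the master). The helper names blas_threads_per_process and oversubscribed 
are hypothetical, made up for illustration; they are not part of Julia or 
OpenBLAS.

```julia
# Hypothetical helper (not a Julia or OpenBLAS API): pick a BLAS thread count
# per process so that nprocs * threads does not exceed the core count,
# while never going below one thread per process.
function blas_threads_per_process(ncores::Integer, nprocs::Integer)
    nprocs >= 1 || throw(ArgumentError("nprocs must be at least 1"))
    return max(1, div(ncores, nprocs))
end

# Hypothetical check: true when the total BLAS thread count exceeds the cores.
oversubscribed(ncores, nprocs, nthreads) = nprocs * nthreads > ncores

oversubscribed(2, 9, 2)          # 18 threads competing for 2 cores
blas_threads_per_process(2, 9)   # falls back to 1 thread per process
blas_threads_per_process(8, 2)   # 4 threads per process fit on 8 cores
```

With more processes than cores, as in this case, the budget is already 
spent on the processes themselves, so one BLAS thread per process is the 
most that makes sense.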

-viral

On Thursday, July 10, 2014 5:20:51 AM UTC-7, Thomas Covert wrote:
>
> Thanks for the update.  Just out of curiosity, why does Julia call a 
> single-threaded LAPACK routine when in parallel processing mode?  Is it the 
> case that it's impossible to have several processes calling multi-threaded 
> LAPACK routines, or just that it's less efficient than having several 
> processes calling single-threaded LAPACK routines?
>
> -Thom
>
> On Thursday, July 10, 2014 3:08:24 AM UTC-5, Andreas Noack wrote:
>>
>> I think the problem is in the single-threaded version of dpotri in 
>> OpenBLAS. When you add processes to Julia, OpenBLAS is called 
>> single-threaded, and therefore you see the problem when using addprocs. I 
>> could reproduce the error by calling blas_set_num_threads(1). I have filed 
>> issues at the Julia and OpenBLAS GitHub sites.
>>
>>
>> 2014-07-10 3:51 GMT+02:00 Thomas Covert <[email protected]>:
>>
>>> Here are two additional pieces to the puzzle.  tl;dr is that the 
>>> parallel version and serial version generate different cholesky factors, 
>>> and that conditional on those computed factors, a "serial" call to 
>>> cholfact(inv(C)) works fine on both computed factors, while a "parallel" 
>>> call doesn't work on either. 
>>>
>>> 1) If I fix the random seed to be the same across runs, the non-parallel 
>>> version and the parallel version generate slightly different values of C. 
>>>  The maximum absolute difference between them is on the order of 10e-15, 
>>> but almost all values in the upper left triangle are different from each 
>>> other.
>>>
>>> 2) Taking the above computations of C and calling CS the version 
>>> computed in the absence of addprocs() and CP the version computed with 
>>> addprocs(), I get another difference.  If I have saved these matrices, open 
>>> a fresh instance of julia (no addprocs()), and read them in, both 
>>> cholfact(inv(CS)) and cholfact(inv(CP)) work fine.  If I do a fresh open, 
>>> then addprocs(), then read them in, NEITHER cholfact(inv(CS)) nor 
>>> cholfact(inv(CP)) works, and they both throw the same PosDefException number.
>>>
>>>
>>> On Wednesday, July 9, 2014 5:50:07 PM UTC-5, Thomas Covert wrote:
>>>>
>>>> I have found cholfact to behave differently (erroneously?) under 
>>>> parallel processing contexts than under standard settings.  What I mean by 
>>>> "parallel processing" is simply having previously called addprocs().  Here 
>>>> is some example code that I am running on my mid-2009 MacBook Pro using a 
>>>> somewhat recent brew of @staticfloat's homebrew distribution:
>>>>
>>>> addprocs(8)
>>>>
>>>> N = 1000
>>>> x = 10 * randn(N)
>>>> X = zeros(N,N)
>>>>
>>>> for i = 1:N
>>>>     for j = 1:N
>>>>         X[i,j] = exp(-.5 * (x[i]-x[j])^2)
>>>>     end
>>>> end
>>>>
>>>> X = X + diagm(.5 * ones(N))
>>>>
>>>> C = cholfact(X)
>>>> iC = inv(C)
>>>> CiC = cholfact(iC)
>>>>
>>>>
>>>> I believe this code generates an X which is positive definite by 
>>>> construction.
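>>>>
>>>> A short argument for that, assuming the entries of x are real: the 
>>>> matrix built in the loop is a Gaussian kernel matrix K, which is 
>>>> positive semidefinite, and the diagm term adds 0.5 to every diagonal 
>>>> entry.
>>>>
>>>> ```latex
>>>> X = K + \tfrac{1}{2} I, \qquad
>>>> K_{ij} = \exp\!\left(-\tfrac{1}{2}(x_i - x_j)^2\right)
>>>>
>>>> v^\top X v = v^\top K v + \tfrac{1}{2}\lVert v \rVert^2
>>>>            \ge \tfrac{1}{2}\lVert v \rVert^2 > 0
>>>>            \quad \text{for all } v \neq 0
>>>> ```
>>>>
>>>> So every eigenvalue of X is at least 1/2, every eigenvalue of inv(X) 
>>>> lies in (0, 2], and in exact arithmetic both factorizations should 
>>>> succeed.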
>>>>
>>>> If I run this code as-is, I get the following error (or something 
>>>> similar; the number in the PosDefException sometimes changes):
>>>>
>>>> ERROR: PosDefException(12)
>>>>  in cholfact! at linalg/factorization.jl:36
>>>>  in cholfact at linalg/factorization.jl:39
>>>> while loading /Users/tcovert/path_to_code.jl, in expression starting on line 16
>>>>  
>>>> However, if I comment out the "addprocs(8)" line, everything works 
>>>> fine.  Also, for smaller values of N the problem goes away (N=100,200 is 
>>>> fine, N=400 is not).  Here is my versioninfo() if that helps:
>>>>
>>>> julia> versioninfo()
>>>> Julia Version 0.3.0-prerelease+3868
>>>> Commit e7a9a7d* (2014-06-24 19:39 UTC)
>>>> Platform Info:
>>>>   System: Darwin (x86_64-apple-darwin13.2.0)
>>>>   CPU: Intel(R) Core(TM)2 Duo CPU     P8700  @ 2.53GHz
>>>>   WORD_SIZE: 64
>>>>   BLAS: libopenblas (USE64BITINT NO_AFFINITY)
>>>>   LAPACK: libopenblas
>>>>   LIBM: libopenlibm
>>>>
>>>>
>>
>>
>> -- 
>> Med venlig hilsen
>>
>> Andreas Noack Jensen
>>  
>
