I was curious about this question myself.  Here is my versioninfo() for a 
recent iMac with an i7:

Julia Version 0.4.5
Commit 2ac304d* (2016-03-18 00:58 UTC)
Platform Info:
  System: Darwin (x86_64-apple-darwin14.5.0)
  CPU: Intel(R) Core(TM) i7-4790K CPU @ 4.00GHz
  WORD_SIZE: 64
  BLAS: libopenblas (DYNAMIC_ARCH NO_AFFINITY Haswell)
  LAPACK: libopenblas
  LIBM: libopenlibm
  LLVM: libLLVM-3.3


I observe a doubling of performance going from blas_set_num_threads(1) to 
2, but no change when moving to 4 or 8, even though Intel claims that this 
machine has 4 cores and can run 8 threads.


julia> blas_set_num_threads(1)

julia> [peakflops(8000)::Float64 for i in 1:6]
6-element Array{Float64,1}:
 5.42466e10
 5.36131e10
 5.38872e10
 5.44293e10
 5.41495e10
 5.43369e10

julia> blas_set_num_threads(2)

julia> [peakflops(8000)::Float64 for i in 1:6]
6-element Array{Float64,1}:
 1.0666e11 
 1.04197e11
 1.05386e11
 1.06687e11
 1.05155e11
 1.07008e11

julia> blas_set_num_threads(4)

julia> [peakflops(8000)::Float64 for i in 1:6]
6-element Array{Float64,1}:
 1.0736e11 
 1.05142e11
 1.07173e11
 1.07557e11
 1.07503e11
 1.07559e11

julia> blas_set_num_threads(8)

julia> [peakflops(8000)::Float64 for i in 1:6]
6-element Array{Float64,1}:
 1.07168e11
 1.07027e11
 1.02708e11
 1.0297e11 
 1.0579e11 
 1.06693e11

I am using the most recent Homebrew install on a Mac.
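
For anyone who wants to repeat the comparison on their own machine, the 
thread-count sweep above could be scripted roughly as follows.  This is only 
a sketch for Julia 0.4 (the version shown in versioninfo() above); 
sweep_peakflops is a made-up helper name, not something in Base, and the 
warm-up call and the best/mean summary are my own choices rather than 
anything peakflops does for you.

```julia
# Sketch for Julia 0.4, the version used in this thread.
# sweep_peakflops is a hypothetical helper, not part of Base.
function sweep_peakflops(threadcounts; n=8000, reps=6)
    results = Any[]
    for t in threadcounts
        blas_set_num_threads(t)   # set the number of OpenBLAS threads
        peakflops(n)              # warm-up run, result discarded
        rates = Float64[peakflops(n) for i in 1:reps]
        best = maximum(rates)
        avg = mean(rates)
        @printf("%2d threads: best %.4e, mean %.4e flops\n", t, best, avg)
        push!(results, (t, best, avg))   # one (threads, best, mean) tuple per setting
    end
    return results
end
```

Calling sweep_peakflops([1, 2, 4, 8]) repeats the four settings tried above 
with the same matrix size and repetition count.  In later Julia releases 
these functions moved into the LinearAlgebra standard library 
(LinearAlgebra.BLAS.set_num_threads and LinearAlgebra.peakflops), so the 
calls would need adjusting there.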

On Friday, April 10, 2015 at 12:58:59 PM UTC-5, [email protected] wrote:
>
> For the record, with this architecture:
>
>   System: Linux (x86_64-linux-gnu)
>   CPU: Intel(R) Xeon(R) CPU E5-2670 v2 @ 2.50GHz
>
> And 20 physical cores,  peakflops(n) continues to increase until about n = 
> 20000.
> peakflops also scales well increasing the number of blas threads from 1 to 
> 20.
>
> Hyper-threading is on so there are 40 logical cores. If you do 
> blas_set_num_threads(21), peakflops
> returns something a bit greater than 1/2 of the value for 20 cores. I 
> don't recall the exact number;
> I hardcoded 20 cores in deps/Makefile.
>
> On Thursday, December 4, 2014 at 8:30:39 PM UTC+1, Douglas Bates wrote:
>>
>> I have been working on a package 
>> https://github.com/dmbates/ParalllelGLM.jl and noticed some 
>> peculiarities in the timings on a couple of shared-memory servers, each 
>> with 32 cores.  In particular changing from 16 workers to 32 workers 
>> actually slowed down the fitting process.  So I decided to check how 
>> changing the number of OpenBLAS threads affected the peakflops() result.  I 
>> end up with essentially the same results for 8, 16 and 32 threads on this 
>> machine with 32 cores.  Is that to be expected?
>>
>>    _       _ _(_)_     |  A fresh approach to technical computing
>>   (_)     | (_) (_)    |  Documentation: http://docs.julialang.org
>>    _ _   _| |_  __ _   |  Type "help()" for help.
>>   | | | | | | |/ _` |  |
>>   | | |_| | | | (_| |  |  Version 0.4.0-dev+1944 (2014-12-04 15:06 UTC)
>>  _/ |\__'_|_|_|\__'_|  |  Commit 87e9ee1* (0 days old master)
>> |__/                   |  x86_64-unknown-linux-gnu
>>
>> julia> [peakflops()::Float64 for i in 1:6]
>> 6-element Array{Float64,1}:
>>  1.41151e11
>>  1.1676e11 
>>  1.27597e11
>>  1.27607e11
>>  1.27518e11
>>  1.27478e11
>>
>> julia> CPU_CORES
>> 32
>>
>> julia> blas_set_num_threads(16)
>>
>> julia> [peakflops()::Float64 for i in 1:6]
>> 6-element Array{Float64,1}:
>>  1.23523e11
>>  1.27119e11
>>  1.11381e11
>>  1.17847e11
>>  1.28415e11
>>  1.17998e11
>>
>> julia> blas_set_num_threads(8)
>>
>> julia> [peakflops()::Float64 for i in 1:6]
>> 6-element Array{Float64,1}:
>>  1.25194e11
>>  1.20969e11
>>  1.25777e11
>>  1.20757e11
>>  1.26086e11
>>  1.20958e11
>>
>> julia> versioninfo(true)
>> Julia Version 0.4.0-dev+1944
>> Commit 87e9ee1* (2014-12-04 15:06 UTC)
>> Platform Info:
>>   System: Linux (x86_64-unknown-linux-gnu)
>>   CPU: AMD Opteron(tm) Processor 6328                 
>>   WORD_SIZE: 64
>>            "Red Hat Enterprise Linux Server release 6.5 (Santiago)"
>>   uname: Linux 2.6.32-431.3.1.el6.x86_64 #1 SMP Fri Dec 13 06:58:20 EST 
>> 2013 x86_64 x86_64
>> Memory: 504.78467178344727 GB (508598.8125 MB free)
>> Uptime: 261586.0 sec
>> Load Avg:  0.08740234375  0.19384765625  0.8330078125
>> AMD Opteron(tm) Processor 6328                 : 
>>           speed         user         nice          sys         idle       
>>    irq
>> #1-32  3199 MHz    1855973 s      23392 s     670932 s  834073187 s       
>>   21 s
>>
>>   BLAS: libopenblas (USE64BITINT NO_AFFINITY PILEDRIVER)
>>   LAPACK: libopenblas
>>   LIBM: libopenlibm
>>   LLVM: libLLVM-3.5.0
>> Environment:
>>   TERM = screen
>>   PATH = 
>> /s/cmake-3.0.2/bin:/s/gcc-4.9.2/bin:./u/b/a/bates/bin:/usr/lib64/qt-3.3/bin:/usr/local/bin:/bin:/usr/bin:/usr/local/sbin:/usr/sbin:/sbin:/s/std/bin:/usr/afsws/bin:
>>   WWW_HOME = http://www.stat.wisc.edu/
>>   JULIA_PKGDIR = /scratch/bates/.julia
>>   HOME = /u/b/a/bates
>>
>> Package Directory: /scratch/bates/.julia/v0.4
>> 2 required packages:
>>  - Distributions                 0.6.1
>>  - Docile                        0.3.2
>> 5 additional packages:
>>  - ArrayViews                    0.4.8
>>  - Compat                        0.2.5
>>  - PDMats                        0.3.1
>>  - ParallelGLM                   0.0.0-             master (unregistered)
>>  - StatsBase                     0.6.10
>>
