I was curious about this question myself. Here is my versioninfo() for a
recent iMac with an i7:
Julia Version 0.4.5
Commit 2ac304d* (2016-03-18 00:58 UTC)
Platform Info:
  System: Darwin (x86_64-apple-darwin14.5.0)
  CPU: Intel(R) Core(TM) i7-4790K CPU @ 4.00GHz
  WORD_SIZE: 64
  BLAS: libopenblas (DYNAMIC_ARCH NO_AFFINITY Haswell)
  LAPACK: libopenblas
  LIBM: libopenlibm
  LLVM: libLLVM-3.3
I observe a doubling of performance going from blas_set_num_threads(1) to 2
(about 5.4e10 to 1.06e11 flops), but no change when moving to 4 or 8, even
though Intel claims that this machine has 4 cores and can run 8 threads.
julia> blas_set_num_threads(1)
julia> [peakflops(8000)::Float64 for i in 1:6]
6-element Array{Float64,1}:
5.42466e10
5.36131e10
5.38872e10
5.44293e10
5.41495e10
5.43369e10
julia> blas_set_num_threads(2)
julia> [peakflops(8000)::Float64 for i in 1:6]
6-element Array{Float64,1}:
1.0666e11
1.04197e11
1.05386e11
1.06687e11
1.05155e11
1.07008e11
julia> blas_set_num_threads(4)
julia> [peakflops(8000)::Float64 for i in 1:6]
6-element Array{Float64,1}:
1.0736e11
1.05142e11
1.07173e11
1.07557e11
1.07503e11
1.07559e11
julia> blas_set_num_threads(8)
julia> [peakflops(8000)::Float64 for i in 1:6]
6-element Array{Float64,1}:
1.07168e11
1.07027e11
1.02708e11
1.0297e11
1.0579e11
1.06693e11
I am using the most recent Homebrew install on a Mac.
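
To put a number on the scaling, here is a minimal sketch that averages the
six peakflops(8000) runs reported for each thread count and divides by the
single-thread average. The names runs, avg, base and speedup are made up for
illustration; the figures are copied from the output above, nothing is
re-measured.

```julia
# peakflops(8000) results copied from the runs above: (BLAS threads, results)
runs = [
    (1, [5.42466e10, 5.36131e10, 5.38872e10, 5.44293e10, 5.41495e10, 5.43369e10]),
    (2, [1.0666e11, 1.04197e11, 1.05386e11, 1.06687e11, 1.05155e11, 1.07008e11]),
    (4, [1.0736e11, 1.05142e11, 1.07173e11, 1.07557e11, 1.07503e11, 1.07559e11]),
    (8, [1.07168e11, 1.07027e11, 1.02708e11, 1.0297e11, 1.0579e11, 1.06693e11]),
]

# plain arithmetic mean, written out so that no extra module is needed
avg(v) = sum(v) / length(v)

# single-thread average, used as the reference point
base = avg(runs[1][2])

# each entry is (thread count, average flops / single-thread average)
speedup = [(n, avg(v) / base) for (n, v) in runs]

for (n, s) in speedup
    println(n, " threads: ", round(s * 100) / 100, "x the single-thread rate")
end
```

With these numbers the ratio comes out just under 2x for 2, 4 and 8 threads
alike, which is the plateau described above.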
On Friday, April 10, 2015 at 12:58:59 PM UTC-5, [email protected] wrote:
>
> For the record, with this architecture:
>
> System: Linux (x86_64-linux-gnu)
> CPU: Intel(R) Xeon(R) CPU E5-2670 v2 @ 2.50GHz
>
> And 20 physical cores, peakflops(n) continues to increase until about n =
> 20000.
> peakflops also scales well increasing the number of blas threads from 1 to
> 20.
>
> Hyper-threading is on so there are 40 logical cores. If you do
> blas_set_num_threads(21), peakflops
> returns something a bit greater than 1/2 of the value for 20 cores. I
> don't recall the exact number;
> I hardcoded 20 cores in deps/Makefile.
>
> On Thursday, December 4, 2014 at 8:30:39 PM UTC+1, Douglas Bates wrote:
>>
>> I have been working on a package
>> https://github.com/dmbates/ParallelGLM.jl and noticed some
>> peculiarities in the timings on a couple of shared-memory servers, each
>> with 32 cores. In particular changing from 16 workers to 32 workers
>> actually slowed down the fitting process. So I decided to check how
>> changing the number of OpenBLAS threads affected the peakflops() result. I
>> end up with essentially the same results for 8, 16 and 32 threads on this
>> machine with 32 cores. Is that to be expected?
>>
>>                _
>>    _       _ _(_)_     |  A fresh approach to technical computing
>>   (_)     | (_) (_)    |  Documentation: http://docs.julialang.org
>>    _ _   _| |_  __ _   |  Type "help()" for help.
>>   | | | | | | |/ _` |  |
>>   | | |_| | | | (_| |  |  Version 0.4.0-dev+1944 (2014-12-04 15:06 UTC)
>>  _/ |\__'_|_|_|\__'_|  |  Commit 87e9ee1* (0 days old master)
>> |__/                   |  x86_64-unknown-linux-gnu
>>
>> julia> [peakflops()::Float64 for i in 1:6]
>> 6-element Array{Float64,1}:
>> 1.41151e11
>> 1.1676e11
>> 1.27597e11
>> 1.27607e11
>> 1.27518e11
>> 1.27478e11
>>
>> julia> CPU_CORES
>> 32
>>
>> julia> blas_set_num_threads(16)
>>
>> julia> [peakflops()::Float64 for i in 1:6]
>> 6-element Array{Float64,1}:
>> 1.23523e11
>> 1.27119e11
>> 1.11381e11
>> 1.17847e11
>> 1.28415e11
>> 1.17998e11
>>
>> julia> blas_set_num_threads(8)
>>
>> julia> [peakflops()::Float64 for i in 1:6]
>> 6-element Array{Float64,1}:
>> 1.25194e11
>> 1.20969e11
>> 1.25777e11
>> 1.20757e11
>> 1.26086e11
>> 1.20958e11
>>
>> julia> versioninfo(true)
>> Julia Version 0.4.0-dev+1944
>> Commit 87e9ee1* (2014-12-04 15:06 UTC)
>> Platform Info:
>> System: Linux (x86_64-unknown-linux-gnu)
>> CPU: AMD Opteron(tm) Processor 6328
>> WORD_SIZE: 64
>> "Red Hat Enterprise Linux Server release 6.5 (Santiago)"
>> uname: Linux 2.6.32-431.3.1.el6.x86_64 #1 SMP Fri Dec 13 06:58:20 EST 2013 x86_64 x86_64
>> Memory: 504.78467178344727 GB (508598.8125 MB free)
>> Uptime: 261586.0 sec
>> Load Avg: 0.08740234375 0.19384765625 0.8330078125
>> AMD Opteron(tm) Processor 6328 :
>>           speed       user     nice       sys          idle   irq
>> #1-32  3199 MHz  1855973 s  23392 s  670932 s  834073187 s  21 s
>>
>> BLAS: libopenblas (USE64BITINT NO_AFFINITY PILEDRIVER)
>> LAPACK: libopenblas
>> LIBM: libopenlibm
>> LLVM: libLLVM-3.5.0
>> Environment:
>> TERM = screen
>> PATH =
>> /s/cmake-3.0.2/bin:/s/gcc-4.9.2/bin:./u/b/a/bates/bin:/usr/lib64/qt-3.3/bin:/usr/local/bin:/bin:/usr/bin:/usr/local/sbin:/usr/sbin:/sbin:/s/std/bin:/usr/afsws/bin:
>> WWW_HOME = http://www.stat.wisc.edu/
>> JULIA_PKGDIR = /scratch/bates/.julia
>> HOME = /u/b/a/bates
>>
>> Package Directory: /scratch/bates/.julia/v0.4
>> 2 required packages:
>> - Distributions 0.6.1
>> - Docile 0.3.2
>> 5 additional packages:
>> - ArrayViews 0.4.8
>> - Compat 0.2.5
>> - PDMats 0.3.1
>> - ParallelGLM 0.0.0- master (unregistered)
>> - StatsBase 0.6.10
>>