It seems to work fine on 0.4. On my dual-core i5:

julia> peakflops()
6.3990880531633675e10
julia> blas_set_num_threads(1)

julia> peakflops()
3.2582507660855206e10

-viral

On Friday, March 6, 2015 at 10:27:46 PM UTC+5:30, Steven G. Johnson wrote:

> For my numerics class at MIT <http://math.mit.edu/~stevenj/18.335/>, I used
> the following notebook to talk about cache effects and matrix
> multiplication:
>
> http://nbviewer.ipython.org/url/math.mit.edu/~stevenj/18.335/Matrix-multiplication-experiments.ipynb
>
> It includes some code to benchmark the built-in BLAS-based multiplication
> against some simpler algorithms, and for comparison purposes I used
> blas_set_num_threads(1) to benchmark only serial performance... I thought.
>
> When I ran the benchmark on my desktop, the results made sense: OpenBLAS
> got about 3 × 4 = 12 Gflops, which is peak performance for a 3 GHz CPU that
> can perform 4 flops per cycle (via 256-bit AVX instructions). However, on
> my laptop it got about 40 Gflops, which only makes sense if it was using
> additional cores. In both cases, this was with Julia 0.4 using OpenBLAS.
>
> Is there any reason why blas_set_num_threads(1) would not be sufficient to
> disable additional cores?
>
> --SGJ
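For reference, here is a minimal sketch of the kind of single-threaded
benchmark being discussed (this is not the notebook's actual code; the
matrix size n is arbitrary, and on versions after 0.4 the call is spelled
BLAS.set_num_threads instead of blas_set_num_threads):

    blas_set_num_threads(1)   # ask OpenBLAS to use a single thread

    n = 2000
    A = rand(n, n)
    B = rand(n, n)
    A * B                     # warm-up run (forces compilation)

    t = @elapsed A * B
    gflops = 2n^3 / t / 1e9   # dense matmul performs ~2n^3 flops
    println("single-threaded BLAS: ", gflops, " Gflops")

If set_num_threads ever appears to be ignored, starting Julia with the
environment variable OPENBLAS_NUM_THREADS=1 is another way to pin OpenBLAS
to one thread, since OpenBLAS reads that variable at startup.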
