Please file an issue for this one. -viral
On Monday, May 18, 2015 at 11:58:06 PM UTC+5:30, Jason Morton wrote:
>
> Working with a 16 core / 32 thread machine with 32GB RAM that presents to
> Ubuntu as 32 cores. I'm trying to understand how to get the best
> performance for embarrassingly parallel tasks. I want to take a bunch of
> svds in parallel as an example. The scaling seems to be perfect (6.6
> seconds regardless of number of svds) until about 7 or 8 simultaneous svds,
> at which point it starts to creep up, scaling roughly linearly although
> with high variance, up to 22 seconds for 16 and 47 seconds for 31.
>
> I can confirm that the number of processors being used seems to equal the
> number getting pmapped over by watching htop, so I don't think openblas
> multithreading is the issue. Memory usage stays low. Any guess on what is
> going on? I'm using the generic linux binary julia-79599ada44. I don't
> think there should be any sending of the matrices but perhaps that is the
> issue.
>
> Probably I am missing something obvious.
>
> **** with nprocs = 16 ****
> @time pmap(x->[svd(rand(1000,1000))[2][1] for i in 1:10],[i for i in 1:16])
> elapsed time: 22.350466328 seconds (12292776 bytes allocated)
> @time map(x->[svd(rand(1000,1000))[2][1] for i in 1:10],[i for i in 1:16])
> elapsed time: 91.135322511 seconds (10269056672 bytes allocated, 2.57% gc time)
>
> **** with nprocs = 31 ****
> #perfect scaling until here (at 6x speedup)
> @time pmap(x->[svd(rand(1000,1000))[2][1] for i in 1:10],[i for i in 1:6])
> elapsed time: 6.720786336 seconds (159168 bytes allocated)
> @time map(x->[svd(rand(1000,1000))[2][1] for i in 1:10],[i for i in 1:6])
> elapsed time: 34.146665292 seconds (3847940044 bytes allocated, 2.46% gc time)
>
> #4.5x speedup
> @time pmap(x->[svd(rand(1000,1000))[2][1] for i in 1:10],[i for i in 1:16])
> elapsed time: 19.819358972 seconds (391056 bytes allocated)
> @time map(x->[svd(rand(1000,1000))[2][1] for i in 1:10],[i for i in 1:16])
> elapsed time: 90.688842475 seconds (10260844684 bytes allocated, 2.36% gc time)
>
> #3.69x speedup
> @time pmap(x->[svd(rand(1000,1000))[2][1] for i in 1:10],[i for i in 1:nprocs()])
> elapsed time: 47.411315342 seconds (738616 bytes allocated)
> @time map(x->[svd(rand(1000,1000))[2][1] for i in 1:10],[i for i in 1:nprocs()])
> elapsed time: 175.308752879 seconds (19880206220 bytes allocated, 2.34% gc time)
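
One way to rule the OpenBLAS threading guess in or out explicitly, rather than inferring it from htop, might be to pin every process to a single BLAS thread and repeat the timing. The following is only a rough sketch, and it assumes a current Julia 1.x release, not the 2015 build mentioned in the post (function names differed back then). The helper `top_singular_values`, the worker count of 4, and the smaller matrix size are made up for illustration and are not from the original message.

```julia
using Distributed
using LinearAlgebra

# Start workers only if this session has none yet (4 is an arbitrary choice).
nprocs() == 1 && addprocs(4)

@everywhere using LinearAlgebra

# Assumption under test: one BLAS thread per process, so that each process
# should occupy at most one core during the SVD.
@everywhere BLAS.set_num_threads(1)

# Hypothetical helper: largest singular value of `reps` random n-by-n matrices.
@everywhere top_singular_values(n, reps) = [svdvals(rand(n, n))[1] for _ in 1:reps]

# Time pmap for an increasing number of simultaneous tasks.
for ntasks in (1, 2, 4)
    t = @elapsed pmap(_ -> top_singular_values(500, 2), 1:ntasks)
    println(ntasks, " tasks: ", round(t, digits=2), " s")
end
```

If the elapsed time still grows with the number of tasks when BLAS is limited to one thread per process, the threading explanation looks less likely and something else (memory bandwidth or hyperthreaded cores being counted as full cores, for instance) would be worth checking.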
