Yes, but I am not using BLAS or FFTs, so it is a bit surprising that I am not getting any speed improvement.
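One thing I could still test is whether per-task overhead is the issue: each call to function_split only touches a 4x2 block, so shipping 250,000 tiny tasks one at a time to the workers may cost more than the computation itself. The following is only a minimal sketch of that experiment, not a confirmed fix. It assumes Julia 1.x (where `Distributed` is a standard library and `findall` replaces `find`), uses 2 local workers purely for illustration, and replaces the `Iterators.product` construction with a hard-coded `PATTERNS` constant (a name I made up) holding the same four rows as `expand.grid(c(0,1),c(0,1))`. The `batch_size` value is an arbitrary guess.

```julia
using Distributed
addprocs(2)  # assumption: 2 local workers, chosen only for illustration

@everywhere begin
    # Hypothetical constant: the rows (0,0), (1,0), (0,1), (1,1),
    # built once per process instead of on every call.
    const PATTERNS = Bool[0 0; 1 0; 0 1; 1 1]

    # Indices of the rows of the 4x2 block x that match the same row of PATTERNS.
    function_split(x) = findall(vec(all(x .== PATTERNS, dims=2)))
end

# Same data layout as in my original code: 1,000,000 x 2 Boolean matrix,
# split into 4-row blocks.
large_matrix = randn(1_000_000, 2) .>= 0
blocks = [large_matrix[i:i+3, :] for i in 1:4:size(large_matrix, 1)]

@time serial = map(function_split, blocks)

# batch_size groups many blocks into one message per worker,
# so the communication cost is paid per batch rather than per block.
@time batched = pmap(function_split, blocks; batch_size=10_000)
```

Whether the batched version actually beats plain `map` here would have to be measured; the work per block is so small that the serial loop may remain faster.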
On Wednesday, July 6, 2016 at 2:17:49 AM UTC+2, Stefan Karpinski wrote:
>
> Similar question and answer:
> http://stackoverflow.com/questions/38075163/julia-uses-only-20-30-of-my-cpu-what-should-i-do/38075939
>
> On Tue, Jul 5, 2016 at 11:26 AM, <[email protected]> wrote:
>
>> I am a complete newcomer to Julia and trying to port some of my R code to
>> it;
>> Basically I have rewritten the following R code in Julia:
>>
>> library(parallel)
>>
>> eps_1<-rnorm(1000000)
>> eps_2<-rnorm(1000000)
>>
>> large_matrix<-ifelse(cbind(eps_1,eps_2)>0,1,0)
>> matrix_to_compare = expand.grid(c(0,1),c(0,1))
>> indices<-seq(1,1000000,4)
>> large_matrix<-lapply(indices,function(i)(large_matrix[i:(i+3),]))
>>
>> function_compare<-function(x){
>>   which((rowSums(x==matrix_to_compare)==2) %in% TRUE)
>> }
>>
>> > system.time(lapply(large_matrix,function_compare))
>>    user  system elapsed
>>  38.812   0.024  38.828
>> > system.time(mclapply(large_matrix,function_compare,mc.cores=11))
>>    user  system elapsed
>>  63.128   1.648   6.108
>>
>> As one can notice I am getting significant speed-up when going from one
>> core to 11. Now I am trying to do the same in Julia:
>>
>> using Distributions;
>> @everywhere using Iterators;
>> d = Normal();
>>
>> eps_1 = rand(d,1000000);
>> eps_2 = rand(d,1000000);
>>
>> #Define cluster:
>> addprocs(11);
>>
>> #Create a large matrix:
>> large_matrix = hcat(eps_1,eps_2).>=0;
>> indices = collect(1:4:1000000)
>>
>> #Split large matrix:
>> large_matrix = [large_matrix[i:(i+3),:] for i in indices];
>>
>> #Define the function to apply:
>> @everywhere function function_split(x)
>>   matrix_to_compare =
>>     transpose(reinterpret(Int,collect(product([0,1],[0,1])),(2,4)));
>>   matrix_to_compare = matrix_to_compare.>0;
>>   find(sum(x.==matrix_to_compare,2).==2)
>> end
>>
>> @time map(function_split,large_matrix )
>> @time pmap(function_split,large_matrix )
>> 5.167820 seconds (22.00 M allocations: 2.899 GB, 12.83% gc time)
>> 18.569198 seconds (40.34 M allocations: 2.082 GB, 5.71% gc time)
>>
>> I somehow do not understand why parallel map function does not work for
>> me. Maybe somebody can point me to a correct solution.
