I want to process each line of a large text file (~100 GB) in parallel using the following code:

pmap(process_fun, eachline(the_file))

However, pmap seems to be slow. Here is a dummy experiment:
julia> writedlm("tmp.txt",rand(100000,100)) # produce a large file
julia> @time for l in eachline("tmp.txt")
split(l)
end
5.678517 seconds (11.00 M allocations: 732.637 MB, 40.67% gc time)
julia> addprocs() # 32 cores
julia> @time map(split, eachline("tmp.txt"));
4.834571 seconds (11.00 M allocations: 734.638 MB, 32.84% gc time)
julia> @time pmap(split, eachline("tmp.txt"));
112.275411 seconds (227.06 M allocations: 10.024 GB, 50.72% gc time)
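
My guess is that pmap ships every line to a worker as an individual message, so per-call serialization and scheduling overhead dwarfs the tiny cost of split, and all the split results get serialized back to the master as well. One thing I plan to try is the batch_size keyword that pmap accepts (since Julia 0.5, if I read the docs right); the value 1000 below is just a guess, not tuned:

julia> @time pmap(split, eachline("tmp.txt"); batch_size=1000);  # group lines to amortize per-call overhead

Even batched, every line and every result still crosses the process boundary once, so I suspect this only helps up to a point.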
The goal is to process those files (300+) as fast as possible. Is there perhaps a better way to call pmap?
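
Since there are 300+ files, I also wonder whether coarser-grained parallelism is the better fit: hand each worker a whole file, so individual lines never cross process boundaries. A minimal sketch of that idea (process_file and the_files are hypothetical names for my per-file wrapper and list of paths):

@everywhere function process_file(path)
    # Read and process the whole file locally on the worker;
    # only the small summary value travels back to the master.
    n = 0
    for l in eachline(path)
        process_fun(l)  # assumes process_fun is also defined @everywhere
        n += 1
    end
    return n
end

pmap(process_file, the_files)  # the_files::Vector{String}, the 300+ paths

Would that be the recommended pattern here, or is there a better approach?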