Re: [julia-users] Re: Implementing mapreduce parallel model (not general multi-threading) ? easy and enough ?

Tim Holy Tue, 06 Oct 2015 13:30:07 -0700

There's

https://github.com/JuliaParallel/DistributedArrays.jl
https://github.com/JuliaParallel/HDFS.jl


in case they help. (See the other packages in JuliaParallel, in case you have 
missed that organization.)

--Tim

On Tuesday, October 06, 2015 12:57:17 PM Andrei Zh wrote:
> Yet, calling Julia processes on other machines via ssh doesn't address data
> locality. In big data systems (say, > 1TB) the main performance concern is
> not the number of CPUs, but I/O operations and data movement across the
> cluster, so map-reduce tries to do as much as possible on local data without
> any movement (map phase) and then combines the results globally (reduce
> phase). This way a small program is sent to the data nodes instead of huge
> data being sent to the program's node(s).
> 
> As far as I know, Julia doesn't provide any tools for working with huge
> distributed datasets, that's why I say it doesn't give you Hadoop- (or
> Spark-, or Google-like) map-reduce. But it's quite easy to add these
> features of MR too. E.g. one can use Elly.jl to access HDFS (including the
> location of data blocks) and execute tasks using remotecall() on the Julia
> worker that is closest to the data.
> 
> On Tuesday, October 6, 2015 at 8:03:57 PM UTC+3, Stefan Karpinski wrote:
> > That works fine in a distributed setting if you start Julia workers on
> > other machines, so it is actually a legitimate form of map reduce. It
> > doesn't do anything for handling machine failures, however, which was
> > arguably the major concern of the original MapReduce design.
> > 
> > On Tue, Oct 6, 2015 at 10:24 AM, Andrei Zh wrote:
> >> Julia supports multiprocessing pretty well, including map-reduce-like
> >> jobs. E.g. in the following example I add 3 processes to a "workgroup",
> >> distribute a simulation between them and then reduce the results via the
> >> (+) operator:
> >> 
> >> 
> >> julia> addprocs(3)
> >> 3-element Array{Int64,1}:
> >>  2
> >>  3
> >>  4
> >> 
> >> julia> nheads = @parallel (+) for i=1:200000000
> >>          Int(rand(Bool))
> >>        end
> >> 100008845
> >> 
> >> You can find the full example and a lot of other fun in the official
> >> documentation on parallel computing:
> >> 
> >> http://julia.readthedocs.org/en/latest/manual/parallel-computing/
> >> 
> >> Note, though, that it's not real (i.e. Hadoop/Spark-like) map-reduce,
> >> since the original idea of MR concerns distributed systems and data-local
> >> computations, while here we do everything on the same machine. If you are
> >> looking for a big data solution, search this forum for some (dead or
> >> alive) projects for it.
> >> 
> >> On Monday, October 5, 2015 at 11:52:21 PM UTC+3, cheng wang wrote:
> >>> Hello everyone,
> >>> 
> >>> I am a Julia newbie. I am thrilled by Julia recently. It's an amazing
> >>> language!
> >>> 
> >>> I notice that Julia currently does not have good support for
> >>> multi-threaded programming.
> >>> So I am thinking that a Spark-like mapreduce parallel model +
> >>> multi-process may be enough.
> >>> It is easy to make thread-safe and it could handle most vector-based
> >>> computation.
> >>> 
> >>> This idea might be too naive. However, I am happy to see your opinions.
> >>> 
> >>> Thanks in advance,
> >>> Cheng
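
The "send the program to the data" idea discussed in the thread can be sketched roughly as follows. This is only an illustration, assuming Julia 1.x, where these tools live in the Distributed standard library and the `@parallel` macro from the 2015 transcript is spelled `@distributed`. The names `block_owner` and `count_heads` are hypothetical, and the placement table is made up rather than read from HDFS; no data actually resides on the workers here.

```julia
using Distributed

addprocs(3)   # three local workers, as in the transcript in the thread

# Hypothetical placement table: which worker "holds" which data block.
# In a real cluster this would come from the storage layer (for example
# HDFS block locations); here it is invented for illustration.
block_owner = Dict(block => pid for (block, pid) in zip(1:3, workers()))

# Map step, defined on every process: count heads in n simulated coin flips.
@everywhere count_heads(n) = count(_ -> rand(Bool), 1:n)

# Map phase: send the small function to each owning worker.
futures = [remotecall(count_heads, pid, 1_000_000) for pid in values(block_owner)]

# Reduce phase: combine the partial results on the calling process.
nheads = sum(fetch, futures)
```

Only the function and an integer argument travel to the workers, and only one integer per worker travels back. As noted in the thread, nothing in this sketch handles worker or machine failures.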
