There's

https://github.com/JuliaParallel/DistributedArrays.jl
https://github.com/JuliaParallel/HDFS.jl

in case they help. (See the other packages in JuliaParallel, in case you have
missed that organization.)

--Tim

On Tuesday, October 06, 2015 12:57:17 PM Andrei Zh wrote:
> Yet, calling Julia processes on other machines via ssh doesn't address data
> locality. In big data systems (say, over 1TB) the main performance concern is
> not the number of CPUs but IO operations and data movement across a
> cluster, so map reduce tries to do as much as possible on local data without
> any movement (map phase) and then combines results globally (reduce phase).
> This way a small program is sent to the data nodes instead of huge amounts
> of data being sent to the program's node(s).
>
> As far as I know, Julia doesn't provide any tools for working with huge
> distributed datasets; that's why I say it doesn't give you Hadoop- (or
> Spark-, or Google-like) map-reduce. But it's quite easy to add these MR
> features too. E.g. one can use Elly.jl to access HDFS (including the
> location of data blocks) and execute tasks using remotecall() on the Julia
> worker that is closest to the data.
>
> On Tuesday, October 6, 2015 at 8:03:57 PM UTC+3, Stefan Karpinski wrote:
> > That works fine in a distributed setting if you start Julia workers on
> > other machines, so it is actually a legitimate form of map reduce. It
> > doesn't do anything for handling machine failures, however, which was
> > arguably the major concern of the original MapReduce design.
> >
> > On Tue, Oct 6, 2015 at 10:24 AM, Andrei Zh wrote:
> >> Julia supports multiprocessing pretty well, including map-reduce-like
> >> jobs. E.g. in the next example I add 3 processes to a "workgroup",
> >> distribute a simulation between them and then reduce the results via the
> >> (+) operator:
> >>
> >> julia> addprocs(3)
> >> 3-element Array{Int64,1}:
> >>  2
> >>  3
> >>  4
> >>
> >> julia> nheads = @parallel (+) for i=1:200000000
> >>            Int(rand(Bool))
> >>        end
> >> 100008845
> >>
> >> You can find the full example and a lot of other fun in the official
> >> documentation on parallel computing:
> >>
> >> http://julia.readthedocs.org/en/latest/manual/parallel-computing/
> >>
> >> Note, though, that it's not real (i.e. Hadoop/Spark-like) map-reduce,
> >> since the original idea of MR concerns distributed systems and data-local
> >> computations, while here we do everything on the same machine. If you are
> >> looking for a big data solution, search this forum for some (dead or
> >> alive) projects for it.
> >>
> >> On Monday, October 5, 2015 at 11:52:21 PM UTC+3, cheng wang wrote:
> >>> Hello everyone,
> >>>
> >>> I am a Julia newbie. I have been thrilled by Julia recently. It's an
> >>> amazing language!
> >>>
> >>> I notice that Julia currently does not have good support for
> >>> multi-threaded programming.
> >>> So I am thinking that a Spark-like mapreduce parallel model +
> >>> multiple processes may be enough.
> >>> It is easy to make thread-safe, and it could handle most vector-based
> >>> computations.
> >>>
> >>> This idea might be too naive. However, I am happy to see your opinions.
> >>>
> >>> Thanks in advance,
> >>> Cheng
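The `@parallel` coin-flip example in the thread uses Julia 0.4 syntax. A minimal sketch of the same reduction, assuming Julia 1.x, where the multiprocessing functions live in the `Distributed` standard library and `@parallel` has been replaced by `@distributed`:

```julia
# Julia 1.x: addprocs, @distributed, remotecall, etc. come from the Distributed stdlib.
using Distributed

addprocs(3)  # start 3 local worker processes (typically ids 2, 3, 4 in a fresh session)

# Each worker flips its share of the coins; the partial counts are combined with (+).
nheads = @distributed (+) for i = 1:200_000_000
    Int(rand(Bool))
end

println(nheads)  # close to 100_000_000; the exact value varies from run to run
```

Like the original, this runs on worker processes on one machine, so it demonstrates the parallel reduce pattern rather than data-local map-reduce.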
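The data-locality idea discussed above (send a small function to the worker that holds a data block, then reduce only the small per-block results) can be sketched with `remotecall`. This is a hedged sketch, not a working HDFS integration: `load_local_block`, `map_block` and the `block_owner` table are hypothetical stand-ins. In a real setup the block-to-host mapping would come from HDFS metadata (for example via Elly.jl, whose API is not shown here), and the workers would be started on the data nodes instead of locally.

```julia
using Distributed

# Start local workers only if none exist yet, so the snippet can be re-run in one session.
nprocs() == 1 && addprocs(3)

# HYPOTHETICAL stand-in for reading a block stored on the worker's own machine.
# Here it just generates deterministic data; a real version would read local disk.
@everywhere load_local_block(id::Int) = collect(1:id * 1_000)

# Map step (runs on the worker): touch only the local block and
# return a small summary (sum of squares) instead of the raw data.
@everywhere map_block(id::Int) = sum(x -> x^2, load_local_block(id))

# HYPOTHETICAL block-location table: block id => worker that "holds" that block.
# Blocks are spread round-robin over the available workers.
block_owner = Dict(b => workers()[mod1(b, nworkers())] for b in 1:6)

# Ship the small function to the data: one remotecall per block,
# sent to the worker that owns that block.
futures = [remotecall(map_block, pid, b) for (b, pid) in block_owner]

# Reduce step: only the per-block partial results travel back to the master.
total = sum(fetch, futures)
println(total)
```

The choice of `remotecall` plus `fetch` (instead of `@distributed`) matters here: it lets the caller pick which worker runs each task, which is what data-local scheduling needs.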
