Spark.jl provides Julia bindings for Apache Spark, by far the most popular 
computational framework in the Hadoop ecosystem. Find it at:
 

https://github.com/dfdx/Spark.jl


There's still *a lot* of work to do (Spark API is *huge*), but Spark.jl 
already supports: 

   - map and reduce functions, as well as map_partitions, 
   map_partitions_with_index, count, collect and others;
   - text files on local disk and HDFS (and, theoretically, any 
   Hadoop-compatible file system);
   - local, Standalone and Mesos masters (YARN is quite different, though I'm 
   working hard to add it as well);
   - adding custom JARs and other files.
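To give a feel for the API described above, here's a minimal sketch of a word
count per line. It is an illustration only: exact names such as the
`SparkContext` constructor, `text_file` and the file path are assumptions and
may differ between versions, so check the README for the current API.

```julia
using Spark

# Connect to a local master; Standalone and Mesos masters
# take a master URL instead of "local"
sc = SparkContext(master="local")

# Load a text file from local disk; an hdfs:// URL should work the same way
txt = text_file(sc, "file:///var/log/syslog")

# map runs on workers; here we count words in each line
rdd = map(txt, line -> length(split(line)))

# reduce aggregates the results back on the driver
total = reduce(rdd, +)

close(sc)
```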

See Roadmap <https://github.com/dfdx/Spark.jl/issues/1> for detailed status 
and nearest plans.

Since Spark's API is so wide that it's hard to prioritize, *I strongly 
encourage users to submit feature requests*. Feel free to open new issues or 
add a +1 to an existing one to push a feature forward. And as usual, bug 
reports and pull requests are welcome too. 

*Question to the community:* should this package be transferred to some 
Julia organization (e.g. JuliaParallel) to make it easier to discover? 
