Yes.
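To make the "streams of RDDs" question below concrete: in Spark Streaming, a DStream is essentially a sequence of micro-batches, each of which is an ordinary RDD, and a transformation on the stream is applied batch by batch. Here is a toy Python sketch of that idea; every class and method name (`MicroBatchStream`, `reduce_per_batch`) is invented for illustration and is not Spark's API.

```python
# Toy illustration of the "stream of RDDs" (DStream) idea: a stream is
# a sequence of micro-batches, and transforming the stream means
# transforming every micro-batch. Names here are made up; this is a
# sketch of the concept, not Spark's actual API.

class MicroBatchStream:
    def __init__(self, batches):
        self.batches = list(batches)   # each batch stands in for one RDD

    def map(self, f):
        # A stream transformation is just the per-batch transformation.
        return MicroBatchStream([[f(x) for x in batch] for batch in self.batches])

    def reduce_per_batch(self, f, init):
        # Produces one result per micro-batch interval.
        out = []
        for batch in self.batches:
            acc = init
            for x in batch:
                acc = f(acc, x)
            out.append(acc)
        return out

# Three "intervals" of incoming integers:
stream = MicroBatchStream([[1, 2], [3, 4, 5], [6]])
sums = stream.map(lambda x: x * 10).reduce_per_batch(lambda a, b: a + b, 0)
print(sums)  # [30, 120, 60]
```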
On Sunday, November 1, 2015 at 9:34:26 AM UTC-8, Jey Kottalam wrote:
> Are you asking about Spark Streaming support?
>
> On Sun, Nov 1, 2015 at 4:42 AM, Sisyphuss <[email protected]> wrote:
>> http://dl.acm.org/citation.cfm?id=2228301
>>
>> On Saturday, October 31, 2015 at 5:18:01 PM UTC+1, Jey Kottalam wrote:
>>> Could you please define "streams of RDDs"?
>>>
>>> On Sat, Oct 31, 2015 at 12:59 AM, <[email protected]> wrote:
>>>> Is there any implementation of streams of RDDs for Julia?
>>>>
>>>> On Monday, April 20, 2015 at 11:54:10 AM UTC-7, [email protected] wrote:
>>>>> Unfortunately, Spark.jl is an incorrect RDD implementation. Instead of representing transformations as independent, lazily evaluated operations, the package executes every transformation immediately when it is called. This completely undermines the whole purpose of the RDD as a fault-tolerant parallel data structure.
>>>>>
>>>>> On Saturday, April 18, 2015 at 4:04:23 AM UTC-4, Tanmay K. Mohapatra wrote:
>>>>>> There was some attempt made towards a pure Julia RDD in Spark.jl (https://github.com/d9w/Spark.jl). We also have DistributedArrays (https://github.com/JuliaParallel/DistributedArrays.jl), Blocks (https://github.com/JuliaParallel/Blocks.jl) and DataFrames (https://github.com/JuliaStats/DataFrames.jl).
>>>>>>
>>>>>> I wonder if it is possible to leverage any of these for a pure Julia RDD. And MachineLearning.jl (https://github.com/benhamner/MachineLearning.jl) or something similar could probably be the equivalent of MLlib.
>>>>>>
>>>>>> On Friday, April 17, 2015 at 9:24:03 PM UTC+5:30, [email protected] wrote:
>>>>>>> Of course, Spark's data access infrastructure is unbeatable, thanks to mature JVM-based libraries for accessing various data sources and formats (Avro, Parquet, HDFS). That includes SQL support as well. But look at the Python and R bindings: they are just facades for JVM calls. MLlib is written in Scala, as is the Streaming API, so when all of this is called from Python or R, every data transformation still happens at the JVM level. It would be more efficient to write the code in Scala than to use any non-JVM bindings. Think of the RPC and data serialization overhead across the huge volumes of data that need to be processed, and you'll understand why DPark exists. BTW, machine learning libraries on the JVM: good luck. They only work because of the large computational resources thrown at them, and even that has its limits.
>>>>>>>
>>>>>>> On Thursday, April 16, 2015 at 6:29:58 PM UTC-4, Andrei Zh wrote:
>>>>>>>> Julia bindings for Spark would provide much more than just the RDD: they would give us access to multiple big data components for streaming, machine learning, SQL capabilities and much more.
>>>>>>>>
>>>>>>>> On Friday, April 17, 2015 at 12:54:32 AM UTC+3, [email protected] wrote:
>>>>>>>>> However, I wonder how hard it would be to implement the RDD in Julia? It looks straightforward to implement from the RDD paper (https://www.cs.berkeley.edu/~matei/papers/2012/nsdi_spark.pdf). It is a robust abstraction that can be used in any parallel computation.
>>>>>>>>>
>>>>>>>>> On Thursday, April 16, 2015 at 3:32:32 AM UTC-4, Steven Sagaert wrote:
>>>>>>>>>> Yes, that's a solid approach. For my personal Julia-Java integrations I also run the JVM in a separate process.
>>>>>>>>>>
>>>>>>>>>> On Wednesday, April 15, 2015 at 9:30:28 PM UTC+2, [email protected] wrote:
>>>>>>>>>>> > 1) simply wrap the Spark Java API via JavaCall. This is the low-level approach. BTW, I've experimented with JavaCall and found it was unstable and also lacking functionality (e.g. there's no way to shut down the JVM or to create a pool of JVMs analogous to DB connections), so that might need some work before trying the Spark integration.
>>>>>>>>>>>
>>>>>>>>>>> Using JavaCall is not an option, especially since the JVM became closed-source; see https://github.com/aviks/JavaCall.jl/issues/7.
>>>>>>>>>>>
>>>>>>>>>>> The Python bindings are done through Py4J, which is RPC to the JVM. If you look at sparkR (https://github.com/apache/spark/tree/master/R), it is done the same way: sparkR uses an RPC interface to communicate with a Netty-based Spark JVM backend that translates R calls into JVM calls, keeps the SparkContext on the JVM side, and ships serialized data to/from R.
>>>>>>>>>>>
>>>>>>>>>>> So it is just a matter of writing Julia RPC to the JVM and wrapping the necessary Spark methods in a Julia-friendly way.
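
The sparkR-style backend described in the quoted thread boils down to: serialize a (handle, method, args) request on the client side, dispatch it to a live object on the JVM side, and ship a serialized result back, so no JVM object ever lives in the client process. A rough Python sketch of that call pattern follows; the wire format, the `rpc_call`/`backend_dispatch` names, and the fake object table are all invented for illustration (Py4J and the sparkR backend each define their own protocols), and the "backend" here is just an in-process function standing in for a socket to the JVM.

```python
import json

# Stand-in for the JVM side: the real backend holds live JVM objects
# (e.g. the SparkContext) and looks them up by handle. The lambda here
# fakes a remote method for the sake of a self-contained example.
_objects = {"sc": {"parallelize": lambda data: sorted(data)}}

def backend_dispatch(wire_msg):
    # What a Netty/Py4J-style backend does: decode the request, find the
    # target object and method by name, invoke it, encode the reply.
    req = json.loads(wire_msg)
    target = _objects[req["handle"]]
    result = target[req["method"]](*req["args"])
    return json.dumps({"status": "ok", "value": result})

def rpc_call(handle, method, *args):
    # Client side: everything crossing the boundary is serialized, which
    # is exactly the overhead the thread discusses for large data volumes.
    wire_msg = json.dumps({"handle": handle, "method": method, "args": list(args)})
    reply = json.loads(backend_dispatch(wire_msg))
    return reply["value"]

print(rpc_call("sc", "parallelize", [3, 1, 2]))  # [1, 2, 3]
```

A Julia binding built this way would keep the SparkContext behind such handles and only exchange serialized messages, just as sparkR does.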
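
On the lazy-evaluation criticism of Spark.jl above: the point is that a transformation such as `map` should only record lineage (parent RDD plus function), and nothing should execute until an action like `collect` forces it; the recorded lineage is also what allows a lost partition to be recomputed. A minimal Python sketch of the distinction (the `LazyRDD` class and its methods are illustrative only, taken from none of the packages mentioned):

```python
# Minimal sketch of why RDD transformations must be lazy: a transformation
# records its lineage (parent + function) instead of computing anything,
# and only an action such as collect() triggers evaluation. Re-running
# collect() after a failure recomputes the data from lineage alone, which
# is the fault-tolerance property the thread refers to.

class LazyRDD:
    def __init__(self, source, parent=None, fn=None):
        self.source = source      # concrete data, for the root RDD only
        self.parent = parent      # lineage: which RDD this was derived from
        self.fn = fn              # lineage: how it was derived

    def map(self, f):
        # Transformation: O(1), nothing is computed here.
        return LazyRDD(None, parent=self, fn=f)

    def collect(self):
        # Action: walk the lineage back to the source, then apply the
        # recorded functions in order.
        chain = []
        node = self
        while node.parent is not None:
            chain.append(node.fn)
            node = node.parent
        data = list(node.source)
        for f in reversed(chain):
            data = [f(x) for x in data]
        return data

rdd = LazyRDD(range(4)).map(lambda x: x + 1).map(lambda x: x * x)
print(rdd.collect())  # [1, 4, 9, 16]
```

An implementation that instead materialized the data inside `map` (as the thread says Spark.jl did) would lose both the cheap transformation chaining and the recompute-from-lineage fault tolerance.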
