Hi Jean,

I have prepared a draft of my proposal, which you can find here:
<https://docs.google.com/document/d/1KaBKxYbR08pgwv3UfPF-SMiRM2VJ7K4AQiLzzfUd138/edit?usp=sharing>
Could you please have a look and give comments on how to improve it?

Thanks and regards

On Thu, Mar 10, 2016 at 3:05 PM, Jean-Baptiste Onofré <[email protected]> wrote:

> Interesting, it makes sense.
>
> Thanks for sharing!
>
> Regards
> JB
>
> On 03/10/2016 10:32 AM, Minudika Malshan wrote:
>
>> Hi JB,
>>
>> Thanks a lot for your kind attention. I'm very happy to take your advice
>> on this implementation. :)
>>
>> I am planning to do this for GSoC 2016, since it has been published as a
>> project idea this year.
>> Here is the plan in brief.
>>
>> The user should be able to implement pipelines in a Zeppelin notebook
>> using the commands provided by the Beam SDK (Dataflow SDK).
>> The Beam interpreter should then interpret and execute those Beam SDK
>> commands at the back end and return the output.
>> Since Beam provides an SDK only for Java, I am going to use Java-REPL
>> <https://github.com/albertlatacz/java-repl> to interpret the Java commands
>> provided by the SDK at the Zeppelin back end.
>>
>> I will create a draft proposal for this implementation and share it with
>> you. I would like to have your comments on it.
>>
>> Thanks and regards.
>> Minudika
>>
>> Minudika Malshan
>> Undergraduate
>> Department of Computer Science and Engineering
>> University of Moratuwa
>> Sri Lanka.
>>
>> On Thu, Mar 10, 2016 at 2:39 PM, Jean-Baptiste Onofré <[email protected]>
>> wrote:
>>
>>> Hi Minudika,
>>>
>>> Oh, interesting for Zeppelin. What do you plan to do? Implement the
>>> Zeppelin notebook backend with Beam (the Zeppelin analytics would be
>>> implemented as Beam pipelines)? I would be happy to help if you need.
>>>
>>> Regards
>>> JB
>>>
>>> On 03/10/2016 09:47 AM, Minudika Malshan wrote:
>>>
>>>> Hi,
>>>>
>>>> This is related to the implementation of a Beam interpreter for Apache
>>>> Zeppelin. I think for the first phase, DirectPipelineRunner will do the
>>>> job :)
>>>> Please let me know if there is anything which can be helpful.
>>>>
>>>> Thanks and regards.
>>>> Minudika
>>>>
>>>> Minudika Malshan
>>>> Undergraduate
>>>> Department of Computer Science and Engineering
>>>> University of Moratuwa
>>>> Sri Lanka.
>>>>
>>>> On Thu, Mar 10, 2016 at 12:11 PM, Jean-Baptiste Onofré <[email protected]>
>>>> wrote:
>>>>
>>>>> By the way, on my side, I will work on a Karaf/OSGi
>>>>> (http://karaf.apache.org) runner for Beam (with shell commands,
>>>>> features, etc).
>>>>> I will start it just after the work on new IOs.
>>>>>
>>>>> Regards
>>>>> JB
>>>>>
>>>>> On 03/09/2016 08:01 PM, Minudika Malshan wrote:
>>>>>
>>>>>> Hi,
>>>>>>
>>>>>> Thanks a lot for your quick responses.
>>>>>> I will refer to those resources.
>>>>>>
>>>>>> Regards,
>>>>>> Minudika
>>>>>>
>>>>>> Minudika Malshan
>>>>>> Undergraduate
>>>>>> Department of Computer Science and Engineering
>>>>>> University of Moratuwa
>>>>>> Sri Lanka.
>>>>>>
>>>>>> On Thu, Mar 10, 2016 at 12:24 AM, Lukasz Cwik <[email protected]>
>>>>>> wrote:
>>>>>>
>>>>>>> There are currently two implementations which do not require the
>>>>>>> cloud:
>>>>>>>
>>>>>>> The DirectPipelineRunner
>>>>>>> <https://github.com/apache/incubator-beam/blob/master/sdk/src/main/java/com/google/cloud/dataflow/sdk/runners/DirectPipelineRunner.java>
>>>>>>> which is mainly used for testing and local development. This runner
>>>>>>> has several limits (data size, no support for unbounded collections,
>>>>>>> ...) and is being expanded to support more use cases, for example
>>>>>>> adding unbounded PCollection support
>>>>>>> <https://issues.apache.org/jira/browse/BEAM-22>.
>>>>>>>
>>>>>>> The FlinkPipelineRunner
>>>>>>> <https://github.com/apache/incubator-beam/tree/master/runners/flink>
>>>>>>> which can be used to execute locally or on a Flink cluster.
>>>>>>>
>>>>>>> There is also ongoing work to bring Spark
>>>>>>> <https://issues.apache.org/jira/browse/BEAM-6> into the mix as a
>>>>>>> runner, and suggestions for other runners such as GearPump
>>>>>>> <https://github.com/gearpump/gearpump>.
>>>>>>>
>>>>>>> On Wed, Mar 9, 2016 at 10:37 AM, Minudika Malshan <[email protected]>
>>>>>>> wrote:
>>>>>>>
>>>>>>>> Hi all,
>>>>>>>>
>>>>>>>> As far as I know about Apache Beam and the Dataflow SDK, the
>>>>>>>> Dataflow SDK was at first developed targeting Google Cloud Platform,
>>>>>>>> so we have to deploy pipelines in the cloud.
>>>>>>>>
>>>>>>>> But my question is, can we not use this SDK for standalone
>>>>>>>> implementations without the cloud? If so, I would love to have a
>>>>>>>> look at some examples of such implementations.
>>>>>>>> Your kind help is much appreciated.
>>>>>>>>
>>>>>>>> Regards,
>>>>>>>> Minudika
>>>>>>>>
>>>>>>>> Minudika Malshan
>>>>>>>> Undergraduate
>>>>>>>> Department of Computer Science and Engineering
>>>>>>>> University of Moratuwa
>>>>>>>> Sri Lanka.
>>>>>>>>
>>>>> --
>>>>> Jean-Baptiste Onofré
>>>>> [email protected]
>>>>> http://blog.nanthrax.net
>>>>> Talend - http://www.talend.com
>>>>>
>>> --
>>> Jean-Baptiste Onofré
>>> [email protected]
>>> http://blog.nanthrax.net
>>> Talend - http://www.talend.com
>>>
> --
> Jean-Baptiste Onofré
> [email protected]
> http://blog.nanthrax.net
> Talend - http://www.talend.com

--
*Minudika Malshan*
Undergraduate
Department of Computer Science and Engineering
University of Moratuwa
Sri Lanka.
<https://lk.linkedin.com/pub/minudika-malshan/100/656/a80>
