Re: Using beam sdk for standalone implementations without connecting to the cloud

Minudika Malshan Sat, 12 Mar 2016 12:14:49 -0800

Hi Jean.

I have prepared a draft of my proposal, which you can find here
<https://docs.google.com/document/d/1KaBKxYbR08pgwv3UfPF-SMiRM2VJ7K4AQiLzzfUd138/edit?usp=sharing>.
Could you please have a look and comment on how to improve it?


Thanks and regards

On Thu, Mar 10, 2016 at 3:05 PM, Jean-Baptiste Onofré <[email protected]>
wrote:

> Interesting, it makes sense.
>
> Thanks for sharing !
>
> Regards
> JB
>
> On 03/10/2016 10:32 AM, Minudika Malshan wrote:
>
>> Hi JB,
>>
>> Thanks a lot for your kind attention. I'm very happy to take your advice
>> on this implementation. :)
>>
>> I am planning to do this for GSoC 2016, since it has been published as a
>> project idea this year.
>> Here is the plan in brief.
>>
>> The user should be able to implement pipelines in a Zeppelin notebook using
>> the commands provided by the Beam SDK (Dataflow SDK).
>> The Beam interpreter should then interpret and execute those Beam SDK
>> commands at the back end and return the output.
>> Since Beam provides an SDK only for Java, I am going to use Java-REPL
>> <https://github.com/albertlatacz/java-repl> to interpret the Java commands
>> provided by the SDK at the Zeppelin back end.
>>
>> I will create a draft proposal for this implementation and share it with
>> you. I would like to have your comments on it.
>>
>> Thanks and regards.
>> Minudika
>>
>>
>> Minudika Malshan
>> Undergraduate
>> Department of Computer Science and Engineering
>> University of Moratuwa
>> Sri Lanka.
>>
>>
>>
>>
>> On Thu, Mar 10, 2016 at 2:39 PM, Jean-Baptiste Onofré <[email protected]>
>> wrote:
>>
>>> Hi Minudika,
>>>
>>> Oh, interesting for Zeppelin. What do you plan to do? Implement the
>>> Zeppelin notebook backend with Beam (the Zeppelin analytics would be
>>> implemented as Beam pipelines)? I would be happy to help if you need.
>>>
>>> Regards
>>> JB
>>>
>>>
>>> On 03/10/2016 09:47 AM, Minudika Malshan wrote:
>>>
>>>> Hi,
>>>>
>>>> This is related to the implementation of a Beam interpreter for Apache
>>>> Zeppelin. I think that for the first phase, DirectPipelineRunner will do the
>>>> job. :)
>>>> Please let me know if there is anything that could be helpful.
>>>>
>>>> Thanks and regards.
>>>> Minudika
>>>>
>>>> Minudika Malshan
>>>> Undergraduate
>>>> Department of Computer Science and Engineering
>>>> University of Moratuwa
>>>> Sri Lanka.
>>>>
>>>>
>>>>
>>>>
>>>> On Thu, Mar 10, 2016 at 12:11 PM, Jean-Baptiste Onofré <[email protected]>
>>>> wrote:
>>>>
>>>>> By the way, on my side, I will work on a Karaf/OSGi
>>>>> (http://karaf.apache.org) runner for Beam (with shell commands, features,
>>>>> etc.).
>>>>> I will start it just after the work on new IOs.
>>>>>
>>>>> Regards
>>>>> JB
>>>>>
>>>>>
>>>>> On 03/09/2016 08:01 PM, Minudika Malshan wrote:
>>>>>
>>>>>> Hi,
>>>>>>
>>>>>> Thanks a lot for your quick responses.
>>>>>> I will refer to those resources.
>>>>>>
>>>>>> Regards,
>>>>>> Minudika
>>>>>>
>>>>>> Minudika Malshan
>>>>>> Undergraduate
>>>>>> Department of Computer Science and Engineering
>>>>>> University of Moratuwa
>>>>>> Sri Lanka.
>>>>>>
>>>>>>
>>>>>>
>>>>>>
>>>>>> On Thu, Mar 10, 2016 at 12:24 AM, Lukasz Cwik <[email protected]>
>>>>>> wrote:
>>>>>>
>>>>>>> There are currently two implementations which do not require the cloud:
>>>>>>>
>>>>>>> The DirectPipelineRunner
>>>>>>> <https://github.com/apache/incubator-beam/blob/master/sdk/src/main/java/com/google/cloud/dataflow/sdk/runners/DirectPipelineRunner.java>,
>>>>>>> which is mainly used for testing and local development. This runner has
>>>>>>> several limits (data size, no support for unbounded collections, ...) and
>>>>>>> is being expanded to support more use cases, for example adding unbounded
>>>>>>> PCollection support <https://issues.apache.org/jira/browse/BEAM-22>.
>>>>>>>
>>>>>>> The FlinkPipelineRunner
>>>>>>> <https://github.com/apache/incubator-beam/tree/master/runners/flink>,
>>>>>>> which can be used to execute locally or on a Flink cluster.
>>>>>>>
>>>>>>> There is also ongoing work to bring Spark
>>>>>>> <https://issues.apache.org/jira/browse/BEAM-6> into the mix as a runner,
>>>>>>> and suggestions for other runners such as GearPump
>>>>>>> <https://github.com/gearpump/gearpump>.
>>>>>>>
>>>>>>> On Wed, Mar 9, 2016 at 10:37 AM, Minudika Malshan <[email protected]>
>>>>>>> wrote:
>>>>>>>
>>>>>>>> Hi all,
>>>>>>>>
>>>>>>>> As far as I know about Apache Beam and the Dataflow SDK, the Dataflow SDK
>>>>>>>> was at first developed targeting Google Cloud Platform.
>>>>>>>> So we have to deploy pipelines in the cloud.
>>>>>>>>
>>>>>>>> But my question is: can't we use this SDK for standalone implementations
>>>>>>>> without the cloud? If so, I would love to have a look at some examples of
>>>>>>>> such implementations.
>>>>>>>>
>>>>>>>> Your kind help is much appreciated.
>>>>>>>>
>>>>>>>> Regards,
>>>>>>>> Minudika
>>>>>>>>
>>>>>>>> Minudika Malshan
>>>>>>>> Undergraduate
>>>>>>>> Department of Computer Science and Engineering
>>>>>>>> University of Moratuwa
>>>>>>>> Sri Lanka.
>>>>>>>>
>>>>>>>>
>>>>>>>>
>>>>>>>>
>>>>> --
>>>>> Jean-Baptiste Onofré
>>>>> [email protected]
>>>>> http://blog.nanthrax.net
>>>>> Talend - http://www.talend.com
>>>>>
>>>>>
>>>>>
>>> --
>>> Jean-Baptiste Onofré
>>> [email protected]
>>> http://blog.nanthrax.net
>>> Talend - http://www.talend.com
>>>
>>>
>>
> --
> Jean-Baptiste Onofré
> [email protected]
> http://blog.nanthrax.net
> Talend - http://www.talend.com
>



-- 
*Minudika Malshan*
Undergraduate
Department of Computer Science and Engineering
University of Moratuwa
Sri Lanka.
<https://lk.linkedin.com/pub/minudika-malshan/100/656/a80>
