Hi Yash,

Beam is an SDK, so it runs on an existing cluster.
You design jobs as pipelines: it's a "programming model". For your late-data arrival issues, maybe Falcon can help there.

Regards,
JB

On 03/13/2016 03:31 AM, Yash Sharma wrote:
Hi All,

I have recently been reading about Apache Beam and am interested in exploring how it fits into our stack. We currently have our Hive and Spark pipelines. We have late-data arrival issues and have to reprocess a couple of steps to ensure the data is consumed. A couple of questions on top of my mind:

1. Does Beam use the existing cluster, or does it need its own cluster?
2. How does Beam fit with the existing Hive and Spark jobs? What changes might be required in the jobs to start with Beam?

Best,
Yash
--
Jean-Baptiste Onofré
[email protected]
http://blog.nanthrax.net
Talend - http://www.talend.com
