This has been an idea lingering in my mind for a while. I will be very 
supportive of any effort to create a stream abstraction along the lines of 
feed (or this may not even be required, if we do a major overhaul of 
orchestration in Falcon where the tight requirement of a feed having a 
frequency is done away with) and have process work with these streams. In 
that case the orchestration should happen through Nimbus or the Spark master 
instead of Oozie.

In other words:
* Feed/Stream to be a primitive entity in Falcon which declares that there is 
a continuous flow of data conforming to a schema and is not bound to any 
arrival periodicity
* Replication/Mirroring on this would essentially use standard data transport 
mechanisms to ship data in a streaming fashion as well
* Processes that are defined over these continuous streams are to be 
orchestrated over an appropriate engine such as Nimbus (in the case of Storm) 
or a similar system. Processes defined this way also don't have a periodicity 
and are continuous (a rough sketch of what this could look like follows the 
list).
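
To make this a little more concrete, below is a very rough sketch of what such 
entities could look like. Every element, attribute and name here is purely 
hypothetical, meant only to illustrate the shape of the idea and not to 
propose an actual spec:

    <!-- Hypothetical stream entity: declares a schema and a transport,
         but no frequency, since data arrives continuously. -->
    <stream name="clicks" xmlns="uri:falcon:stream:0.1">
      <clusters>
        <cluster name="primary" type="source"/>
        <!-- replication/mirroring to the target would itself be continuous,
             over a streaming transport rather than periodic copy jobs -->
        <cluster name="backup" type="target"/>
      </clusters>
      <schema provider="hcatalog" location="catalog:default:clicks"/>
      <transport type="kafka" endpoint="kafka://broker:9092/clicks"/>
    </stream>

    <!-- Hypothetical continuous process over the stream: again no frequency
         or validity; orchestration is delegated to a streaming engine
         (Nimbus for Storm, the Spark master for Spark Streaming) instead
         of Oozie. -->
    <process name="click-sessionization" xmlns="uri:falcon:process:0.1">
      <inputs>
        <input name="in" stream="clicks"/>
      </inputs>
      <outputs>
        <output name="out" stream="sessions"/>
      </outputs>
      <workflow engine="storm" path="/apps/topologies/click-sessionization.jar"/>
    </process>

The only real differences from today's feed/process pair would be the absence 
of frequency/validity and a workflow engine that points at a streaming runtime 
rather than Oozie.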

This topic requires more conversation before we figure out the way forward. I 
am assuming more than one of us is thinking about this.

Regards
Srikanth Sundarrajan

> Date: Wed, 11 Feb 2015 15:30:47 +0530
> Subject: Re: Streaming Feed
> From: [email protected]
> To: [email protected]
> 
> Thanks Jean, this will be quite useful. I am wondering if this will require
> a new partitioning construct in the feed as well, like micro-batches, etc.
> 
> Sharad
> 
> On Wed, Feb 11, 2015 at 2:34 PM, Jean-Baptiste Onofré <[email protected]>
> wrote:
> 
> > Hi Sharad,
> >
> > I sent an e-mail last week about support of Spark (SparkStreaming) in
> > workflow/process. It's basically very close to what you propose.
> >
> > IMHO, it should be a new impl of workflow or at least the support of a new
> > kind of process (that's what I have in mind).
> >
> > Regards
> > JB
> >
> >
> > On 02/11/2015 09:38 AM, Sharad Agarwal wrote:
> >
> >> I am looking for a generic schema aware feed construct for streaming
> >> workflow. The schema can be managed by a catalog service like HCatalog.
> >> The
> >> streaming workflow executor would be a system like
> >> Storm/SparkStreaming/Samza.
> >>
> >> I want to know if this is the right thing to be supported in Falcon and,
> >> if yes, what the plugging interface for that would be. Would this be a new
> >> implementation of the workflow engine?
> >>
> >> Thanks
> >> Sharad
> >>
> >>
> > --
> > Jean-Baptiste Onofré
> > [email protected]
> > http://blog.nanthrax.net
> > Talend - http://www.talend.com
> >
                                          
