On Fri, Jan 16, 2015 at 7:22 AM, Scott Preece <[email protected]> wrote:
> Coming to this from the perspective of knowing the architecture of Falcon
> only in rough terms and having no experience with Quartz and very little
> with Oozie, I've got a few questions about the proposal:
>
> - Looking at the Quartz site it talks about time-based triggers, but not
> about triggering on availability of either files or other resources; does
> it do that? It would be good to test your list of use cases against the
> proposed solution.

The triggers are quite extensible, so we would need to add that ourselves.
We could borrow code from Oozie for data-availability triggers.

> - How would Quartz integrate with Oozie, assuming Oozie is still doing the
> workflow execution? By JMS messages?

You could invoke a workflow execution directly with Oozie.

> - Is triggering by external events/messages, as opposed to by its own
> scheduler, a natural mode for Oozie (are there interfaces at the right
> level)?

It does support directly invoking Oozie workflows without tying them to a
scheduler.

> - Is Oozie decomposable so that it would be reasonable to only include the
> execution parts and not the scheduling parts?

That's quite hard from what I know.

> regards, scott
>
> On Friday, January 16, 2015 12:37 AM, Siva Thumma <[email protected]> wrote:
>
> Going for a Confluence page is up to the core team. To my knowledge this is
> too early.
> This is broader, as we are not quite clear on the complete use cases this
> development would solve if incorporated from scratch, away from Oozie.
>
> > On 16-Jan-2015, at 7:43 am, Srikanth Sundarrajan <[email protected]> wrote:
> >
> > This is a very important decision for the project and if we need to
> > discuss this more, we should, and not rush through. So I will hold off any
> > further action in terms of pressing forward with the design.
> >
> > Here is the consolidation of views expressed so far on this thread.
> > Folks who have responded, please chime in if I have misrepresented anyone.
> >
> > Sanjeev: Agreed with the proposal
> > Ajay: Agreed with the proposal and wanted to know how it will be
> > implemented
> > Siva Tumma: -1, as repeating some functionality in Oozie seemed wasteful
> > Venkatesh: -1 initially to this being built in Falcon, but ok with
> > leveraging capabilities through an alternate scheduler such as Quartz/YARN.
> > Subsequently expressed how chugging along with Oozie is not ideal in the
> > long run
> > Shwetha: Ok with replacing Oozie altogether including workflow
> > execution. She felt that some of these may exist in Oozie and is yet to
> > revert on whether they really do.
> > JB: Initially had reservations about repeating functionality in Falcon,
> > later +1
> > Shaik: Agreed to the proposal, additionally calling out more
> > capabilities than were originally called out in the initial thread.
> > Srikanth: I would like to provide a lot more capabilities to users than
> > what is supported and would really like for this to happen, so +1
> >
> > Regards
> > Srikanth Sundarrajan
> >
> >> Date: Thu, 15 Jan 2015 11:27:17 -0800
> >> Subject: Re: [DISCUSS] Orchestration in Falcon
> >> From: [email protected]
> >> To: [email protected]
> >>
> >> On Thu, Jan 15, 2015 at 1:25 AM, Srikanth Sundarrajan <[email protected]>
> >> wrote:
> >>
> >>> [email protected]
> >>>
> >>> It looks like we have broad consensus on this,
> >>
> >> Really? That's not how I read this. I'm still not sure it's worth taking
> >> on this complexity into Falcon. Did we even explore other options? I'm
> >> not sure.
> >>
> >>> should we open up a discuss thread on how we go about this?
> >>
> >> Maybe.
> >>
> >>> Or should we create a confluence page and collaborate through that?
> >>
> >> Too early for this.
> >>
> >>> Regards
> >>> Srikanth Sundarrajan
> >>>
> >>>> From: [email protected]
> >>>> Date: Thu, 1 Jan 2015 22:40:48 +0530
> >>>> Subject: Re: [DISCUSS] Orchestration in Falcon
> >>>> To: [email protected]
> >>>>
> >>>> +1.
> >>>>
> >>>> Few more relevant asks:
> >>>> 1. Support for "Last Only" option for process scheduling (in addition
> >>>> to LIFO/FIFO); currently Oozie has some issues.
> >>>> 2. Support for Singleton process (lock based), where the behaviour of
> >>>> all instances of the process is the same.
> >>>>
> >>>> Thanks,
> >>>> -Idris
> >>>>
> >>>> On Thu, Jan 1, 2015 at 7:51 PM, Jean-Baptiste Onofré <[email protected]>
> >>>> wrote:
> >>>>
> >>>>> +1
> >>>>>
> >>>>> Regards
> >>>>> JB
> >>>>>
> >>>>>> On 12/31/2014 03:53 PM, Srikanth Sundarrajan wrote:
> >>>>>>
> >>>>>> Can we pick up this thread in the new year when folks are back from
> >>>>>> break? I am in total agreement with Venkatesh here. We ought to have
> >>>>>> a long term sustainable approach. Also, I feel that getting the
> >>>>>> capabilities we would like to enable in Falcon done through Oozie in
> >>>>>> the near term seems to be a tall ask anyway.
> >>>>>>
> >>>>>> Regards
> >>>>>> Srikanth Sundarrajan
> >>>>>>
> >>>>>>> Date: Tue, 23 Dec 2014 16:44:06 -0800
> >>>>>>> Subject: Re: [DISCUSS] Orchestration in Falcon
> >>>>>>> From: [email protected]
> >>>>>>> To: [email protected]
> >>>>>>>
> >>>>>>> Chugging along with Oozie is bad for Falcon in the long run, for
> >>>>>>> users and developers. It's horribly complex to work through the many
> >>>>>>> rough edges architecturally in Oozie. Look at all the patches for
> >>>>>>> security that I had to fix around Oozie. It's unnecessarily very
> >>>>>>> complex, non-uniform and is NOT meant to be used by another tool
> >>>>>>> like Falcon but was built around the end user.
> >>>>>>>
> >>>>>>> This is a good discussion to have - maybe explore Oozie for the
> >>>>>>> short term but look at alternative solutions for the long term.
> >>>>>>>
> >>>>>>> On Tue, Dec 23, 2014 at 7:28 AM, Srikanth Sundarrajan
> >>>>>>> <[email protected]> wrote:
> >>>>>>>
> >>>>>>>> @jb, There is no doubt merit in mapping them to Oozie if possible
> >>>>>>>> and if extensions are simple and straightforward enough.
> >>>>>>>>
> >>>>>>>> Also had a quick chat offline with Shwetha and she mentioned some
> >>>>>>>> work happening in Oozie in this regard. On further digging, found
> >>>>>>>> https://issues.apache.org/jira/browse/OOZIE-1976. This is possibly
> >>>>>>>> what Shwetha was referring to. From the looks of it, this tries to
> >>>>>>>> address item #7 in the original thread. Maybe there are more JIRAs
> >>>>>>>> where additional work such as a-periodic datasets is being worked
> >>>>>>>> on. Perhaps @Shwetha can throw some light on what is being
> >>>>>>>> considered and/or how these gating/orchestration use cases can be
> >>>>>>>> managed.
> >>>>>>>>
> >>>>>>>> Regards
> >>>>>>>> Srikanth Sundarrajan
> >>>>>>>>
> >>>>>>>>> Date: Tue, 23 Dec 2014 11:06:24 +0100
> >>>>>>>>> From: [email protected]
> >>>>>>>>> To: [email protected]
> >>>>>>>>> Subject: Re: [DISCUSS] Orchestration in Falcon
> >>>>>>>>>
> >>>>>>>>> Hi all,
> >>>>>>>>>
> >>>>>>>>> I second Shwetha there. I think we can achieve such features in
> >>>>>>>>> Oozie (with some adaptations).
> >>>>>>>>>
> >>>>>>>>> Regards
> >>>>>>>>> JB
> >>>>>>>>>
> >>>>>>>>> On 2014-12-23 10:53, Shwetha G S wrote:
> >>>>>>>>>
> >>>>>>>>>> If we can get rid of Oozie entirely, yes we can explore other
> >>>>>>>>>> possibilities. But if we are still going to use Oozie for DAG
> >>>>>>>>>> execution, we are going to add another bottleneck in the whole
> >>>>>>>>>> execution (currently, Falcon is not in the workflow execution
> >>>>>>>>>> path) and I don't think it's worth it.
> >>>>>>>>>>
> >>>>>>>>>> The features that are outlined above are all available in basic
> >>>>>>>>>> forms in Oozie and it should be easy to enhance them/make them
> >>>>>>>>>> extension points.
> >>>>>>>>>>
> >>>>>>>>>> -Shwetha
> >>>>>>>>>>
> >>>>>>>>>> On Tue, Dec 23, 2014 at 8:12 AM, Srikanth Sundarrajan
> >>>>>>>>>> <[email protected]> wrote:
> >>>>>>>>>>
> >>>>>>>>>>> Here are a few more gaps that we ought to solve for while we are
> >>>>>>>>>>> on the subject:
> >>>>>>>>>>>
> >>>>>>>>>>> 1. Ability to attach to start & finish events of workflow
> >>>>>>>>>>> execution. Currently we have a post-processing hook to listen to
> >>>>>>>>>>> finish events, but we do run into scenarios where there are
> >>>>>>>>>>> occasional failures with post-processing and there is potential
> >>>>>>>>>>> phase lag in learning about the events.
> >>>>>>>>>>> 2. Strict enforcement of concurrency control, possibly spanning
> >>>>>>>>>>> process boundaries.
> >>>>>>>>>>> 3. Ability to tune how backlogs have to be caught up (old
> >>>>>>>>>>> instances to be given higher priority, newer instances to be
> >>>>>>>>>>> given higher priority, or some sort of weights to allow both to
> >>>>>>>>>>> make progress at varying rates). There have been asks from users
> >>>>>>>>>>> for routing current vs older instances to different queues as an
> >>>>>>>>>>> alternative.
> >>>>>>>>>>> 4. Ability to have a notion of non-time based feed instances and
> >>>>>>>>>>> related coordination.
> >>>>>>>>>>> 5. Currently keeping track of and managing SLAs is also a
> >>>>>>>>>>> challenge, but with #1 addressed, this might be a lesser concern.
> >>>>>>>>>>>
> >>>>>>>>>>> Regards
> >>>>>>>>>>> Srikanth Sundarrajan
> >>>>>>>>>>>
> >>>>>>>>>>>> Subject: Re: [DISCUSS] Orchestration in Falcon
> >>>>>>>>>>>> From: [email protected]
> >>>>>>>>>>>> Date: Tue, 23 Dec 2014 06:30:30 +0530
> >>>>>>>>>>>> To: [email protected]
> >>>>>>>>>>>>
> >>>>>>>>>>>> @venkatesh, the question really is how do we enable these
> >>>>>>>>>>>> gating preconditions. Seems hard enough to add them to Oozie,
> >>>>>>>>>>>> but I am not familiar enough with Oozie to comment on how hard
> >>>>>>>>>>>> or easy it is. As I responded to @ajay on the same thread, if
> >>>>>>>>>>>> we are to do away with coordination through Oozie, we can
> >>>>>>>>>>>> follow up this discussion with approaches and design. Though I
> >>>>>>>>>>>> had Quartz in mind, I wanted to leave that out of the
> >>>>>>>>>>>> discussion to see if there is consensus for moving away from
> >>>>>>>>>>>> Oozie coords and implementing them through other means.
> >>>>>>>>>>>>
> >>>>>>>>>>>> Sent from my iPhone
> >>>>>>>>>>>>
> >>>>>>>>>>>> On 23-Dec-2014, at 1:16 am, "Seetharam Venkatesh"
> >>>>>>>>>>>> <[email protected]> wrote:
> >>>>>>>>>>>>
> >>>>>>>>>>>>> What is the purpose of this decoupling? Why build this into
> >>>>>>>>>>>>> Falcon? Scheduling is so common that there are dime a dozen
> >>>>>>>>>>>>> schedulers today and they are all extensible with custom
> >>>>>>>>>>>>> triggers. Making it part of Falcon will suffer the same issues
> >>>>>>>>>>>>> that Oozie has today.
> >>>>>>>>>>>>>
> >>>>>>>>>>>>> I'm sorry but I'm a HUGE -1 to this being built into Falcon
> >>>>>>>>>>>>> codebase.
> >>>>>>>>>>>>>
> >>>>>>>>>>>>> However, I'm +1 to reusing Quartz scheduler that already
> >>>>>>>>>>>>> exists - stand it up outside or embed it like we do for
> >>>>>>>>>>>>> ActiveMQ.
> >>>>>>>>>>>>>
> >>>>>>>>>>>>> Phase 2 - I'd like to see us write a simple DAG execution
> >>>>>>>>>>>>> layer in YARN as an app master, without a DB and keeping state
> >>>>>>>>>>>>> on HDFS, as an alternative to Oozie.
> >>>>>>>>>>>>>
> >>>>>>>>>>>>> Then we will have a nimble Falcon which can kick ass.
> >>>>>>>>>>>>>
> >>>>>>>>>>>>> On Sun, Dec 21, 2014 at 6:13 AM, Srikanth Sundarrajan
> >>>>>>>>>>>>> <[email protected]> wrote:
> >>>>>>>>>>>>>
> >>>>>>>>>>>>>> Hello Team,
> >>>>>>>>>>>>>>
> >>>>>>>>>>>>>> Since its inception Falcon has used Oozie for process
> >>>>>>>>>>>>>> orchestration as well as feed life cycle phase executions.
> >>>>>>>>>>>>>> While this has worked reasonably and allowed us to make
> >>>>>>>>>>>>>> higher level capabilities available through Falcon, we are
> >>>>>>>>>>>>>> increasingly seeing scenarios where this is proving to be a
> >>>>>>>>>>>>>> limiting factor. In its current form, Falcon relies on Oozie
> >>>>>>>>>>>>>> for both scheduling and workflow execution, due to which the
> >>>>>>>>>>>>>> scheduling is limited to time based/cron based scheduling
> >>>>>>>>>>>>>> with additional gating conditions on data availability. This
> >>>>>>>>>>>>>> also restricts datasets to being periodic/cyclic in nature.
> >>>>>>>>>>>>>>
> >>>>>>>>>>>>>> From an orchestration standpoint, it would help if we can
> >>>>>>>>>>>>>> support standard gating / scheduling primitives via Falcon:
> >>>>>>>>>>>>>>
> >>>>>>>>>>>>>> 1. Simple periodic scheduling with no gating conditions
> >>>>>>>>>>>>>> 2. Cron based scheduling (day of week, day of the month,
> >>>>>>>>>>>>>> specific hours and non-periodic) with no gating conditions
> >>>>>>>>>>>>>> 3. Availability of new data (assuming monotonically
> >>>>>>>>>>>>>> increasing data version, availability of new versions)
> >>>>>>>>>>>>>> 4. Changes to existing data (reinstatement - similar to late
> >>>>>>>>>>>>>> data handling)
> >>>>>>>>>>>>>> 5. External trigger/notifications
> >>>>>>>>>>>>>> 6. Availability of specific instances of data declared as a
> >>>>>>>>>>>>>> mandatory dependency
> >>>>>>>>>>>>>> 7. Availability of a minimum subset of instances of data
> >>>>>>>>>>>>>> declared as a mandatory dependency (for example, at least 10
> >>>>>>>>>>>>>> hourly instances of a day with 24 instances)
> >>>>>>>>>>>>>> 8. Valid combinations of the above.
> >>>>>>>>>>>>>>
> >>>>>>>>>>>>>> In this context, I would like to propose that we move away
> >>>>>>>>>>>>>> from Oozie for the orchestration requirements and have them
> >>>>>>>>>>>>>> implemented natively within Falcon. It will no doubt make
> >>>>>>>>>>>>>> Falcon server bulkier and heavier in both code and
> >>>>>>>>>>>>>> deployment, but it seems that without it, the orchestration
> >>>>>>>>>>>>>> within Falcon will be limited by capabilities available
> >>>>>>>>>>>>>> within Oozie.
> >>>>>>>>>>>>>>
> >>>>>>>>>>>>>> Please do note that this suggestion is restricted to the
> >>>>>>>>>>>>>> scheduling and not to the workflow execution.
> >>>>>>>>>>>>>>
> >>>>>>>>>>>>>> Would like to hear from fellow developers and users on what
> >>>>>>>>>>>>>> your thoughts are. Please do chime in with your views.
> >>>>>>>>>>>>>>
> >>>>>>>>>>>>>> Regards
> >>>>>>>>>>>>>> Srikanth Sundarrajan
> >>>>>>>>>>>>>
> >>>>>>>>>>>>> --
> >>>>>>>>>>>>> Regards,
> >>>>>>>>>>>>> Venkatesh
> >>>>>>>>>>>>>
> >>>>>>>>>>>>> “Perfection (in design) is achieved not when there is nothing
> >>>>>>>>>>>>> more to add, but rather when there is nothing more to take
> >>>>>>>>>>>>> away.”
> >>>>>>>>>>>>> - Antoine de Saint-Exupéry
> >>>>>>>
> >>>>>>> --
> >>>>>>> Regards,
> >>>>>>> Venkatesh
> >>>>>>>
> >>>>>>> “Perfection (in design) is achieved not when there is nothing more
> >>>>>>> to add, but rather when there is nothing more to take away.”
> >>>>>>> - Antoine de Saint-Exupéry
> >>>>>
> >>>>> --
> >>>>> Jean-Baptiste Onofré
> >>>>> [email protected]
> >>>>> http://blog.nanthrax.net
> >>>>> Talend - http://www.talend.com
> >>
> >> --
> >> Regards,
> >> Venkatesh
> >>
> >> “Perfection (in design) is achieved not when there is nothing more to
> >> add, but rather when there is nothing more to take away.”
> >> - Antoine de Saint-Exupéry

--
Regards,
Venkatesh

“Perfection (in design) is achieved not when there is nothing more to add,
but rather when there is nothing more to take away.”
- Antoine de Saint-Exupéry
