+1

Regards
JB

-------- Original message --------
From: Srikanth Sundarrajan <[email protected]>
Date: 15/01/2015 10:25 (GMT+01:00)
To: [email protected]
Cc:
Subject: RE: [DISCUSS] Orchestration in Falcon

[email protected] It looks like we have broad consensus on this. Should we open up a discuss thread on how we go about this? Or should we create a Confluence page and collaborate through that?

Regards
Srikanth Sundarrajan

> From: [email protected]
> Date: Thu, 1 Jan 2015 22:40:48 +0530
> Subject: Re: [DISCUSS] Orchestration in Falcon
> To: [email protected]
>
> +1.
>
> Few more relevant asks:
> 1. Support for "Last Only" option for process scheduling (in addition to
> LIFO/FIFO), currently oozie has some issues.
> 2. Support for Singleton process (lock based), the behaviour of all
> instances of process is same.
>
> Thanks,
> -Idris
>
> On Thu, Jan 1, 2015 at 7:51 PM, Jean-Baptiste Onofré <[email protected]>
> wrote:
>
> > +1
> >
> > Regards
> > JB
> >
> > On 12/31/2014 03:53 PM, Srikanth Sundarrajan wrote:
> >
> >> Can we pick up this thread in the new year when folks are back from
> >> break? I am in total agreement with Venkatesh here. We ought to have a long
> >> term sustainable approach. Also I feel that the capabilities that we would
> >> like to enable on falcon and getting them done through oozie in near term
> >> seems to be a tall ask anyways.
> >>
> >> Regards
> >> Srikanth Sundarrajan
> >>
> >>> Date: Tue, 23 Dec 2014 16:44:06 -0800
> >>> Subject: Re: [DISCUSS] Orchestration in Falcon
> >>> From: [email protected]
> >>> To: [email protected]
> >>>
> >>> Chugging along with Oozie is bad for Falcon in the long run, for users and
> >>> developers. It's horribly complex to work through the many rough edges
> >>> architecturally in Oozie. Look at all the patches for security that I had
> >>> to fix around Oozie. It's unnecessarily very complex, non-uniform and is NOT
> >>> meant to be used by another tool like Falcon but was built around end
> >>> user.
> >>>
> >>> This is a good discussion to have - maybe explore oozie for short-term but
> >>> look at alternative solutions for the long-term.
> >>>
> >>> On Tue, Dec 23, 2014 at 7:28 AM, Srikanth Sundarrajan <[email protected]>
> >>> wrote:
> >>>
> >>>> @jb, There is no doubt merit in mapping them to oozie if possible and if
> >>>> extensions are simple and straightforward enough.
> >>>>
> >>>> Also had a quick chat offline with Shwetha and she mentioned about some
> >>>> work happening in Oozie in this regard. On further digging up, found
> >>>> https://issues.apache.org/jira/browse/OOZIE-1976. This is possibly what
> >>>> Shwetha was referring to. From the looks of it, this tries to address item
> >>>> #7 in the original thread. Maybe there are more jiras where additional
> >>>> work such as a-periodic datasets is being worked on. Perhaps @Shwetha can
> >>>> throw some light on what is being considered and/or how these
> >>>> gating/orchestration use cases can be managed.
> >>>>
> >>>> Regards
> >>>> Srikanth Sundarrajan
> >>>>
> >>>>> Date: Tue, 23 Dec 2014 11:06:24 +0100
> >>>>> From: [email protected]
> >>>>> To: [email protected]
> >>>>> Subject: Re: [DISCUSS] Orchestration in Falcon
> >>>>>
> >>>>> Hi all,
> >>>>>
> >>>>> I second Shwetha there. I think we can achieve such features in Oozie
> >>>>> (with some adaptations).
> >>>>>
> >>>>> Regards
> >>>>> JB
> >>>>>
> >>>>> On 2014-12-23 10:53, Shwetha G S wrote:
> >>>>>
> >>>>>> If we can get rid of oozie entirely, yes we can explore other
> >>>>>> possibilities. But if we are still going to use oozie for DAG execution, we
> >>>>>> are going to add another bottleneck in the whole execution (currently,
> >>>>>> falcon is not in the workflow execution path) and I don't think it's worth
> >>>>>> it.
> >>>>>>
> >>>>>> The features that are outlined above are all available in basic forms in
> >>>>>> oozie and it should be easy to enhance them/make them as extension
> >>>>>> points.
> >>>>>>
> >>>>>> -Shwetha
> >>>>>>
> >>>>>> On Tue, Dec 23, 2014 at 8:12 AM, Srikanth Sundarrajan
> >>>>>> <[email protected]> wrote:
> >>>>>>
> >>>>>>> Here are few more gaps that we ought to solve for while we are on the
> >>>>>>> subject:
> >>>>>>>
> >>>>>>> 1. Ability to attach to start & finish events of workflow execution.
> >>>>>>> Currently we have post processing hook to listen to finish events, but we
> >>>>>>> do run into scenarios where there are occasional failures with
> >>>>>>> post-processing and there is potential phase lag in learning about the
> >>>>>>> events.
> >>>>>>> 2. Strict enforcement of concurrency control possibly spanning process
> >>>>>>> boundaries.
> >>>>>>> 3. Ability to tune how backlogs have to be caught up (old instances to be
> >>>>>>> given higher priority, newer instances to be given higher priority, or some
> >>>>>>> sort of weights to allow both to make progress at varying rates). There
> >>>>>>> have been asks for routing current vs older instances to different queues
> >>>>>>> by users as an alternative.
> >>>>>>> 4. Ability to have a notion of non-time based feed instances and related
> >>>>>>> coordination.
> >>>>>>> 5. Currently keeping track of and managing SLAs is also a challenge, but
> >>>>>>> with #1 addressed, this might be a lesser concern.
> >>>>>>>
> >>>>>>> Regards
> >>>>>>> Srikanth Sundarrajan
> >>>>>>>
> >>>>>>>> Subject: Re: [DISCUSS] Orchestration in Falcon
> >>>>>>>> From: [email protected]
> >>>>>>>> Date: Tue, 23 Dec 2014 06:30:30 +0530
> >>>>>>>> To: [email protected]
> >>>>>>>>
> >>>>>>>> @venkatesh, the question really is how do we enable these gating
> >>>>>>>> preconditions. Seems hard enough to add them to oozie, but am not
> >>>>>>>> intimately familiar with oozie to comment on how hard or easy it is. Like I
> >>>>>>>> responded to @ajay on the same thread, if we are to do away with
> >>>>>>>> coordination through oozie, we can follow up this discussion with
> >>>>>>>> approaches and design. Though I had quartz in my mind, wanted to leave that
> >>>>>>>> out of discussion to see if there is consensus for moving away from oozie
> >>>>>>>> coords and implementing them through other means.
> >>>>>>>>
> >>>>>>>> Sent from my iPhone
> >>>>>>>>
> >>>>>>>> On 23-Dec-2014, at 1:16 am, "Seetharam Venkatesh" <[email protected]>
> >>>>>>>> wrote:
> >>>>>>>>
> >>>>>>>>> What is the purpose of this decoupling? Why build this into Falcon?
> >>>>>>>>> Scheduling is so common that there are dime a dozen schedulers today and
> >>>>>>>>> they are all extensible with custom triggers. Making it part of Falcon will
> >>>>>>>>> suffer the same issues that Oozie has today.
> >>>>>>>>>
> >>>>>>>>> I'm sorry but I'm a HUGE -1 to this being built into Falcon codebase.
> >>>>>>>>>
> >>>>>>>>> However, I'm +1 to reusing Quartz scheduler that already exists - stand it
> >>>>>>>>> up outside or embed it like we do for active MQ.
> >>>>>>>>>
> >>>>>>>>> Phase 2 - I'd like to see we write a simple DAG execution layer in YARN as
> >>>>>>>>> an app master, without a DB and keeping state on HDFS, as an alternative to
> >>>>>>>>> Oozie.
> >>>>>>>>>
> >>>>>>>>> Then we will have a nimble falcon which can kick ass.
> >>>>>>>>>
> >>>>>>>>> On Sun, Dec 21, 2014 at 6:13 AM, Srikanth Sundarrajan <[email protected]>
> >>>>>>>>> wrote:
> >>>>>>>>>
> >>>>>>>>>> Hello Team,
> >>>>>>>>>>
> >>>>>>>>>> Since its inception Falcon has used Oozie for process orchestration as
> >>>>>>>>>> well as feed life cycle phase executions. While this has worked reasonably
> >>>>>>>>>> and allowed us to make higher level capabilities available through Falcon,
> >>>>>>>>>> we are increasingly seeing scenarios where this is proving to be a limiting
> >>>>>>>>>> factor. In its current form, Falcon relies on Oozie for both scheduling and
> >>>>>>>>>> for workflow execution, due to which the scheduling is limited to time
> >>>>>>>>>> based/cron based scheduling with additional gating conditions on data
> >>>>>>>>>> availability. Also this imposes restrictions on datasets being
> >>>>>>>>>> periodic/cyclic in nature.
> >>>>>>>>>>
> >>>>>>>>>> From an orchestration standpoint, it would help if we can support
> >>>>>>>>>> standard gating / scheduling primitives via Falcon:
> >>>>>>>>>>
> >>>>>>>>>> 1. Simple periodic scheduling with no gating conditions
> >>>>>>>>>> 2. Cron based scheduling (day of week, day of the month, specific hours
> >>>>>>>>>> and non-periodic) with no gating conditions
> >>>>>>>>>> 3. Availability of new data (assuming monotonically increasing data
> >>>>>>>>>> version, availability of new versions)
> >>>>>>>>>> 4. Changes to existing data (reinstatement - similar to late data
> >>>>>>>>>> handling)
> >>>>>>>>>> 5. External trigger/notifications
> >>>>>>>>>> 6. Availability of specific instances of data as declared as mandatory
> >>>>>>>>>> dependency
> >>>>>>>>>> 7. Availability of a minimum subset of instances of data declared as
> >>>>>>>>>> mandatory dependency (at least 10 hourly instances of a day with 24
> >>>>>>>>>> instances for ex)
> >>>>>>>>>> 8. Valid combinations of the above.
> >>>>>>>>>>
> >>>>>>>>>> In this context, I would like to propose that we move away from Oozie for
> >>>>>>>>>> the orchestration requirements and have them implemented natively within
> >>>>>>>>>> Falcon. It will no doubt make Falcon server bulkier and heavier in both
> >>>>>>>>>> code and deployment, but seems like without it, the orchestration within
> >>>>>>>>>> Falcon will be limited by capabilities available within Oozie.
> >>>>>>>>>>
> >>>>>>>>>> Please do note that this suggestion is restricted to the scheduling and
> >>>>>>>>>> not to the workflow execution.
> >>>>>>>>>>
> >>>>>>>>>> Would like to hear from fellow developers and users on what your thoughts
> >>>>>>>>>> are. Please do chime in with your views.
> >>>>>>>>>>
> >>>>>>>>>> Regards
> >>>>>>>>>> Srikanth Sundarrajan
> >>>>>>>>>
> >>>>>>>>> --
> >>>>>>>>> Regards,
> >>>>>>>>> Venkatesh
> >>>>>>>>>
> >>>>>>>>> “Perfection (in design) is achieved not when there is nothing more to add,
> >>>>>>>>> but rather when there is nothing more to take away.”
> >>>>>>>>> - Antoine de Saint-Exupéry
> >>>
> >>> --
> >>> Regards,
> >>> Venkatesh
> >>>
> >>> “Perfection (in design) is achieved not when there is nothing more to add,
> >>> but rather when there is nothing more to take away.”
> >>> - Antoine de Saint-Exupéry
> >
> > --
> > Jean-Baptiste Onofré
> > [email protected]
> > http://blog.nanthrax.net
> > Talend - http://www.talend.com
