+1 for the confluence page. It will serve as design documentation as well as a place for discussion.
On Thu, Jan 15, 2015 at 2:55 PM, Srikanth Sundarrajan <[email protected]> wrote:

> [email protected]
>
> It looks like we have broad consensus on this. Should we open up a discuss thread on how we go about this? Or should we create a confluence page and collaborate through that?
>
> Regards
> Srikanth Sundarrajan
>
> > From: [email protected]
> > Date: Thu, 1 Jan 2015 22:40:48 +0530
> > Subject: Re: [DISCUSS] Orchestration in Falcon
> > To: [email protected]
> >
> > +1.
> >
> > Few more relevant asks:
> > 1. Support for "Last Only" option for process scheduling (in addition to LIFO/FIFO); currently oozie has some issues.
> > 2. Support for Singleton process (lock based), the behaviour of all instances of process is same.
> >
> > Thanks,
> > -Idris
> >
> > On Thu, Jan 1, 2015 at 7:51 PM, Jean-Baptiste Onofré <[email protected]> wrote:
> >
> > > +1
> > >
> > > Regards
> > > JB
> > >
> > > On 12/31/2014 03:53 PM, Srikanth Sundarrajan wrote:
> > >
> > >> Can we pick up this thread in the new year when folks are back from break? I am in total agreement with Venkatesh here. We ought to have a long term sustainable approach. Also I feel that the capabilities that we would like to enable on falcon and getting them done through oozie in near term seems to be a tall ask anyways.
> > >>
> > >> Regards
> > >> Srikanth Sundarrajan
> > >>
> > >>> Date: Tue, 23 Dec 2014 16:44:06 -0800
> > >>> Subject: Re: [DISCUSS] Orchestration in Falcon
> > >>> From: [email protected]
> > >>> To: [email protected]
> > >>>
> > >>> Chugging along with Oozie is bad for Falcon in the long run, for users and developers. It's horribly complex to work through the many rough edges architecturally in Oozie. Look at all the patches for security that I had to fix around Oozie. It's unnecessarily very complex, non-uniform and is NOT meant to be used by another tool like Falcon but was built around end user.
> > >>>
> > >>> This is a good discussion to have - maybe explore oozie for short-term but look at alternative solutions for the long-term.
> > >>>
> > >>> On Tue, Dec 23, 2014 at 7:28 AM, Srikanth Sundarrajan <[email protected]> wrote:
> > >>>
> > >>>> @jb, There is no doubt merit in mapping them to oozie if possible and if extensions are simple and straight forward enough.
> > >>>>
> > >>>> Also had a quick chat offline with Shwetha and she mentioned about some work happening in Oozie in this regard. On further digging up, found https://issues.apache.org/jira/browse/OOZIE-1976. This is possibly what Shwetha was referring to. From the looks of it, this tries to address item #7 in the original thread. Maybe there are more jiras where additional work such as a-periodic datasets is being worked on. Perhaps @Shwetha can throw some light on what is being considered and/or how these gating/orchestration use cases can be managed.
> > >>>>
> > >>>> Regards
> > >>>> Srikanth Sundarrajan
> > >>>>
> > >>>>> Date: Tue, 23 Dec 2014 11:06:24 +0100
> > >>>>> From: [email protected]
> > >>>>> To: [email protected]
> > >>>>> Subject: Re: [DISCUSS] Orchestration in Falcon
> > >>>>>
> > >>>>> Hi all,
> > >>>>>
> > >>>>> I second Shwetha there. I think we can achieve such features in Oozie (with some adaptations).
> > >>>>>
> > >>>>> Regards
> > >>>>> JB
> > >>>>>
> > >>>>> On 2014-12-23 10:53, Shwetha G S wrote:
> > >>>>>
> > >>>>>> If we can get rid of oozie entirely, yes we can explore other possibilities. But if we are still going to use oozie for DAG execution, we are going to add another bottleneck in the whole execution (currently, falcon is not in the workflow execution path) and I don't think it's worth it.
> > >>>>>>
> > >>>>>> The features that are outlined above are all available in basic forms in oozie and it should be easy to enhance them/make them as extension points.
> > >>>>>>
> > >>>>>> -Shwetha
> > >>>>>>
> > >>>>>> On Tue, Dec 23, 2014 at 8:12 AM, Srikanth Sundarrajan <[email protected]> wrote:
> > >>>>>>
> > >>>>>>> Here are few more gaps that we ought to solve for while we are on the subject:
> > >>>>>>>
> > >>>>>>> 1. Ability to attach to start & finish events of workflow execution. Currently we have post processing hook to listen to finish events, but we do run into scenarios where there are occasional failures with post-processing and there is potential phase lag in learning about the events.
> > >>>>>>> 2. Strict enforcement of concurrency control possibly spanning process boundaries.
> > >>>>>>> 3. Ability to tune how backlogs have to be caught up (old instances to be given higher priority, newer instances to be given higher priority, or some sort of weights to allow both to make progress at varying rates). There have been asks for routing current vs older instances to different queues by users as an alternative.
> > >>>>>>> 4. Ability to have a notion of non-time based feed instances and related coordination.
> > >>>>>>> 5. Currently keeping track of and managing SLAs is also a challenge, but with #1 addressed, this might be a lesser concern.
> > >>>>>>>
> > >>>>>>> Regards
> > >>>>>>> Srikanth Sundarrajan
> > >>>>>>>
> > >>>>>>>> Subject: Re: [DISCUSS] Orchestration in Falcon
> > >>>>>>>> From: [email protected]
> > >>>>>>>> Date: Tue, 23 Dec 2014 06:30:30 +0530
> > >>>>>>>> To: [email protected]
> > >>>>>>>>
> > >>>>>>>> @venkatesh, the question really is how do we enable these gating pre conditions. Seems hard enough to add them to oozie, but am not intimately familiar with oozie to comment on how hard or easy it is. Like I responded to @ajay on the same thread, if we are to do away with coordination through oozie, we can follow up this discussion with approaches and design. Though I had quartz in my mind, wanted to leave that out of discussion to see if there is consensus for moving away from oozie coords and implementing them through other means.
> > >>>>>>>>
> > >>>>>>>> Sent from my iPhone
> > >>>>>>>>
> > >>>>>>>> On 23-Dec-2014, at 1:16 am, "Seetharam Venkatesh" <[email protected]> wrote:
> > >>>>>>>>
> > >>>>>>>>> What is the purpose of this decoupling? Why build this into Falcon? Scheduling is so common that there are dime a dozen schedulers today and they are all extensible with custom triggers. Making it part of Falcon will suffer the same issues that Oozie has today.
> > >>>>>>>>>
> > >>>>>>>>> I'm sorry but I'm a HUGE -1 to this being built into Falcon codebase.
> > >>>>>>>>>
> > >>>>>>>>> However, I'm +1 to reusing Quartz scheduler that already exists - stand it up outside or embed it like we do for active MQ.
> > >>>>>>>>>
> > >>>>>>>>> Phase 2 - I'd like to see we write a simple DAG execution layer in YARN as an app master without a DB and keeps state on HDFS as an alternate to Oozie.
> > >>>>>>>>>
> > >>>>>>>>> Then we will have a nimble falcon which can kick ass.
> > >>>>>>>>>
> > >>>>>>>>> On Sun, Dec 21, 2014 at 6:13 AM, Srikanth Sundarrajan <[email protected]> wrote:
> > >>>>>>>>>
> > >>>>>>>>>> Hello Team,
> > >>>>>>>>>>
> > >>>>>>>>>> Since its inception Falcon has used Oozie for process orchestration as well as feed life cycle phase executions. While this has worked reasonably and allowed us to make higher level capabilities available through Falcon, we are increasingly seeing scenarios where this is proving to be a limiting factor. In its current form, Falcon relies on Oozie for both scheduling and for workflow execution, due to which the scheduling is limited to time based/cron based scheduling with additional gating conditions on data availability. Also this imposes restrictions on datasets being periodic/cyclic in nature.
> > >>>>>>>>>>
> > >>>>>>>>>> From an orchestration stand point, it would help if we can support standard gating / scheduling primitives via Falcon:
> > >>>>>>>>>>
> > >>>>>>>>>> 1. Simple periodic scheduling with no gating conditions
> > >>>>>>>>>> 2. Cron based scheduling (day of week, day of the month, specific hours and non-periodic) with no gating conditions
> > >>>>>>>>>> 3. Availability of new data (assuming monotonically increasing data version, availability of new versions)
> > >>>>>>>>>> 4. Changes to existing data (reinstatement - similar to late data handling)
> > >>>>>>>>>> 5. External trigger/notifications
> > >>>>>>>>>> 6. Availability of specific instances of data as declared as mandatory dependency
> > >>>>>>>>>> 7. Availability of a minimum subset of instances of data declared as mandatory dependency (at least 10 hourly instances of a day with 24 instances for ex)
> > >>>>>>>>>> 8. Valid combinations of the above.
> > >>>>>>>>>>
> > >>>>>>>>>> In this context, I would like to propose that we move away from Oozie for the orchestration requirements and have them implemented natively within Falcon. It will no doubt make Falcon server bulkier and heavier in both code and deployment, but seems like without it, the orchestration within Falcon will be limited by capabilities available within Oozie.
> > >>>>>>>>>>
> > >>>>>>>>>> Please do note that this suggestion is restricted to the scheduling and not to the workflow execution.
> > >>>>>>>>>>
> > >>>>>>>>>> Would like to hear from fellow developers and users on what your thoughts are. Please do chime in with your views.
> > >>>>>>>>>>
> > >>>>>>>>>> Regards
> > >>>>>>>>>> Srikanth Sundarrajan
> > >>>>>>>>>
> > >>>>>>>>> --
> > >>>>>>>>> Regards,
> > >>>>>>>>> Venkatesh
> > >>>>>>>>>
> > >>>>>>>>> “Perfection (in design) is achieved not when there is nothing more to add, but rather when there is nothing more to take away.”
> > >>>>>>>>> - Antoine de Saint-Exupéry
> > >>>
> > >>> --
> > >>> Regards,
> > >>> Venkatesh
> > >
> > > --
> > > Jean-Baptiste Onofré
> > > [email protected]
> > > http://blog.nanthrax.net
> > > Talend - http://www.talend.com
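For readers following the thread, here is a rough illustration of gating primitives 6 and 7 from the original proposal (all declared instances mandatory, or a minimum subset such as 10 of 24 hourly instances). This is only a sketch in Python under stated assumptions: the names hourly_instances and gate_is_open are hypothetical, they are not part of Falcon, Oozie or Quartz, and nothing in the thread prescribes this design.

```python
from datetime import datetime, timedelta


def hourly_instances(day):
    """Return the 24 hourly instance times for the given day (hypothetical helper)."""
    start = datetime(day.year, day.month, day.day)
    return [start + timedelta(hours=h) for h in range(24)]


def gate_is_open(expected, available, min_required):
    """Return True when at least min_required of the expected instances are available.

    Passing min_required=len(expected) models primitive 6 (every declared
    instance is mandatory); a smaller value models primitive 7.
    """
    present = sum(1 for instance in expected if instance in available)
    return present >= min_required


if __name__ == "__main__":
    expected = hourly_instances(datetime(2014, 12, 21))
    available = set(expected[:10])  # assume only the first 10 hours have landed
    print(gate_is_open(expected, available, min_required=10))
    print(gate_is_open(expected, available, min_required=len(expected)))
```

A real scheduler would also need to decide how long to wait before giving up on a gate; the thread leaves that open, so the sketch leaves it out.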
