Would it be possible to build an extensibility layer around the job engine? I.e., by default we refactor and build a simple job engine with no external dependencies, but allow external job engines to be plugged in to enable scheduling and complex workflows.
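For instance, such an extensibility layer could be a small service-provider interface that the default engine implements and that an external engine (Quartz, Oozie, etc.) could implement as well. A minimal sketch only; all names here (JobEngine, SimpleJobEngine, JobState) are hypothetical, not existing Kylin classes:

```java
import java.util.List;
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;

// Hypothetical plug-in point: the default engine and any external
// engine would both implement this interface.
interface JobEngine {
    enum JobState { PENDING, RUNNING, FINISHED, ERROR }

    // A job is an ordered sequence of steps (e.g. the MapReduce stages
    // of a cube build); the engine runs them in dependency order.
    void submit(String jobId, List<Runnable> steps);

    JobState getState(String jobId);
}

// Default implementation with no external dependencies: a single
// worker thread from java.util.concurrent runs each job's steps.
class SimpleJobEngine implements JobEngine {
    private final ExecutorService worker = Executors.newSingleThreadExecutor();
    private final Map<String, JobState> states = new ConcurrentHashMap<>();

    @Override
    public void submit(String jobId, List<Runnable> steps) {
        states.put(jobId, JobState.PENDING);
        worker.submit(() -> {
            states.put(jobId, JobState.RUNNING);
            try {
                for (Runnable step : steps) {
                    step.run();            // steps run strictly in order
                }
                states.put(jobId, JobState.FINISHED);
            } catch (RuntimeException e) {
                states.put(jobId, JobState.ERROR);
            }
        });
    }

    @Override
    public JobState getState(String jobId) {
        return states.get(jobId);
    }
}
```

A Quartz- or Oozie-backed engine would implement the same interface and translate submit/getState into that system's API, so swapping engines would not touch the rest of Kylin.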
On 1/14/15, 9:40 AM, "Julian Hyde" <[email protected]> wrote:

>Still worth considering an existing tool. The simplest code is the code
>you don’t maintain. :)
>
>On Jan 14, 2015, at 2:57 AM, Li Yang <[email protected]> wrote:
>
>> Sorry I'm late, just a recap.
>>
>> The "Job Engine" here only manages the lifecycle and dependencies of
>> long-running tasks. It oversees task sequences (a cube build, for
>> example, is made up of several MapReduce jobs) and allows the user to
>> start/stop/pause/resume them.
>>
>> It does not do scheduling or fancy workflow, which is why many existing
>> products like Quartz or Oozie are overkill. We want to keep Kylin's
>> overall architecture simple and easy to deploy and debug.
>>
>> The purpose of this refactoring is to separate the manager role and the
>> worker role, which the previous implementation mixed up. Once done,
>> replacing a worker shall become easy, and we will be free to explore
>> other cube-building workers, like the Flink and Spark ones mentioned.
>>
>> Cheers
>> Yang
>>
>> On Wed, Jan 14, 2015 at 10:08 AM, Zhou, Qianhao <[email protected]>
>> wrote:
>>
>>> Thanks Ted for the advice.
>>> I think the right way to do it is to take more options into
>>> consideration, then make a decision. Whichever solution is used, we
>>> are going to learn something that will benefit us sooner or later.
>>>
>>> Best Regards,
>>> Zhou QianHao
>>>
>>> On 1/14/15, 12:37 AM, "Ted Dunning" <[email protected]> wrote:
>>>
>>>> OK.
>>>>
>>>> On Tue, Jan 13, 2015 at 10:30 AM, 周千昊 <[email protected]> wrote:
>>>>
>>>>> As I mentioned, we don't want an extra dependency because that will
>>>>> make the deployment more complex. As for Aurora, users would have
>>>>> an extra installation step, whereas so far Kylin only needs a war
>>>>> package and a Hadoop cluster.
>>>>>
>>>>> On Tue Jan 13 2015 at 10:26:50 PM Ted Dunning <[email protected]>
>>>>> wrote:
>>>>>
>>>>>> I understand you want to write your own job engine. But why not
>>>>>> use one that already exists?
>>>>>>
>>>>>> Given that you mention Quartz, it sounds like Aurora might be a
>>>>>> good fit. Why not use it?
>>>>>>
>>>>>> On Tue, Jan 13, 2015 at 3:34 AM, Zhou, Qianhao <[email protected]>
>>>>>> wrote:
>>>>>>
>>>>>>> What we want is:
>>>>>>>
>>>>>>> 1. A lightweight job engine that is easy to start, stop and check
>>>>>>>    jobs with. Most of the heavyweight work is MapReduce, which
>>>>>>>    already runs on the cluster, so the job engine itself does not
>>>>>>>    need to run on a cluster.
>>>>>>>
>>>>>>> 2. Kylin already has a job engine based on Quartz; however, only
>>>>>>>    a very small part of its functionality is used, so we can
>>>>>>>    easily replace it with the standard Java API. That removes the
>>>>>>>    extra dependency, which means easier deployment.
>>>>>>>
>>>>>>> Currently a very simple job engine implementation will meet
>>>>>>> Kylin's needs, so I think keeping it simple is the better choice
>>>>>>> at this point.
>>>>>>>
>>>>>>> Best Regards,
>>>>>>> Zhou QianHao
>>>>>>>
>>>>>>> On 1/13/15, 4:43 PM, "Ted Dunning" <[email protected]> wrote:
>>>>>>>
>>>>>>>> So why are the following systems unsuitable?
>>>>>>>>
>>>>>>>> - Mesos + (Aurora or Chronos)
>>>>>>>> - Spark
>>>>>>>> - Yarn
>>>>>>>> - Drill's drillbits
>>>>>>>>
>>>>>>>> These options do different things. I know that. I am not
>>>>>>>> entirely clear on what you want, however, so I present these
>>>>>>>> different options so that you can tell me better what you want.
>>>>>>>>
>>>>>>>> Mesos provides very flexible job scheduling. With Aurora, it has
>>>>>>>> support for handling long-running and periodic jobs. With
>>>>>>>> Chronos, it has the equivalent of a cluster-level cron.
>>>>>>>>
>>>>>>>> Spark provides the ability for a program to spawn lots of
>>>>>>>> parallel execution.
>>>>>>>> This is different from what most people mean by job scheduling,
>>>>>>>> but in conjunction with a queuing system combined with Spark
>>>>>>>> Streaming you can get remarkably close to a job scheduler.
>>>>>>>>
>>>>>>>> Yarn can run jobs, but has no capability to schedule recurring
>>>>>>>> jobs. It can adjudicate the allocation of cluster resources,
>>>>>>>> which is different from what either Spark or Mesos does.
>>>>>>>>
>>>>>>>> Drill's drillbits schedule queries across a parallel execution
>>>>>>>> environment. Drill currently has no user impersonation, but it
>>>>>>>> does an interesting job of scheduling the parts of parallel
>>>>>>>> queries.
>>>>>>>>
>>>>>>>> Each of these could be considered something like a job
>>>>>>>> scheduler. Only a very few are likely to be what you are talking
>>>>>>>> about.
>>>>>>>>
>>>>>>>> Which is it?
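To make the Quartz-replacement point from the thread concrete: the small slice of Quartz that Kylin uses, periodically firing a job-dispatch check, can be covered by the JDK's ScheduledExecutorService with no extra dependency. A sketch under that assumption; JobPoller and the dispatch callback are illustrative names, not actual Kylin code:

```java
import java.util.concurrent.Executors;
import java.util.concurrent.ScheduledExecutorService;
import java.util.concurrent.TimeUnit;

// Stands in for a Quartz cron trigger: fire a dispatch check at a
// fixed interval using only the standard library.
class JobPoller {
    private final ScheduledExecutorService scheduler =
            Executors.newSingleThreadScheduledExecutor();

    void start(Runnable dispatchCheck, long periodMillis) {
        // scheduleAtFixedRate keeps firing until the poller is shut
        // down, which is all a simple trigger needs.
        scheduler.scheduleAtFixedRate(dispatchCheck, 0, periodMillis,
                TimeUnit.MILLISECONDS);
    }

    void shutdown() {
        scheduler.shutdownNow();
    }
}
```

Anything beyond fixed-rate or fixed-delay firing (cron expressions, misfire policies, persistent triggers) is where Quartz would earn its keep; the thread's point is that Kylin uses none of that.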
