Still worth considering an existing tool. The simplest code is the code you don’t maintain. :)
On Jan 14, 2015, at 2:57 AM, Li Yang <[email protected]> wrote:

> Sorry I'm late, just a recap.
>
> The "Job Engine" here only manages the lifecycle and dependencies of
> long-running tasks. It oversees task sequences (a cube build, for example,
> is made up of several MapReduce jobs) and lets the user
> start/stop/pause/resume them.
>
> It does not do scheduling or fancy workflows, which is why many existing
> products like Quartz or Oozie are overkill. We want to keep Kylin's overall
> architecture simple and easy to deploy and debug.
>
> The purpose of this refactoring is to separate the manager role from the
> worker role, which the previous implementation mixed up. Once that is done,
> replacing a worker should become easy, and we will be free to explore other
> cube-building workers such as the Flink and Spark options mentioned.
>
> Cheers
> Yang
>
> On Wed, Jan 14, 2015 at 10:08 AM, Zhou, Qianhao <[email protected]> wrote:
>
>> Thanks Ted for the advice.
>> I think the right way is to take more options into consideration and
>> then make a decision. Whichever solution is used, we are going to learn
>> something that will benefit us sooner or later.
>>
>> Best Regards
>> Zhou QianHao
>>
>> On 1/14/15, 12:37 AM, "Ted Dunning" <[email protected]> wrote:
>>
>>> OK.
>>>
>>> On Tue, Jan 13, 2015 at 10:30 AM, 周千昊 <[email protected]> wrote:
>>>
>>>> As I mentioned, we don't want an extra dependency because that would
>>>> make deployment more complex.
>>>> As for Aurora, users would have an extra installation step. So far,
>>>> Kylin only needs a war package and a Hadoop cluster.
>>>>
>>>> On Tue Jan 13 2015 at 10:26:50 PM Ted Dunning <[email protected]>
>>>> wrote:
>>>>
>>>>> I understand you want to write your own job engine. But why not use
>>>>> one that already exists?
>>>>>
>>>>> Given that you mention Quartz, it sounds like Aurora might be a good
>>>>> fit. Why not use it?
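As a rough illustration of the manager/worker split Yang describes above, a minimal sketch in plain Java might look like the following. All class, enum, and step names here are assumptions for illustration only, not Kylin's actual code; the point is that the manager owns lifecycle and step ordering, while the worker is a pluggable interface that could wrap MapReduce, Spark, or Flink.

```java
import java.util.Arrays;
import java.util.List;

// Lifecycle states the manager tracks; names are hypothetical.
enum JobState { PENDING, RUNNING, PAUSED, STOPPED, FINISHED }

/** A worker executes one step of a job. Swapping implementations
    (MapReduce, Spark, Flink) should not touch the manager. */
interface Worker {
    void execute(String stepName) throws Exception;
}

/** The manager handles start/stop/pause/resume and step ordering only;
    it does no scheduling of its own. */
class JobManager {
    private final List<String> steps;
    private final Worker worker;
    private volatile JobState state = JobState.PENDING;
    private int nextStep = 0;

    JobManager(List<String> steps, Worker worker) {
        this.steps = steps;
        this.worker = worker;
    }

    JobState getState() { return state; }

    /** Run remaining steps until finished, paused, or stopped. */
    void start() throws Exception {
        state = JobState.RUNNING;
        while (state == JobState.RUNNING && nextStep < steps.size()) {
            worker.execute(steps.get(nextStep));
            nextStep++;
        }
        if (nextStep == steps.size()) state = JobState.FINISHED;
    }

    // pause() takes effect between steps when called from another thread.
    void pause()  { if (state == JobState.RUNNING) state = JobState.PAUSED; }
    void resume() throws Exception { if (state == JobState.PAUSED) start(); }
    void stop()   { state = JobState.STOPPED; }
}

public class JobEngineSketch {
    public static void main(String[] args) throws Exception {
        // Hypothetical cube-build step names, just for the demo.
        List<String> cubeBuildSteps = Arrays.asList(
                "extract-fact-table", "build-dictionary", "build-cube-mr");
        JobManager mgr = new JobManager(cubeBuildSteps,
                step -> System.out.println("running " + step));
        mgr.start();
        System.out.println(mgr.getState()); // FINISHED
    }
}
```

Under this split, trying a Spark-based worker would mean writing one new `Worker` implementation rather than touching the lifecycle code.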
>>>>>
>>>>> On Tue, Jan 13, 2015 at 3:34 AM, Zhou, Qianhao <[email protected]>
>>>>> wrote:
>>>>>
>>>>>> What we want is:
>>>>>>
>>>>>> 1. A lightweight job engine that is easy to start, stop, and check
>>>>>>    jobs with. Most of the heavyweight work is MapReduce, which
>>>>>>    already runs on the cluster, so the job engine itself does not
>>>>>>    need to run on a cluster.
>>>>>>
>>>>>> 2. Kylin already has a job engine based on Quartz, but only a very
>>>>>>    small part of its functionality is used, so we can easily replace
>>>>>>    it with the standard Java API. That means no extra dependency,
>>>>>>    which makes deployment easier.
>>>>>>
>>>>>> Currently a very simple job engine implementation will meet Kylin's
>>>>>> needs, so I think keeping it simple is the better choice at this
>>>>>> point.
>>>>>>
>>>>>> Best Regards
>>>>>> Zhou QianHao
>>>>>>
>>>>>> On 1/13/15, 4:43 PM, "Ted Dunning" <[email protected]> wrote:
>>>>>>
>>>>>>> So why are the following systems unsuitable?
>>>>>>>
>>>>>>> - mesos + (aurora or chronos)
>>>>>>> - spark
>>>>>>> - yarn
>>>>>>> - drill's drillbits
>>>>>>>
>>>>>>> These options do different things; I know that. I am not entirely
>>>>>>> clear on what you want, however, so I present these different
>>>>>>> options so that you can tell me better what you want.
>>>>>>>
>>>>>>> Mesos provides very flexible job scheduling. With Aurora, it has
>>>>>>> support for handling long-running and periodic jobs. With Chronos,
>>>>>>> it has the equivalent of cluster-level cron.
>>>>>>>
>>>>>>> Spark provides the ability for a program to spawn lots of parallel
>>>>>>> execution. This is different from what most people mean by job
>>>>>>> scheduling, but in conjunction with a queuing system combined with
>>>>>>> Spark Streaming, you can get remarkably close to a job scheduler.
>>>>>>>
>>>>>>> Yarn can run jobs, but it has no capability to schedule recurring
>>>>>>> jobs. It can adjudicate the allocation of cluster resources, which
>>>>>>> is different from what either Spark or Mesos does.
>>>>>>>
>>>>>>> Drill's drillbits schedule queries across a parallel execution
>>>>>>> environment. Drill currently has no user impersonation, but it does
>>>>>>> an interesting job of scheduling the parts of parallel queries.
>>>>>>>
>>>>>>> Each of these could be considered a kind of job scheduler, but only
>>>>>>> a very few are likely to be what you are talking about.
>>>>>>>
>>>>>>> Which is it?
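Zhou's point about replacing Quartz with the standard Java API can be sketched with the JDK's `ScheduledExecutorService`, which covers the small slice of Quartz that a simple poll-and-dispatch engine needs. The class and method names below are illustrative assumptions, not Kylin's real code.

```java
import java.util.concurrent.Executors;
import java.util.concurrent.ScheduledExecutorService;
import java.util.concurrent.TimeUnit;

/** A minimal, dependency-free polling engine built on the JDK only.
    Hypothetical sketch; not Kylin's actual implementation. */
public class SimpleJobEngine {
    private final ScheduledExecutorService scheduler =
            Executors.newSingleThreadScheduledExecutor();

    /** Poll for runnable jobs every intervalSeconds. The Runnable is a
        placeholder for "fetch pending jobs from the store and dispatch". */
    public void startPolling(long intervalSeconds, Runnable fetchAndRunPendingJobs) {
        // First poll fires immediately (initial delay 0), then repeats
        // with a fixed delay between the end of one run and the next.
        scheduler.scheduleWithFixedDelay(
                fetchAndRunPendingJobs, 0, intervalSeconds, TimeUnit.SECONDS);
    }

    public void shutdown() {
        scheduler.shutdownNow();
    }

    public static void main(String[] args) throws InterruptedException {
        SimpleJobEngine engine = new SimpleJobEngine();
        engine.startPolling(1, () -> System.out.println("checking for pending jobs"));
        Thread.sleep(2500); // let a few polls run
        engine.shutdown();
    }
}
```

Since everything here ships with the JDK, the deployment story stays exactly as described above: a war package and a Hadoop cluster, with no extra scheduler dependency to install.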
