Still worth considering an existing tool. The simplest code is the code you don’t maintain. :)
On Jan 14, 2015, at 2:57 AM, Li Yang <[email protected]> wrote:

> Sorry I'm late, just a recap.
>
> The "Job Engine" here only manages the lifecycle and dependencies of
> long-running tasks. It oversees task sequences (a cube build, for example,
> is made up of several MapReduce jobs) and lets the user
> start/stop/pause/resume them.
>
> It does not do scheduling or fancy workflows, which is why many existing
> products like Quartz or Oozie are overkill. We want to keep Kylin's overall
> architecture simple and easy to deploy and debug.
>
> The purpose of this refactoring is to separate the manager role from the
> worker role, which the previous implementation mixed up. Once that is done,
> replacing a worker should become easy, and we will be free to explore other
> cube-building workers such as the Flink and Spark options mentioned.
>
> Cheers
> Yang
>
> On Wed, Jan 14, 2015 at 10:08 AM, Zhou, Qianhao <[email protected]> wrote:
>
>> Thanks Ted for the advice.
>> I think the right way is to take more options into consideration and
>> then make a decision. Whichever solution is used, we are going to learn
>> something that will benefit us sooner or later.
>>
>> Best Regards
>> Zhou QianHao
>>
>> On 1/14/15, 12:37 AM, "Ted Dunning" <[email protected]> wrote:
>>
>>> OK.
>>>
>>> On Tue, Jan 13, 2015 at 10:30 AM, 周千昊 <[email protected]> wrote:
>>>
>>>> As I mentioned, we don't want an extra dependency because that would
>>>> make deployment more complex.
>>>> As for Aurora, users would have an extra installation step. So far,
>>>> Kylin only needs a war package and a Hadoop cluster.
>>>>
>>>> On Tue Jan 13 2015 at 10:26:50 PM Ted Dunning <[email protected]>
>>>> wrote:
>>>>
>>>>> I understand you want to write your own job engine. But why not use
>>>>> one that already exists?
>>>>>
>>>>> Given that you mention Quartz, it sounds like Aurora might be a good
>>>>> fit. Why not use it?
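As a rough illustration of the manager/worker split Yang describes above, a minimal sketch in plain Java might look like the following. All class, enum, and step names here are assumptions for illustration only, not Kylin's actual code; the point is that the manager owns lifecycle and step ordering, while the worker is a pluggable interface that could wrap MapReduce, Spark, or Flink.

```java
import java.util.Arrays;
import java.util.List;

// Lifecycle states the manager tracks; names are hypothetical.
enum JobState { PENDING, RUNNING, PAUSED, STOPPED, FINISHED }

/** A worker executes one step of a job. Swapping implementations
    (MapReduce, Spark, Flink) should not touch the manager. */
interface Worker {
    void execute(String stepName) throws Exception;
}

/** The manager handles start/stop/pause/resume and step ordering only;
    it does no scheduling of its own. */
class JobManager {
    private final List<String> steps;
    private final Worker worker;
    private volatile JobState state = JobState.PENDING;
    private int nextStep = 0;

    JobManager(List<String> steps, Worker worker) {
        this.steps = steps;
        this.worker = worker;
    }

    JobState getState() { return state; }

    /** Run remaining steps until finished, paused, or stopped. */
    void start() throws Exception {
        state = JobState.RUNNING;
        while (state == JobState.RUNNING && nextStep < steps.size()) {
            worker.execute(steps.get(nextStep));
            nextStep++;
        }
        if (nextStep == steps.size()) state = JobState.FINISHED;
    }

    // pause() takes effect between steps when called from another thread.
    void pause()  { if (state == JobState.RUNNING) state = JobState.PAUSED; }
    void resume() throws Exception { if (state == JobState.PAUSED) start(); }
    void stop()   { state = JobState.STOPPED; }
}

public class JobEngineSketch {
    public static void main(String[] args) throws Exception {
        // Hypothetical cube-build step names, just for the demo.
        List<String> cubeBuildSteps = Arrays.asList(
                "extract-fact-table", "build-dictionary", "build-cube-mr");
        JobManager mgr = new JobManager(cubeBuildSteps,
                step -> System.out.println("running " + step));
        mgr.start();
        System.out.println(mgr.getState()); // FINISHED
    }
}
```

Under this split, trying a Spark-based worker would mean writing one new `Worker` implementation rather than touching the lifecycle code.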
>>>>>
>>>>> On Tue, Jan 13, 2015 at 3:34 AM, Zhou, Qianhao <[email protected]>
>>>>> wrote:
>>>>>
>>>>>> What we want is:
>>>>>>
>>>>>> 1. A lightweight job engine that is easy to start, stop, and check
>>>>>>    jobs with. Most of the heavyweight work is MapReduce, which
>>>>>>    already runs on the cluster, so the job engine itself does not
>>>>>>    need to run on a cluster.
>>>>>>
>>>>>> 2. Kylin already has a job engine based on Quartz, but only a very
>>>>>>    small part of its functionality is used, so we can easily replace
>>>>>>    it with the standard Java API. That means no extra dependency,
>>>>>>    which makes deployment easier.
>>>>>>
>>>>>> Currently a very simple job engine implementation will meet Kylin's
>>>>>> needs, so I think keeping it simple is the better choice at this
>>>>>> point.
>>>>>>
>>>>>> Best Regards
>>>>>> Zhou QianHao
>>>>>>
>>>>>> On 1/13/15, 4:43 PM, "Ted Dunning" <[email protected]> wrote:
>>>>>>
>>>>>>> So why are the following systems unsuitable?
>>>>>>>
>>>>>>> - mesos + (aurora or chronos)
>>>>>>> - spark
>>>>>>> - yarn
>>>>>>> - drill's drillbits
>>>>>>>
>>>>>>> These options do different things; I know that. I am not entirely
>>>>>>> clear on what you want, however, so I present these different
>>>>>>> options so that you can tell me better what you want.
>>>>>>>
>>>>>>> Mesos provides very flexible job scheduling. With Aurora, it has
>>>>>>> support for handling long-running and periodic jobs. With Chronos,
>>>>>>> it has the equivalent of cluster-level cron.
>>>>>>>
>>>>>>> Spark provides the ability for a program to spawn lots of parallel
>>>>>>> execution. This is different from what most people mean by job
>>>>>>> scheduling, but in conjunction with a queuing system combined with
>>>>>>> Spark Streaming, you can get remarkably close to a job scheduler.
>>>>>>>
>>>>>>> Yarn can run jobs, but it has no capability to schedule recurring
>>>>>>> jobs. It can adjudicate the allocation of cluster resources, which
>>>>>>> is different from what either Spark or Mesos does.
>>>>>>>
>>>>>>> Drill's drillbits schedule queries across a parallel execution
>>>>>>> environment. Drill currently has no user impersonation, but it does
>>>>>>> an interesting job of scheduling the parts of parallel queries.
>>>>>>>
>>>>>>> Each of these could be considered a kind of job scheduler, but only
>>>>>>> a very few are likely to be what you are talking about.
>>>>>>>
>>>>>>> Which is it?
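Zhou's point about replacing Quartz with the standard Java API can be sketched with the JDK's `ScheduledExecutorService`, which covers the small slice of Quartz that a simple poll-and-dispatch engine needs. The class and method names below are illustrative assumptions, not Kylin's real code.

```java
import java.util.concurrent.Executors;
import java.util.concurrent.ScheduledExecutorService;
import java.util.concurrent.TimeUnit;

/** A minimal, dependency-free polling engine built on the JDK only.
    Hypothetical sketch; not Kylin's actual implementation. */
public class SimpleJobEngine {
    private final ScheduledExecutorService scheduler =
            Executors.newSingleThreadScheduledExecutor();

    /** Poll for runnable jobs every intervalSeconds. The Runnable is a
        placeholder for "fetch pending jobs from the store and dispatch". */
    public void startPolling(long intervalSeconds, Runnable fetchAndRunPendingJobs) {
        // First poll fires immediately (initial delay 0), then repeats
        // with a fixed delay between the end of one run and the next.
        scheduler.scheduleWithFixedDelay(
                fetchAndRunPendingJobs, 0, intervalSeconds, TimeUnit.SECONDS);
    }

    public void shutdown() {
        scheduler.shutdownNow();
    }

    public static void main(String[] args) throws InterruptedException {
        SimpleJobEngine engine = new SimpleJobEngine();
        engine.startPolling(1, () -> System.out.println("checking for pending jobs"));
        Thread.sleep(2500); // let a few polls run
        engine.shutdown();
    }
}
```

Since everything here ships with the JDK, the deployment story stays exactly as described above: a war package and a Hadoop cluster, with no extra scheduler dependency to install.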
