Would it be possible to build an extensibility layer around the job engine? I.e., by default we refactor and build a simple job engine with no external dependencies, but allow external job engines to be plugged in to enable scheduling and complex workflows.
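For instance, such an extensibility layer could be a small service-provider interface that the default engine implements and that an external engine (Quartz, Oozie, etc.) could implement as well. A minimal sketch only; all names here (JobEngine, SimpleJobEngine, JobState) are hypothetical, not existing Kylin classes:

```java
import java.util.List;
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;

// Hypothetical plug-in point: the default engine and any external
// engine would both implement this interface.
interface JobEngine {
    enum JobState { PENDING, RUNNING, FINISHED, ERROR }

    // A job is an ordered sequence of steps (e.g. the MapReduce stages
    // of a cube build); the engine runs them in dependency order.
    void submit(String jobId, List<Runnable> steps);

    JobState getState(String jobId);
}

// Default implementation with no external dependencies: a single
// worker thread from java.util.concurrent runs each job's steps.
class SimpleJobEngine implements JobEngine {
    private final ExecutorService worker = Executors.newSingleThreadExecutor();
    private final Map<String, JobState> states = new ConcurrentHashMap<>();

    @Override
    public void submit(String jobId, List<Runnable> steps) {
        states.put(jobId, JobState.PENDING);
        worker.submit(() -> {
            states.put(jobId, JobState.RUNNING);
            try {
                for (Runnable step : steps) {
                    step.run();            // steps run strictly in order
                }
                states.put(jobId, JobState.FINISHED);
            } catch (RuntimeException e) {
                states.put(jobId, JobState.ERROR);
            }
        });
    }

    @Override
    public JobState getState(String jobId) {
        return states.get(jobId);
    }
}
```

A Quartz- or Oozie-backed engine would implement the same interface and translate submit/getState into that system's API, so swapping engines would not touch the rest of Kylin.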
On 1/14/15, 9:40 AM, "Julian Hyde" <[email protected]> wrote:

>Still worth considering an existing tool. The simplest code is the code
>you don’t maintain. :)
>
>On Jan 14, 2015, at 2:57 AM, Li Yang <[email protected]> wrote:
>
>> Sorry I'm late, just a recap.
>>
>> The "Job Engine" here only manages the lifecycle and dependencies of
>> long-running tasks. It oversees task sequences (a cube build, for
>> example, is made up of several MapReduce jobs) and allows the user to
>> start/stop/pause/resume them.
>>
>> It does not do scheduling or fancy workflow, which is why many existing
>> products like Quartz or Oozie are overkill. We want to keep Kylin's
>> overall architecture simple and easy to deploy and debug.
>>
>> The purpose of this refactoring is to separate the manager role and the
>> worker role, which the previous implementation mixed up. Once done,
>> replacing a worker shall become easy, and we will be free to explore
>> other cube-building workers, like the Flink and Spark ones mentioned.
>>
>> Cheers
>> Yang
>>
>> On Wed, Jan 14, 2015 at 10:08 AM, Zhou, Qianhao <[email protected]>
>> wrote:
>>
>>> Thanks Ted for the advice.
>>> I think the right way to do it is to take more options into
>>> consideration, then make a decision. Whichever solution is used, we
>>> are going to learn something that will benefit us sooner or later.
>>>
>>> Best Regards,
>>> Zhou QianHao
>>>
>>> On 1/14/15, 12:37 AM, "Ted Dunning" <[email protected]> wrote:
>>>
>>>> OK.
>>>>
>>>> On Tue, Jan 13, 2015 at 10:30 AM, 周千昊 <[email protected]> wrote:
>>>>
>>>>> As I mentioned, we don't want an extra dependency because that will
>>>>> make the deployment more complex. As for Aurora, users would have
>>>>> an extra installation step, whereas so far Kylin only needs a war
>>>>> package and a Hadoop cluster.
>>>>>
>>>>> On Tue Jan 13 2015 at 10:26:50 PM Ted Dunning <[email protected]>
>>>>> wrote:
>>>>>
>>>>>> I understand you want to write your own job engine. But why not
>>>>>> use one that already exists?
>>>>>>
>>>>>> Given that you mention Quartz, it sounds like Aurora might be a
>>>>>> good fit. Why not use it?
>>>>>>
>>>>>> On Tue, Jan 13, 2015 at 3:34 AM, Zhou, Qianhao <[email protected]>
>>>>>> wrote:
>>>>>>
>>>>>>> What we want is:
>>>>>>>
>>>>>>> 1. A lightweight job engine that is easy to start, stop and check
>>>>>>>    jobs with. Most of the heavyweight work is MapReduce, which
>>>>>>>    already runs on the cluster, so the job engine itself does not
>>>>>>>    need to run on a cluster.
>>>>>>>
>>>>>>> 2. Kylin already has a job engine based on Quartz; however, only
>>>>>>>    a very small part of its functionality is used, so we can
>>>>>>>    easily replace it with the standard Java API. That removes the
>>>>>>>    extra dependency, which means easier deployment.
>>>>>>>
>>>>>>> Currently a very simple job engine implementation will meet
>>>>>>> Kylin's needs, so I think keeping it simple is the better choice
>>>>>>> at this point.
>>>>>>>
>>>>>>> Best Regards,
>>>>>>> Zhou QianHao
>>>>>>>
>>>>>>> On 1/13/15, 4:43 PM, "Ted Dunning" <[email protected]> wrote:
>>>>>>>
>>>>>>>> So why are the following systems unsuitable?
>>>>>>>>
>>>>>>>> - Mesos + (Aurora or Chronos)
>>>>>>>> - Spark
>>>>>>>> - Yarn
>>>>>>>> - Drill's drillbits
>>>>>>>>
>>>>>>>> These options do different things. I know that. I am not
>>>>>>>> entirely clear on what you want, however, so I present these
>>>>>>>> different options so that you can tell me better what you want.
>>>>>>>>
>>>>>>>> Mesos provides very flexible job scheduling. With Aurora, it has
>>>>>>>> support for handling long-running and periodic jobs. With
>>>>>>>> Chronos, it has the equivalent of a cluster-level cron.
>>>>>>>>
>>>>>>>> Spark provides the ability for a program to spawn lots of
>>>>>>>> parallel execution.
>>>>>>>> This is different from what most people mean by job scheduling,
>>>>>>>> but in conjunction with a queuing system combined with Spark
>>>>>>>> Streaming you can get remarkably close to a job scheduler.
>>>>>>>>
>>>>>>>> Yarn can run jobs, but has no capability to schedule recurring
>>>>>>>> jobs. It can adjudicate the allocation of cluster resources,
>>>>>>>> which is different from what either Spark or Mesos does.
>>>>>>>>
>>>>>>>> Drill's drillbits schedule queries across a parallel execution
>>>>>>>> environment. Drill currently has no user impersonation, but it
>>>>>>>> does an interesting job of scheduling the parts of parallel
>>>>>>>> queries.
>>>>>>>>
>>>>>>>> Each of these could be considered something like a job
>>>>>>>> scheduler. Only a very few are likely to be what you are talking
>>>>>>>> about.
>>>>>>>>
>>>>>>>> Which is it?
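To make the Quartz-replacement point from the thread concrete: the small slice of Quartz that Kylin uses, periodically firing a job-dispatch check, can be covered by the JDK's ScheduledExecutorService with no extra dependency. A sketch under that assumption; JobPoller and the dispatch callback are illustrative names, not actual Kylin code:

```java
import java.util.concurrent.Executors;
import java.util.concurrent.ScheduledExecutorService;
import java.util.concurrent.TimeUnit;

// Stands in for a Quartz cron trigger: fire a dispatch check at a
// fixed interval using only the standard library.
class JobPoller {
    private final ScheduledExecutorService scheduler =
            Executors.newSingleThreadScheduledExecutor();

    void start(Runnable dispatchCheck, long periodMillis) {
        // scheduleAtFixedRate keeps firing until the poller is shut
        // down, which is all a simple trigger needs.
        scheduler.scheduleAtFixedRate(dispatchCheck, 0, periodMillis,
                TimeUnit.MILLISECONDS);
    }

    void shutdown() {
        scheduler.shutdownNow();
    }
}
```

Anything beyond fixed-rate or fixed-delay firing (cron expressions, misfire policies, persistent triggers) is where Quartz would earn its keep; the thread's point is that Kylin uses none of that.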
