- The ERROR state suggests an error with the task framework while TASK_ERROR 
suggests an error with the task.
- I've added a note on pause/resume.
- Right now, yes, it is up to the implementation. It was probably done this way 
to allow the task a chance to clean up. Perhaps we can add a timeout, at which 
point we can force termination.
- The current API is intended to be used through YAML submitted via the command 
line. If you want to do this with Java code, then there are builders for 
workflows and tasks; see Workflow.Builder and TaskConfig.Builder. I looked at 
it, and it seems reasonable and extensible. There are definitely a couple of 
changes that can be made to make it easier to use, but it doesn't seem like a 
huge effort.
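
A minimal sketch of the cancel-with-timeout idea above, in plain Java. 
CancellableTask, GracefulCancel, Outcome, and the 200 ms grace period are all 
hypothetical names and values for illustration; none of them are part of the 
Helix task framework. The task is first asked to stop cooperatively, and only 
if it has not returned within the grace period is its thread interrupted.

```java
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.TimeoutException;
import java.util.concurrent.atomic.AtomicBoolean;

// Hypothetical interface for illustration; not the Helix Task interface.
interface CancellableTask {
  void run();     // does the work; should return soon after cancel() is called
  void cancel();  // asks the task to stop cooperatively
}

final class GracefulCancel {
  enum Outcome { STOPPED_GRACEFULLY, FORCED }

  // Signal the task, give it timeoutMillis to clean up, then interrupt it.
  static Outcome cancelWithTimeout(CancellableTask task, Future<?> running,
                                   long timeoutMillis) {
    task.cancel();
    try {
      running.get(timeoutMillis, TimeUnit.MILLISECONDS);
      return Outcome.STOPPED_GRACEFULLY;
    } catch (TimeoutException e) {
      // Future.cancel(true) only interrupts the thread; a task that also
      // ignores interrupts cannot be stopped this way.
      running.cancel(true);
      return Outcome.FORCED;
    } catch (ExecutionException e) {
      // The task ended by throwing, so it did stop within the grace period.
      return Outcome.STOPPED_GRACEFULLY;
    } catch (InterruptedException e) {
      Thread.currentThread().interrupt();
      running.cancel(true);
      return Outcome.FORCED;
    }
  }

  // Runs one task that either honors or ignores cancel() and reports how it ended.
  static Outcome demo(boolean cooperative) {
    ExecutorService pool = Executors.newSingleThreadExecutor();
    AtomicBoolean cancelled = new AtomicBoolean(false);
    CancellableTask task = new CancellableTask() {
      @Override public void run() {
        try {
          // A cooperative task polls the flag; an uncooperative one never checks it.
          while (!(cooperative && cancelled.get())) {
            Thread.sleep(10);
          }
        } catch (InterruptedException e) {
          Thread.currentThread().interrupt();  // forced stop via interrupt
        }
      }
      @Override public void cancel() { cancelled.set(true); }
    };
    try {
      Runnable body = task::run;
      Future<?> running = pool.submit(body);
      return cancelWithTimeout(task, running, 200);
    } finally {
      pool.shutdownNow();
    }
  }
}
```

Calling GracefulCancel.demo(true) exercises the cooperative path, and 
GracefulCancel.demo(false) exercises the timeout path. Keeping the cooperative 
signal first preserves the task's chance to clean up.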

Kanak

----------------------------------------
> Date: Wed, 26 Mar 2014 14:40:26 -0700
> Subject: Re: Scheduling tasks in the cluster
> From: [email protected]
> To: [email protected]
>
> Awesome work.
> Feedback/comments
>
> Task state model
> -- What is the difference between the two error states?
> -- What about pause/resume?
>
> How does cancel work? Is it up to the implementation to signal the thread
> and return from the run method? Is Helix depending on the run method to be
> removed?
>
> Do we have an API for setting up the task flow?
>
> thanks,
> kishore G
>
> On Sun, Mar 23, 2014 at 5:36 PM, Kanak Biscuitwala <[email protected]> wrote:
>
>>
>> Hi,
>>
>> I wrote up a page describing how we might go about doing scheduled tasks.
>> It largely leverages the existing task framework and adds a scheduling
>> layer on top of it.
>>
>> https://cwiki.apache.org/confluence/display/HELIX/Scheduled+Tasks
>>
>> Any feedback is appreciated.
>>
>> Thanks,
>> Kanak
>>
>> ----------------------------------------
>>> Date: Tue, 18 Mar 2014 20:29:21 -0700
>>> Subject: Re: Scheduling tasks in the cluster
>>> From: [email protected]
>>> To: [email protected]
>>>
>>> I like the idea. What we should do is have the concept of a Task
>>> Manager with APIs to execute tasks immediately, after a specific
>>> duration, or periodically. I think we can absolutely put together an
>>> API for this, with synchronous responses and fire-and-forget with
>>> callback semantics.
>>>
>>> The tricky part is persistence, since we need to make sure tasks can be
>>> pulled into memory right before they are to be scheduled.
>>>
>>> But all in all would be a good addition.
>>>
>>> Sandeep
>>>
>>> On Tue, Mar 18, 2014 at 5:14 PM, Kanak Biscuitwala <[email protected]> wrote:
>>>>
>>>> I'll send out a longer email once I've finished gathering requirements
>>>> and sketching through a design, but here are my initial thoughts:
>>>>
>>>> - This actually requires two things from Helix: being able to run tasks
>>>> in the cluster reliably and being able to schedule tasks in the cluster
>>>> reliably.
>>>> - For the task half of this work, we probably have most of the code
>>>> available already, as the task framework supports things like target
>>>> resources, DAG-based dependencies, task states, canceling, and correctness
>>>> in the face of controller failover.
>>>> - The scheduling half is the part that requires the most new additions.
>>>> We basically need to be able to (1) store the schedule, (2) know when to
>>>> wake up to process an item on the schedule, and (3) do this without needing
>>>> anything in controller memory.
>>>>
>>>> Kanak
>>>>
>>>> ----------------------------------------
>>>>> Date: Tue, 18 Mar 2014 16:09:05 -0700
>>>>> Subject: Scheduling tasks in the cluster
>>>>> From: [email protected]
>>>>> To: [email protected]
>>>>>
>>>>> This requirement has come up often and I think it's worthwhile to spend
>>>>> some time to come up with an elegant solution. We have offered a
>>>>> workaround, but it still requires users to write quite a bit of complex
>>>>> code.
>>>>>
>>>>> Problem statement:
>>>>> Schedule one or more tasks in the cluster. A task can be ad hoc (one
>>>>> time) or recurring (every X minutes, or once between 12 and 3 AM, etc.;
>>>>> basically a cron expression). There are additional criteria as to where
>>>>> the task should run: on any node in the cluster, or on any node in the
>>>>> cluster that hosts a particular resource in a particular state. If the
>>>>> task fails, we might have to retry it; for example, it can retry X times
>>>>> before trying on another node. There might be additional constraints,
>>>>> such as no more than X tasks running on a particular node or across the
>>>>> entire cluster.
>>>>>
>>>>> Helix supports all these features in one way or another, but there is no
>>>>> first-class support or API that encapsulates all of the above features.
>>>>>
>>>>> Any thoughts on what such an API/DSL should look like?
>>>>>
>>>>> thanks,
>>>>> Kishore G
>>>>
>>
>>
                                          
