I like the idea. What we should do is introduce the concept of a Task
Manager with APIs to execute tasks immediately, after a specific
delay, or periodically. I think we can absolutely put together an
API for this, with both synchronous responses and fire-and-forget with
callback semantics.
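Here is a minimal, in-process sketch of the Task Manager API described above, written in Java. The class and method names (`TaskManager`, `runNow`, `runAfter`, `runPeriodically`) are hypothetical and are not existing Helix APIs. A local `ScheduledExecutorService` stands in for dispatching work to cluster nodes, and nothing is persisted.

```java
import java.util.concurrent.*;
import java.util.function.Consumer;

/**
 * Hypothetical in-process sketch of a Task Manager. A real version would
 * dispatch tasks to cluster nodes and persist the schedule; this one only
 * runs tasks on a local thread pool.
 */
class TaskManager implements AutoCloseable {
    private final ScheduledExecutorService executor;

    TaskManager(int threads) {
        this.executor = Executors.newScheduledThreadPool(threads);
    }

    /** Synchronous: run immediately and block until the result is available. */
    public <T> T runNow(Callable<T> task) throws Exception {
        return executor.submit(task).get();
    }

    /** Fire-and-forget: run after a delay, then report the outcome via a callback. */
    public <T> void runAfter(Callable<T> task, long delay, TimeUnit unit,
                             Consumer<T> onSuccess, Consumer<Throwable> onFailure) {
        executor.schedule(() -> {
            T result;
            try {
                result = task.call();
            } catch (Exception e) {
                onFailure.accept(e);   // task failed: report and stop
                return;
            }
            onSuccess.accept(result);  // task succeeded: hand result to caller
        }, delay, unit);
    }

    /**
     * Periodic: run every `period` units after `initialDelay`. The returned
     * handle can be used to cancel the recurring task. Note that with
     * scheduleAtFixedRate, an exception thrown by the task suppresses
     * later runs.
     */
    public ScheduledFuture<?> runPeriodically(Runnable task, long initialDelay,
                                              long period, TimeUnit unit) {
        return executor.scheduleAtFixedRate(task, initialDelay, period, unit);
    }

    @Override
    public void close() {
        executor.shutdownNow();
    }
}
```

Returning the `ScheduledFuture` from `runPeriodically` gives callers a handle to cancel a recurring task, which could map onto the task framework's existing cancel support.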

The tricky part is persistence: we need to make sure scheduled tasks are
stored durably and can be pulled into memory right before they are due to run.
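One way to approach the persistence problem, sketched under assumptions: keep the full schedule in a durable store (in Helix this would presumably be ZooKeeper-backed), record each task's next fire time, and have the scheduler pull into memory only the entries due within a short look-ahead window. `ScheduleStore` and `ScheduleEntry` are hypothetical names, and the `TreeSet` below is only an in-memory stand-in for the persistent store.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;
import java.util.TreeSet;

/**
 * Hypothetical sketch of a schedule store. The TreeSet stands in for a
 * persistent store; entries are ordered by next fire time so a scan can
 * stop at the first entry that is not yet due.
 */
class ScheduleStore {
    /** A scheduled task; periodMillis == 0 means a one-time (ad hoc) task. */
    record ScheduleEntry(String taskId, long nextFireMillis, long periodMillis) {}

    private final TreeSet<ScheduleEntry> entries = new TreeSet<>(
            Comparator.comparingLong(ScheduleEntry::nextFireMillis)
                      .thenComparing(ScheduleEntry::taskId));

    void put(ScheduleEntry entry) {
        entries.add(entry);
    }

    /** Load only the entries due at or before now + lookahead, earliest first. */
    List<ScheduleEntry> loadDue(long nowMillis, long lookaheadMillis) {
        long limit = nowMillis + lookaheadMillis;
        List<ScheduleEntry> due = new ArrayList<>();
        for (ScheduleEntry e : entries) {
            if (e.nextFireMillis() > limit) break; // sorted, so nothing later is due
            due.add(e);
        }
        return due;
    }

    /** Remove a fired entry; recurring entries are written back with their next fire time. */
    void markFired(ScheduleEntry entry) {
        if (entries.remove(entry) && entry.periodMillis() > 0) {
            entries.add(new ScheduleEntry(entry.taskId(),
                    entry.nextFireMillis() + entry.periodMillis(),
                    entry.periodMillis()));
        }
    }

    int size() {
        return entries.size();
    }
}
```

The idea is that after a recurring entry fires, `markFired` writes back its next fire time, so the scheduler does not need to keep the whole schedule in memory between wake-ups. A real scheduler would also need a timer that wakes up at the earliest pending fire time.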

But all in all, it would be a good addition.

Sandeep

On Tue, Mar 18, 2014 at 5:14 PM, Kanak Biscuitwala <[email protected]> wrote:
>
> I'll send out a longer email once I've finished gathering requirements and 
> sketching out a design, but here are my initial thoughts:
>
> - This actually requires two things from Helix: being able to run tasks in 
> the cluster reliably and being able to schedule tasks in the cluster reliably
> - For the task half of this work, we probably have most of the code available 
> already as the task framework supports things like target resources, 
> DAG-based dependencies, task states, canceling, and correctness in the face 
> of controller failover.
> - The scheduling half is the part that requires the most new work. We 
> basically need to be able to (1) store the schedule, (2) know when to wake up 
> to process an item on the schedule, and (3) do this without needing anything 
> in controller memory.
>
> Kanak
>
> ----------------------------------------
>> Date: Tue, 18 Mar 2014 16:09:05 -0700
>> Subject: Scheduling tasks in the cluster
>> From: [email protected]
>> To: [email protected]
>>
>> This requirement has come up often, and I think it's worthwhile to spend
>> some time coming up with an elegant solution. We have offered workarounds,
>> but they still require users to write quite a bit of complex code.
>>
>> Problem statement:
>> Schedule one or more tasks in the cluster. A task can be ad hoc (one time)
>> or recurring (every X minutes, or once between 12 and 3 AM, etc.; basically
>> a cron expression). Additional criteria specify where the task should run:
>> on any node in the cluster, or only on nodes that host a particular
>> resource in a particular state. If the task fails, we might have to retry
>> it; for example, retry X times before trying another node. There might also
>> be constraints such as no more than X tasks running on a particular node or
>> across the entire cluster.
>>
>> Helix supports all of these features in one way or another, but there is no
>> first-class API that encapsulates all of them.
>>
>> Any thoughts on what such an API/DSL should look like?
>>
>> thanks,
>> Kishore G
>
