ASFBot was unavailable, so here is a copy-paste meeting log:
23:09 osgigeek I submit tasks to a task management interface whatever we choose
23:10 osgigeek those tasks need to be executed as soon as possible
23:10 osgigeek and I should get a response
23:10 kishoreg1 they will be
23:10 kishoreg1 thats exactly the behavior
23:10 kanakb osgigeek, perhaps i should have done a better job separating api from implementation details
23:10 kanakb but that is exactly what this does
23:11 kanakb the api is exactly what you're suggesting
23:11 osgigeek ok but I think I am lost when we say
23:11 osgigeek controller looks at the the resource and the tasks and assigns the tasks to nodes in the cluster
23:11 osgigeek why are the tasks inherently distributed?
23:12 kanakb depends on your policy
23:12 osgigeek ah ok
23:12 kanakb if you *want* stuff to run on specific nodes, we can make that happen
23:12 kanakb if you just want your task to run somewhere that has capacity
23:12 kanakb then we can do that too
23:12 osgigeek right
23:13 osgigeek so I am thinking of that specific usecase
23:13 osgigeek I have it today
23:13 osgigeek I want to execute tasks on a given node and iff it does not have capacity move it to a node which does
23:13 osgigeek if you are saying I can do it with a policy then that works
23:13 osgigeek because I may want the tasks to be executed locally
23:13 osgigeek I may not want them distributed
23:14 kanakb if you want to run the task on the same machine as the client
23:14 kanakb using helix probably doesn't make much sense
23:14 osgigeek the benefit I see is helix will give me the ability to failover to another node if my tasks are queued and the node fails
23:14 osgigeek I dont have to manage that myself
23:15 osgigeek Helix can also maybe manage execution based on capacity
23:15 kishoreg1 there is a whole lot more than that
23:15 kishoreg1 correct
23:15 kishoreg1 and with integration with yarn/ec2 etc
23:15 osgigeek so yes I agree there is a whole lot more than that I am not trivializing the issue
23:15 kishoreg1 we can support SLA
23:16 kishoreg1 u can say run these tasks, i need it to be completed in X hours
23:16 kishoreg1 so now helix can run the tasks, monitor progress and go ask EC2 for more instances
23:17 kishoreg1 and start the tasks there
23:17 kishoreg1 and release EC2 instances after its done
23:18 osgigeek sure
23:18 kishoreg1 so yeah, in general the problems are initial distribution, handling faults, and scaling up/down
23:18 osgigeek but distributing tasks without a policy seems like an optimization
23:18 kishoreg1 these can be applied to either resources or tasks
23:19 kishoreg1 yeah, depends on the use case. some have policy
23:19 kishoreg1 most use cases will simply say run these tasks
23:19 osgigeek I think the separation of resource and task makes sense in my head but maybe not to you guys
23:19 kishoreg1 i dont care where it runs
23:20 osgigeek ok lets keep moving I dont want to stop progress
23:20 kishoreg1 it does have a difference
23:20 kishoreg1 may be u can help us in the terminology there
23:20 osgigeek sure
23:21 osgigeek so one question related to tasks
23:21 osgigeek can I say execute these at midnight every day?
23:21 kishoreg1 basically resource is an entity that is distributed its not transient
23:21 osgigeek a resource is also not executable and does not have results
23:21 kishoreg1 u want the resource to always exist and in a particular state
23:21 kishoreg1 correct
23:21 kanakb yes, recurring tasks at a given time a use case we want to support with the scheduling work
23:22 kishoreg1 task on the other hand is something that can be executed
23:22 osgigeek correct
23:22 osgigeek that is why the separation
23:22 kishoreg1 and is executed probably once
23:22 osgigeek no task can be executed repeatedly
23:22 osgigeek if its recurring
23:23 kishoreg1 dont know if concepts of partition/replica makes sense
23:23 osgigeek say something like execute this every 4 hrs
23:23 kishoreg1 yeah i take that back
23:23 kishoreg1 so shud we make task as first class citizen
23:23 osgigeek I think state machine wont make sense for tasks
23:24 kishoreg1 it does
23:24 kishoreg1 task has states
23:24 osgigeek what states does a task go through?
23:24 kanakb see the diagram on the wiki page
23:24 kishoreg1 init, start, stop, pause, resume, cancel
23:25 osgigeek so how will we pause a running thread/task?
23:25 kishoreg1 we will call a method call pause on the task
23:26 kishoreg1 user will have to implement that method to stop doing what ever it was doing
23:26 osgigeek ok I guess it does for recurring tasks
23:26 osgigeek ok
23:26 osgigeek so tasks have states : check
23:27 osgigeek what about partition and replica
23:27 osgigeek I think replica makes sense for failover
23:27 osgigeek issue is removing it when the task is cancelled or completed
23:27 osgigeek or errored
23:28 kanakb so this is actually a problem that needs some thought
23:29 kanakb we definitely need partition-level granularity for some use cases
23:29 kanakb and the callbacks are already partition-level
23:30 osgigeek btw one other difference between tasks and resources is users dont get to model the state model for tasks, they are predefined. Am I correct?
23:30 kanakb yeah that's all abstracted
23:31 osgigeek right I was trying to see if there are enough differences to require defining task as a new top level
23:31 osgigeek or a first class citizen
23:31 kishoreg1 yeah, its probably better to have it
23:33 kishoreg1 other extreme approach is get rid of resource
23:33 kishoreg1 and call everything as Task
23:33 kishoreg1 but have attributes on Task
23:34 kishoreg1 longRunning, shortLive
23:34 osgigeek I think resource has its place for sure I think its cleaner to keep them both
23:35 kishoreg1 ok,
23:35 osgigeek its hard to explain to people that think of database as a Task
23:35 osgigeek people are used to thinking of those as resources
23:35 kishoreg1 so we will have resource, partition, replica
23:35 kishoreg1 task, subtask
23:36 kanakb what about workflows?
23:36 osgigeek do we need a subtask, why cant we model it as a task itself?
23:36 osgigeek a task with a parent task?
23:36 kishoreg1 we need a logical grouping of multiple tasks
23:37 kishoreg1 probably a taskqueue, task?
23:37 hsaputra as DAG ?
23:37 osgigeek TaskCollection?
23:37 kanakb taskqueue is probably not correct because queue implies sequential execution
23:38 osgigeek yeah I am with kanakb: on that
23:38 osgigeek It implies order
23:38 kishoreg1 hsaputra: DAG is above taskcollection
23:39 kishoreg1 for example indexing 100 files under a directory
23:39 hsaputra ok ...
23:39 kishoreg1 we will have 100 tasks
23:40 kishoreg1 but we want refer to them with a logical name
23:40 kishoreg1 thats what we are trying to come up with a name
23:40 kishoreg1 taskgroup, taskcollection
23:40 hsaputra Tasks 1-to-m of Task instances ?
23:41 kishoreg1 yes
23:41 osgigeek TaskGroup is probably better
23:41 osgigeek Ok I like TaskGroup
23:41 kishoreg1 ok.
23:42 osgigeek so should the user be able to attach listener to TaskGroup as well as Task?
23:42 kishoreg1 yes
23:42 osgigeek kanakb: what was the concept of workflow?
23:43 kishoreg1 just like you can listen to resource as well as partition
23:43 kanakb workflow is taskgroups organized into a DAG, enforcing execution order
23:43 kishoreg1 workflow simply defines the order in which tasks shud be run
23:43 osgigeek the user defines that?
23:43 kanakb yes
23:43 kishoreg1 yes
23:44 osgigeek So a workflow traditionally supports branching
23:44 osgigeek do we have that ?
23:44 osgigeek fork and join?
23:45 kanakb conditional execution is something we can add later
23:45 osgigeek ok so its a sequential and a single branch
23:46 kanakb sequential with respect to parents
23:46 kanakb you can have 0 to multiple parents
23:46 kanakb at arbitrary depth
23:46 osgigeek multiple parents implies join
23:46 kanakb right so that is supported
23:46 osgigeek ok
23:46 kanakb as is fork, i guess
23:46 osgigeek yes
23:47 osgigeek so we will then need an api to allow the user to express this workflow
23:47 kanakb that's YAML right now
23:47 osgigeek but we want to add a java one too?
23:47 kishoreg1 but yeah we need an api as well
23:47 kanakb yeah so right now it's based on a builder
23:47 kanakb that takes tasks
23:47 kanakb that's pretty static
23:48 kanakb ideally it would be nice to make this more dynamic
23:48 kanakb i could say the same thing about taskgroups actually
23:48 osgigeek I am thinking we should introduce branch maybe
23:48 kishoreg1 lets start static
23:48 kishoreg1 whats a branch
23:48 osgigeek or we could do this whole thing as events
23:49 osgigeek so bear with me on this branch attempt
23:50 osgigeek so a branch contains tasks
23:50 osgigeek and we fork branches and join them
23:50 osgigeek to make up workflows
23:50 osgigeek so branch1.fork(branch2, branch3)
23:51 osgigeek branch1.add(task1).add(task2)
23:51 osgigeek and so on
23:51 osgigeek a branch inherently executes tasks in order
23:52 osgigeek fork essentially allows distribution
23:52 osgigeek across nodes
23:52 osgigeek branch1 to node1
23:52 osgigeek sorry branch2 to node 1 and branch3 to node2
23:52 kanakb this is currently expressed as just specifying the parent tasks for each task
23:52 kanakb isn't that just as expressive?
23:54 kanakb it also has the benefit of allowing helix to construct the graph for you instead of trying to specify it all up front
23:54 osgigeek sorry I have not seen the current expression so dont know if its as good
23:54 osgigeek so the more I think about this I think the more it makes sense to loosely couple them
23:55 osgigeek each task should have task state events
23:55 kishoreg1 yes, i would love keep them loosely couple
23:55 osgigeek and other tasks should simply listen to those events
23:55 kishoreg1 helix controller does not even understand the fork join concepts
23:55 osgigeek so T1 and T2
23:55 osgigeek T1 has states already all we need to do is trigger events
23:56 kishoreg1 :)
23:56 osgigeek so if T2 depends on T1 and say T0 to complete it listens to the complete events for T1 and T0
23:56 kishoreg1 u basically explained how we have implemented
23:56 osgigeek lol
23:56 kishoreg1 actually that was our first implementation
23:57 osgigeek but that way we can keep them loosely coupled
23:57 kishoreg1 the thing is who listens
23:57 osgigeek and people can add tasks anytime
23:57 osgigeek a task listens, we only need to get what event-type it listens to of which task
23:57 osgigeek the task event carries the event-type and task-id
23:58 osgigeek or task name
23:58 kishoreg1 is it onus of T1 to listen to T0 complete before T1 starts
23:58 osgigeek so there should be some entity orchestrating the listening and triggering of T1
23:58 kishoreg1 or controller listens to T0 and starts T1 after T0 has reached complete state
23:58 osgigeek some manager or maybe like you say controller
23:58 kishoreg1 correct, thats what we are doing
23:58 osgigeek but the user expresses the listen events on the task
23:58 kishoreg1 right
23:58 osgigeek so the user should think that the task is listening
23:59 osgigeek that I think allows pluggablility of tasks
23:59 kishoreg1 right, i think its exactly the way u described
00:00 kishoreg1 lets wait for kanak to finish his work and see a working demo
00:00 osgigeek mind you I have not looked at any existing implementation
00:00 osgigeek so it just validates your design I think
00:01 kishoreg1 yep
00:01 kishoreg1 ok its 12:00
00:01 kishoreg1 shud we wrap up?
00:01 kishoreg1 i badly wanted to talk about admin api
00:01 osgigeek yes so what have we decided can we recap?
00:01 osgigeek Tasks are first class citizens
00:01 kishoreg1 we will have taskgroup and task as first class citizen
00:02 osgigeek Tasks are grouped into TaskGroups
00:02 kishoreg1 so we will have taskconfiguration
00:02 kishoreg1 we need better way to differentiate between task and resource
00:03 kanakb if taskgroups are first class citizens, where does workflow fit in
00:03 kishoreg1 workflow is just a user concept
00:03 kishoreg1 helix does not understand it internally
00:03 kishoreg1 a task as entryCriteria
00:03 kishoreg1 or exitCriteria
00:04 kanakb ok
00:04 kishoreg1 that expresses the dependency triggers
00:04 kanakb a task, not a task group, correct?
00:04 kishoreg1 it can be at both levels
00:04 osgigeek yeah I think both levels makes sense
00:05 osgigeek an entire group could be cancelled if a criteria is met
00:05 kishoreg1 correct,
00:05 kishoreg1 so we can have tasks be long running?
00:06 kishoreg1 for example shud a stream processing job be represented as task
00:06 kishoreg1 or resource
00:06 kanakb probably resource
00:06 kanakb i don't know though
00:06 osgigeek I think task
00:06 kishoreg1 i think sandeep will say it shud be task
00:06 osgigeek lol
00:06 osgigeek yes its a task
00:06 osgigeek a long running one albeit without timeout
00:07 kanakb basically what that implies is that some tasks are not completable
00:07 kishoreg1 i agree, its just that we currently model them as a resource
00:07 kishoreg1 yeah, so we can have them as attribute on the task configuration
00:09 kishoreg1 ok anything else
00:09 osgigeek callbacks
00:09 osgigeek callbacks can be on task as well as taskgroup
00:09 kishoreg1 we need to have api's for that
00:09 osgigeek does a task group have an id?
00:09 osgigeek or only tasks have ids?
00:09 kishoreg1 yep
00:10 kishoreg1 taskgroup will have id
00:10 kishoreg1 some name
00:10 osgigeek yes I agree
00:10 osgigeek yes
00:10 osgigeek ok I hate to do this so late but what about priority?
00:10 osgigeek task priority?
00:10 osgigeek or even task group priority?
00:10 kishoreg1 we will need that, good point
00:10 kishoreg1 i overlooked that
00:11 kishoreg1 by the task can have replicas rt
00:11 osgigeek yes
00:11 kishoreg1 is replica the right word?
00:12 osgigeek hmm
00:12 osgigeek I cant think of any other right now
00:13 kishoreg1 taskredundancy ?
00:15 osgigeek yeah the concept is that, but as a term does not fit nicely
00:15 kishoreg1 lets go with replica
00:15 osgigeek yeah lets go with replica
00:15 kishoreg1 so state model still applies
00:15 osgigeek btw tasks have a temporal nature to them
00:15 osgigeek yes
00:15 kishoreg1 we can say one task is leader
00:15 osgigeek tasks *may have a temporal nature to them
00:16 osgigeek sure
00:16 kishoreg1 other task is in standby or something like that
00:16 kishoreg1 temporal?
00:16 osgigeek ah standby is a nicer name
00:16 osgigeek temporal = some time element
00:16 kishoreg1 yep
00:16 osgigeek like execute this at 12 midnight
00:16 kishoreg1 yes definitely thats our first use case
00:16 osgigeek tasks can be repetitive or one shot
00:17 osgigeek if repetitive user defines how many times
00:18 osgigeek how many times could be of three types (1) repeate n times or (2) repeat every 'n' hours or (3) every 'n' hours until end-date
00:19 hsaputra have to leave early guys, will try to catch up with the chat summary, have a good night
00:20 kanakb thanks for joining in
00:20 kanakb good night
00:20 hsaputra has left IRC (Quit: Page closed)
00:20 kishoreg1 thanks
00:20 kishoreg1 take care
00:21 osgigeek anyway that should be all about tasks
00:23 kanakb ok
00:23 kanakb should we leave admin for another day?
00:23 osgigeek lets try to do it tomorrow
00:24 osgigeek I will be online, whenever you guys come on we can try to tackle it
00:24 kanakb ok
00:24 osgigeek tomorrow evening I am thinking just to clarify
00:24 kanakb sure
00:25 kanakb ok, let's end here then
00:25 kanakb thanks guys
00:25 osgigeek ok sounds good
00:25 osgigeek thank you
00:31 osgigeek has left ()
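
Sketch of the dependency model from the log: each task names the parent tasks whose "complete" events it waits for, and a controller (not the task itself) does the listening and starts a task once all of its parents have completed. Multiple parents give a join, and several tasks sharing one parent give a fork. This is only an illustration under assumptions: the class and method names (WorkflowSketch, addTask, startReady, onComplete) are hypothetical and are not the Helix API, and the log does not show how Helix implements this.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashSet;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.Set;

// Hypothetical controller-side sketch; names are illustrative, not the Helix API.
class WorkflowSketch {
    // task id -> ids of the parent tasks whose completion it waits for
    private final Map<String, Set<String>> parents = new LinkedHashMap<>();
    private final Set<String> completed = new HashSet<>();
    private final Set<String> started = new HashSet<>();

    // Register a task at any time, together with the parents it "listens" to.
    void addTask(String id, String... parentIds) {
        parents.put(id, new HashSet<>(Arrays.asList(parentIds)));
    }

    // Start every task that has not started yet and whose parents are all complete.
    List<String> startReady() {
        List<String> ready = new ArrayList<>();
        for (Map.Entry<String, Set<String>> e : parents.entrySet()) {
            if (!started.contains(e.getKey()) && completed.containsAll(e.getValue())) {
                started.add(e.getKey());
                ready.add(e.getKey());
            }
        }
        return ready;
    }

    // Called when a task reports a "complete" event; returns the tasks started as a result.
    List<String> onComplete(String taskId) {
        completed.add(taskId);
        return startReady();
    }

    public static void main(String[] args) {
        WorkflowSketch w = new WorkflowSketch();
        w.addTask("T0");
        w.addTask("T1");
        w.addTask("T2", "T0", "T1"); // T2 joins on T0 and T1
        System.out.println(w.startReady());      // T0 and T1 have no parents
        System.out.println(w.onComplete("T0"));  // T2 still waits for T1
        System.out.println(w.onComplete("T1"));  // now T2 can start
    }
}
```

Because the graph is just a per-task list of parents, tasks can be added after the workflow has started, which matches the "people can add tasks anytime" point in the discussion.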

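
Sketch of the three recurrence types mentioned near the end of the log: (1) repeat n times, (2) repeat every n hours, (3) every n hours until an end date. This is a rough illustration, not anything from Helix: RecurrenceSketch and its methods are hypothetical names, and it assumes that type (1) also comes with an interval between runs and that the end date in type (3) is inclusive, neither of which was specified in the chat.

```java
import java.time.Duration;
import java.time.LocalDateTime;
import java.util.ArrayList;
import java.util.List;

// Hypothetical recurrence description; names are illustrative, not the Helix API.
class RecurrenceSketch {
    private final LocalDateTime start;    // first run
    private final Duration interval;      // gap between runs
    private final Integer maxRuns;        // null means no limit on the number of runs
    private final LocalDateTime endDate;  // null means no end date (assumed inclusive)

    private RecurrenceSketch(LocalDateTime start, Duration interval,
                             Integer maxRuns, LocalDateTime endDate) {
        if (interval.isZero() || interval.isNegative()) {
            throw new IllegalArgumentException("interval must be positive");
        }
        this.start = start;
        this.interval = interval;
        this.maxRuns = maxRuns;
        this.endDate = endDate;
    }

    // (1) repeat n times (assumption: spaced by the given interval)
    static RecurrenceSketch times(LocalDateTime start, Duration interval, int n) {
        return new RecurrenceSketch(start, interval, n, null);
    }

    // (2) repeat every interval, with no limit
    static RecurrenceSketch every(LocalDateTime start, Duration interval) {
        return new RecurrenceSketch(start, interval, null, null);
    }

    // (3) repeat every interval until the end date
    static RecurrenceSketch until(LocalDateTime start, Duration interval, LocalDateTime end) {
        return new RecurrenceSketch(start, interval, null, end);
    }

    // Run times strictly before the horizon, respecting the run limit and end date.
    List<LocalDateTime> runsBefore(LocalDateTime horizon) {
        List<LocalDateTime> runs = new ArrayList<>();
        LocalDateTime t = start;
        while (t.isBefore(horizon)
                && (maxRuns == null || runs.size() < maxRuns)
                && (endDate == null || !t.isAfter(endDate))) {
            runs.add(t);
            t = t.plus(interval);
        }
        return runs;
    }
}
```

For instance, RecurrenceSketch.every(midnight, Duration.ofHours(4)).runsBefore(midnight.plusDays(1)) would list the "execute this every 4 hrs" runs for one day, where midnight is any LocalDateTime chosen as the first run.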