Why is your executor "failing"? When you say failing, is your executor crashing or simply exiting after doing the required work?
You will need to manage the task status lifecycle. If your executor is holding non-terminal tasks and it exits, the slave will report these tasks as LOST since it does not know whether the tasks were run to completion.

Your executor will at the very least need to report when things are FINISHED or FAILED. It's also good practice to report once things are RUNNING to keep your scheduler well informed.

Hope this helps,
Ben

On Mon, Apr 7, 2014 at 11:35 AM, David Greenberg <[email protected]> wrote:

> I'm working on porting my executor from the CommandExecutor to a custom
> executor, in order to take advantage of other features of Mesos. I started
> by changing the TaskInfo in the scheduler to define ExecutorInfo instead of
> CommandInfo, where the ExecutorInfo's command is the same as the original
> CommandInfo. I gave the executor a random ID.
>
> I can see that the executor successfully starts and seems to connect to
> Mesos. After a few moments (10s - 100s of ms), the executor fails with the
> LOST status.
>
> Am I responsible for explicitly managing the TaskState lifecycle of the
> executor? That is, do I need to immediately send the TASK_STARTING status
> update, and then send the TASK_RUNNING update once the task has begun? Are
> there any heartbeats that I'm responsible for?
>
> Thanks,
> David
>
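
For illustration, here is a minimal sketch of the reporting order described in the reply: RUNNING once the task starts, then exactly one terminal update (FINISHED or FAILED) before the executor exits. It does not use the Mesos bindings. `RecordingDriver`, `send_status_update`, `launch_task`, and `has_terminal_update` are hypothetical names made up for this example; only the state names are taken from Mesos' TaskState values. A real executor would send its updates through the Mesos executor driver instead of this stub.

```python
from dataclasses import dataclass, field
from typing import Callable, List, Tuple

# State names mirror Mesos TaskState values; everything else here is hypothetical.
TASK_RUNNING = "TASK_RUNNING"
TASK_FINISHED = "TASK_FINISHED"
TASK_FAILED = "TASK_FAILED"
TERMINAL_STATES = {TASK_FINISHED, TASK_FAILED}


@dataclass
class RecordingDriver:
    """Hypothetical stand-in for an executor driver; it only records updates."""
    updates: List[Tuple[str, str]] = field(default_factory=list)

    def send_status_update(self, task_id: str, state: str) -> None:
        self.updates.append((task_id, state))


def launch_task(driver: RecordingDriver, task_id: str,
                work: Callable[[], None]) -> str:
    """Run `work`, reporting RUNNING first and then one terminal state."""
    driver.send_status_update(task_id, TASK_RUNNING)
    try:
        work()
    except Exception:
        # Report the failure explicitly rather than exiting silently,
        # so the task is not left in a non-terminal state.
        driver.send_status_update(task_id, TASK_FAILED)
        return TASK_FAILED
    driver.send_status_update(task_id, TASK_FINISHED)
    return TASK_FINISHED


def has_terminal_update(driver: RecordingDriver, task_id: str) -> bool:
    """True if a terminal update was recorded for `task_id`."""
    return any(t == task_id and s in TERMINAL_STATES
               for t, s in driver.updates)


if __name__ == "__main__":
    driver = RecordingDriver()
    launch_task(driver, "task-1", lambda: None)
    for task_id, state in driver.updates:
        print(task_id, state)
```

The point of the sketch is the ordering: the executor should only exit once every task it holds has a terminal update sent, which `has_terminal_update` checks against the stub's log.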
