> On Oct. 7, 2014, 6:50 p.m., Vinod Kone wrote:
> > OK. Here is a proposal for what it could look like.
> >
> >
> > General idea: We should add as few top level task states as possible
> > because it is more work for frameworks. TASK_LOST should be used for cases
> > where we expect a relaunch of the task would succeed (unfortunately this
> > principle breaks with reconciliation).
> >
> >
> > Add 2 new task states to TaskState
> > enum TaskState {
> > ...
> > ...
> > ...,
> > TASK_UNAUTHORIZED, # Fold this into TASK_INVALID?
> > TASK_INVALID # Maybe use TASK_ERROR instead since it already exists but
> > unused?
> > }
> >
> >
> > We add 2 new fields, "source" and "reason"/"code" both enums, to TaskStatus
> >
> > NOTE: We should take this opportunity to move task validations from
> > scheduler driver to master, to simplify. Maybe do this as first patch.
> >
> > enum Source {
> > MASTER,
> > SLAVE,
> > EXECUTOR,
> > SCHEDULER, # Don't need this when we move validation to master.
> > }
> >
> > Based on the different status updates, these are the reasons i came up
> > with. Let me know if you can't figure out which reason should be used where
> > :)
> >
> > enum Reason {
> > # Set by master
> > INVALID_OFFERS,
> > SLAVE_REMOVED,
> > SLAVE_DISCONNECTED,
> > SLAVE_UKNOWN,
> > TASK_UNKNOWN,
> >
> > # Set by scheduler driver for now. But we could kill this and expect
> > scheduler to not send launch tasks when it is disconnected?
> > MASTER_DISCONNECTED,
> >
> > # Set by slave
> > GC_ERROR,
> > SLAVE_RESTARTED
> > EXECUTOR_TERMINATED,
> > }
> >
> > Currently the "Reason" make sense for LOST updates generated by
> > master/slave. Executors might use this code for udpates they generate, but
> > it is upto the framework on how to interpret it. We could also consider
> > adding more reasons for TASK_INVALID/TASK_ERROR which is also generated by
> > master (e.g., TASK_UNAUTHORIZED could be a reason for TASK_INVALID).
>
> Bill Farner wrote:
> This looks good; i have one addendum: frameworks must not be allowed to
> set status update fields in ways that conflict with the master/slave. i.e.
> an executor should not be allowed to specify the `Source` (or if it does,
> mesos should overwrite it).
>
> Vinod Kone wrote:
> yup. that definitely was on my mind :)
Looks good to me. We can also add `TASK_FAILED` reasons and `TASK_KILLED`
explanations to the `Reason` enum. Generally, my proposal is to use `Reason`
for all second-tier states.
- Alexander
-----------------------------------------------------------
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/26382/#review55668
-----------------------------------------------------------
On Oct. 9, 2014, 1:18 a.m., Dominic Hamon wrote:
>
> -----------------------------------------------------------
> This is an automatically generated e-mail. To reply, visit:
> https://reviews.apache.org/r/26382/
> -----------------------------------------------------------
>
> (Updated Oct. 9, 2014, 1:18 a.m.)
>
>
> Review request for mesos and Vinod Kone.
>
>
> Bugs: MESOS-1, MESOS-1830 and MESOS-343
> https://issues.apache.org/jira/browse/MESOS-1
> https://issues.apache.org/jira/browse/MESOS-1830
> https://issues.apache.org/jira/browse/MESOS-343
>
>
> Repository: mesos-git
>
>
> Description
> -------
>
> Annotating every TASK_LOST with comments to open discussion.
>
> If we add a 'source' field and consider adding TASK_INVALID i think it adds
> much more information. I don't think the metrics would have to change as the
> source matches the source file, I think. Unless I missed a subtlety. Ie, some
> of the master TASK_LOST could be set to slave source, but i think it's
> debatable.
>
>
> Diffs
> -----
>
> src/master/master.cpp f05275b00635cee82007ed851bba1cd30a7aa74f
> src/sched/sched.cpp a37ed3d2e11035650b9bf0440fb87f66669129d8
> src/scheduler/scheduler.cpp fb88a3e029e97ba33eae5d50503be5ed9c9533e6
> src/slave/slave.cpp e56dcbd80114730949a0d4b553470802a4d38281
>
> Diff: https://reviews.apache.org/r/26382/diff/
>
>
> Testing
> -------
>
>
> Thanks,
>
> Dominic Hamon
>
>