[
https://issues.apache.org/jira/browse/MESOS-426?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13628402#comment-13628402
]
Benjamin Mahler commented on MESOS-426:
---------------------------------------
I'm unfamiliar with mesos_submit, but I took a quick look at mesos_submit.py.
If the executor fails, the SubmitScheduler will get a statusUpdate() for the
task. It can then re-schedule it accordingly.
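To illustrate the re-scheduling idea, here is a minimal sketch of the bookkeeping a scheduler could do when a status update arrives. It is not taken from mesos_submit.py: RetryTracker, max_retries and the string task states are hypothetical stand-ins (the real states are protobuf enum values in mesos_pb2), so the snippet runs without Mesos installed.

```python
# Stand-ins for the mesos_pb2 task states (assumption: real code would use
# the protobuf enum values instead of strings).
TASK_RUNNING = "TASK_RUNNING"
TASK_FINISHED = "TASK_FINISHED"
TASK_FAILED = "TASK_FAILED"
TASK_LOST = "TASK_LOST"

# States after which it makes sense to launch the command again.
RETRYABLE_STATES = {TASK_FAILED, TASK_LOST}


class RetryTracker:
    """Hypothetical helper that decides which tasks to relaunch."""

    def __init__(self, max_retries=3):
        self.max_retries = max_retries
        self.commands = {}   # task_id -> command line
        self.attempts = {}   # task_id -> number of launches so far
        self.pending = []    # task_ids waiting for the next resource offer

    def launched(self, task_id, command):
        """Record that a task was (re)launched."""
        self.commands[task_id] = command
        self.attempts[task_id] = self.attempts.get(task_id, 0) + 1

    def status_update(self, task_id, state):
        """Handle one status update; return True if the task was requeued."""
        if state not in RETRYABLE_STATES:
            return False
        if self.attempts.get(task_id, 0) >= self.max_retries:
            return False  # give up after max_retries launches
        if task_id not in self.pending:
            self.pending.append(task_id)
        return True
```

In a real scheduler, statusUpdate() would presumably pass the task id and state from the update message into status_update(), and resourceOffers() would drain the pending list when suitable offers arrive.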
As for scheduler failure, that requires some distributed persistence. The most
common ways that frameworks have implemented distributed persistence are:
- The replicated log built by mesos (benh can advise further)
- The state abstraction also built by mesos (benh can advise further)
- Some other form of distributed persistence
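Whichever option is chosen, the pattern is roughly the same: the scheduler checkpoints its task table on every change, and a replacement scheduler reads it back on startup. A rough sketch, assuming a simple put/get store; InMemoryStore and SchedulerState are hypothetical names, and the in-memory dict only stands in for a replicated backend, it does not model the replicated log or state abstraction APIs.

```python
import json


class InMemoryStore:
    """Hypothetical stand-in for a replicated key/value store."""

    def __init__(self):
        self._data = {}

    def put(self, key, value):
        self._data[key] = value

    def get(self, key):
        return self._data.get(key)


class SchedulerState:
    """Task table that is checkpointed to the store on every change."""

    KEY = "submit-scheduler/tasks"  # illustrative key name

    def __init__(self, store):
        self.store = store
        self.tasks = {}  # task_id -> {"command": ..., "state": ...}

    def record(self, task_id, command, state):
        """Update one task and persist the whole table as JSON."""
        self.tasks[task_id] = {"command": command, "state": state}
        self.store.put(self.KEY, json.dumps(self.tasks))

    @classmethod
    def recover(cls, store):
        """Rebuild the task table after a scheduler restart or failover."""
        restored = cls(store)
        raw = store.get(cls.KEY)
        if raw is not None:
            restored.tasks = json.loads(raw)
        return restored
```

The store only helps if it outlives the scheduler process, which is why a replicated backend is needed in practice.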
My question here is why you'd like to handle scheduler failure for
mesos_submit. To do so, it would need to run itself on more than one machine,
which might be outside the scope of the tool. I think for now you should focus
on handling task / executor failures.
> Python-based frameworks use old API and are broken
> --------------------------------------------------
>
> Key: MESOS-426
> URL: https://issues.apache.org/jira/browse/MESOS-426
> Project: Mesos
> Issue Type: Bug
> Components: framework, python-api
> Affects Versions: 0.9.0
> Reporter: David Greenberg
>
> If you try to use mesos-submit or torque with mesos 0.9.0+, you get
> exceptions due to API mismatches in these frameworks' expectations of the
> python API.
> Steps to reproduce: try running "mesos-submit <mymaster> echo hi", note the
> stacktraces.