[ 
https://issues.apache.org/jira/browse/MESOS-426?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13628402#comment-13628402
 ] 

Benjamin Mahler commented on MESOS-426:
---------------------------------------

I'm unfamiliar with mesos_submit, but I took a quick look at mesos_submit.py.

If the executor fails, the SubmitScheduler will get a statusUpdate() for the 
task. It can then re-schedule it accordingly.
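
As a rough sketch of what that re-scheduling could look like (this is not the 
actual mesos_submit.py code): the class name, the retry limit, and the string 
task states below are stand-ins I made up so the snippet runs without the mesos 
bindings. In the real scheduler the states would come from the protobuf 
definitions, and the task would be re-launched from the next resource offer.

```python
# Stand-ins for the task states defined in the mesos protobufs (assumption:
# a real scheduler would compare against those constants, not strings).
TASK_FINISHED = "TASK_FINISHED"
TASK_FAILED = "TASK_FAILED"
TASK_LOST = "TASK_LOST"

# States after which the task is worth trying again.
RETRYABLE_STATES = frozenset([TASK_FAILED, TASK_LOST])


class RetryingSubmitScheduler(object):
    """Hypothetical scheduler that re-queues its single task on failure."""

    def __init__(self, command, max_retries=3):
        self.command = command
        self.max_retries = max_retries
        self.retries = 0
        self.pending = True    # task should be launched on the next offer
        self.finished = False  # task completed successfully
        self.gave_up = False   # retries exhausted

    def statusUpdate(self, driver, update):
        # `driver` is unused here; it is kept to mirror the callback shape.
        if update.state == TASK_FINISHED:
            self.finished = True
        elif update.state in RETRYABLE_STATES:
            if self.retries < self.max_retries:
                self.retries += 1
                self.pending = True  # offer handling would re-launch it
            else:
                self.gave_up = True
```

The offer-handling side would check `pending`, launch the command again, and 
clear the flag. Capping the retries keeps a command that always fails from 
being re-launched forever.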

As for scheduler failure, that requires some distributed persistence. The most 
common ways that frameworks have implemented distributed persistence are:
  - The replicated log built by mesos (benh can advise further)
  - The state abstraction also built by mesos (benh can advise further)
  - Some other form of distributed persistence

My question here is why you'd like to handle scheduler failure for 
mesos_submit. To survive that, it would need to run itself on more than one 
machine, which might be outside the scope of the tool. I think for now you 
should focus on handling task / executor failures.
                
> Python-based frameworks use old API and are broken
> --------------------------------------------------
>
>                 Key: MESOS-426
>                 URL: https://issues.apache.org/jira/browse/MESOS-426
>             Project: Mesos
>          Issue Type: Bug
>          Components: framework, python-api
>    Affects Versions: 0.9.0
>            Reporter: David Greenberg
>
> If you try to use mesos-submit or torque with mesos 0.9.0+, you get 
> exceptions due to API mismatches in these frameworks' expectations of the 
> python API.
> Steps to reproduce: try running "mesos-submit <mymaster> echo hi", note the 
> stacktraces.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira
