[ https://issues.apache.org/jira/browse/SINGA-11?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14594298#comment-14594298 ]
wangwei edited comment on SINGA-11 at 6/20/15 2:21 AM:
-------------------------------------------------------
There are two types of "Recovery":
1. recovery of the whole system, e.g., continuing the training from a previously
stopped step.
2. recovery of a single executor, e.g., when one executor fails.
We will finish the first type of recovery in a few days. The second type of
recovery is more complex and will be considered in the second release.
was (Author: wangwei.cs):
Yes.
There are two types of "Recovery":
1. recovery of the whole system, e.g., continuing the training from a previously
stopped step.
2. recovery of a single executor, e.g., when one executor fails.
We will finish the first type of recovery in a few days. The second type of
recovery is more complex and will be considered in the second release.
> Start SINGA using Mesos
> -----------------------
>
> Key: SINGA-11
> URL: https://issues.apache.org/jira/browse/SINGA-11
> Project: Singa
> Issue Type: New Feature
> Reporter: wangwei
> Assignee: Anh Dinh
>
> Mesos helps to manage resources in large clusters.
> This ticket is an initial integration of SINGA with Mesos, which aims to
> simply start SINGA through Mesos and run multiple SINGA tasks in the same
> cluster.
> The full integration should include:
> 1. start SINGA through Mesos, including requesting processes, memory, CPU, etc.
> 2. detect failures and recover through Mesos
> 3. TBD.
--
This message was sent by Atlassian JIRA
(v6.3.4#6332)