LI Guobao created SYSTEMML-2421:
-----------------------------------

             Summary: Task error and preemption handles
                 Key: SYSTEMML-2421
                 URL: https://issues.apache.org/jira/browse/SYSTEMML-2421
             Project: SystemML
          Issue Type: Sub-task
            Reporter: LI Guobao
            Assignee: LI Guobao


This sub-task aims to introduce checkpointing so that a task can recover
from failure. In detail, once a worker is brought up, it pulls the current
state of the model. The checkpointing frequency could be set to EPOCH10,
meaning that the model state is persisted to a file every 10 epochs.
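A minimal sketch of the proposed behaviour follows. It assumes the model
state is a plain double[] and that a spec string such as "EPOCH10" is
parsed into an epoch interval. EpochCheckpointSketch, State,
parseEpochInterval, shouldCheckpoint, resumeFrom, save and load are all
hypothetical names for illustration, not SystemML APIs. The sketch saves
after every 10th epoch and shows a restarted worker pulling the last
persisted state before continuing.

```java
import java.io.BufferedInputStream;
import java.io.BufferedOutputStream;
import java.io.DataInputStream;
import java.io.DataOutputStream;
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.StandardCopyOption;

// Hypothetical sketch of epoch-based checkpointing; none of these names are SystemML APIs.
public class EpochCheckpointSketch {

    // Snapshot of the model state as of a completed epoch.
    static final class State {
        final int epoch;
        final double[] weights;
        State(int epoch, double[] weights) { this.epoch = epoch; this.weights = weights; }
    }

    // Parses a frequency spec such as "EPOCH10" into an interval of 10 epochs.
    static int parseEpochInterval(String spec) {
        if (!spec.startsWith("EPOCH")) {
            throw new IllegalArgumentException("Unsupported checkpoint spec: " + spec);
        }
        int n = Integer.parseInt(spec.substring("EPOCH".length()));
        if (n <= 0) {
            throw new IllegalArgumentException("Interval must be positive: " + spec);
        }
        return n;
    }

    // True if the state should be persisted after finishing the given (1-based) epoch.
    static boolean shouldCheckpoint(int completedEpoch, int interval) {
        return completedEpoch % interval == 0;
    }

    // First epoch to run: 1 on a fresh start, otherwise the epoch after the last checkpoint.
    static int resumeFrom(State saved) {
        return saved == null ? 1 : saved.epoch + 1;
    }

    // Writes to a temp file first, then moves it over the old checkpoint,
    // so an interrupted write does not leave a half-written checkpoint in place.
    static void save(Path file, State s) throws IOException {
        Path tmp = file.resolveSibling(file.getFileName() + ".tmp");
        try (DataOutputStream out = new DataOutputStream(
                new BufferedOutputStream(Files.newOutputStream(tmp)))) {
            out.writeInt(s.epoch);
            out.writeInt(s.weights.length);
            for (double w : s.weights) out.writeDouble(w);
        }
        Files.move(tmp, file, StandardCopyOption.REPLACE_EXISTING);
    }

    // Returns the last persisted state, or null if no checkpoint exists yet.
    static State load(Path file) throws IOException {
        if (!Files.exists(file)) return null;
        try (DataInputStream in = new DataInputStream(
                new BufferedInputStream(Files.newInputStream(file)))) {
            int epoch = in.readInt();
            double[] w = new double[in.readInt()];
            for (int i = 0; i < w.length; i++) w[i] = in.readDouble();
            return new State(epoch, w);
        }
    }

    public static void main(String[] args) throws IOException {
        int interval = parseEpochInterval("EPOCH10");
        Path dir = Files.createTempDirectory("ckpt");
        Path file = dir.resolve("model.ckpt");
        double[] weights = new double[4];

        // First run: train 25 epochs, then "crash"; checkpoints happen after epochs 10 and 20.
        for (int epoch = resumeFrom(load(file)); epoch <= 25; epoch++) {
            for (int i = 0; i < weights.length; i++) weights[i] += 0.1; // stand-in for an update
            if (shouldCheckpoint(epoch, interval)) save(file, new State(epoch, weights.clone()));
        }

        // A restarted worker pulls the persisted state before continuing.
        State restored = load(file);
        System.out.println("resume at epoch " + resumeFrom(restored)); // resume at epoch 21
    }
}
```

With EPOCH10, a failure after epoch 25 loses at most the work done since
epoch 20. A smaller interval shortens that window at the cost of more
frequent file writes.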



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)
