LI Guobao created SYSTEMML-2421:
-----------------------------------
Summary: Task error and preemption handles
Key: SYSTEMML-2421
URL: https://issues.apache.org/jira/browse/SYSTEMML-2421
Project: SystemML
Issue Type: Sub-task
Reporter: LI Guobao
Assignee: LI Guobao
It aims to introduce checkpointing so that the task can recover from failure.
Specifically, once a worker is brought up, it pulls the current state of the
model. Checkpointing could be configured as EPOCH10, meaning that the model
state is persisted to a file every 10 epochs.
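A minimal Java sketch of the EPOCH10 idea follows. It is not SystemML code, and every name in it is hypothetical: CheckpointSketch, parseEpochFrequency, shouldCheckpoint, save, load and train. It makes several assumptions. A spec such as "EPOCH10" is read as the prefix "EPOCH" plus a positive interval. Epochs are numbered from 1. The model is a plain double[]. A local file stands in for the persisted state. On restart, the worker reloads that file; in the actual design it would pull the current model state, presumably from the parameter server.

```java
import java.io.BufferedInputStream;
import java.io.BufferedOutputStream;
import java.io.DataInputStream;
import java.io.DataOutputStream;
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.StandardCopyOption;
import java.util.Arrays;

// Hypothetical sketch: all names are illustrative, not SystemML's API.
class CheckpointSketch {

    // Parses an (assumed) spec such as "EPOCH10" into "checkpoint every 10 epochs".
    static int parseEpochFrequency(String spec) {
        if (!spec.startsWith("EPOCH")) throw new IllegalArgumentException("Unsupported spec: " + spec);
        int freq = Integer.parseInt(spec.substring("EPOCH".length()));
        if (freq <= 0) throw new IllegalArgumentException("Frequency must be positive: " + spec);
        return freq;
    }

    // With 1-based epochs, EPOCH10 persists the state after epochs 10, 20, 30, ...
    static boolean shouldCheckpoint(int epoch, int freq) {
        return epoch % freq == 0;
    }

    // Writes (epoch, model) to a temp file and then renames it over the old checkpoint,
    // so a crash mid-write never leaves a half-written file in its place.
    static void save(Path file, int epoch, double[] model) throws IOException {
        Path tmp = file.resolveSibling(file.getFileName() + ".tmp");
        try (DataOutputStream out = new DataOutputStream(
                new BufferedOutputStream(Files.newOutputStream(tmp)))) {
            out.writeInt(epoch);
            out.writeInt(model.length);
            for (double v : model) out.writeDouble(v);
        }
        Files.move(tmp, file, StandardCopyOption.ATOMIC_MOVE, StandardCopyOption.REPLACE_EXISTING);
    }

    // The last persisted state: the epoch it was taken after, plus the model values.
    static final class Checkpoint {
        final int epoch;
        final double[] model;
        Checkpoint(int epoch, double[] model) {
            this.epoch = epoch;
            this.model = model;
        }
    }

    static Checkpoint load(Path file) throws IOException {
        try (DataInputStream in = new DataInputStream(
                new BufferedInputStream(Files.newInputStream(file)))) {
            int epoch = in.readInt();
            double[] model = new double[in.readInt()];
            for (int i = 0; i < model.length; i++) model[i] = in.readDouble();
            return new Checkpoint(epoch, model);
        }
    }

    // On start-up the worker restores the latest checkpoint (if any) and resumes at the
    // next epoch. failAtEpoch simulates a crash in that epoch (-1 means no failure).
    static double[] train(Path ckpt, String spec, int maxEpochs, int failAtEpoch) throws IOException {
        int freq = parseEpochFrequency(spec);
        int lastDone = 0;
        double[] model = new double[2];
        if (Files.exists(ckpt)) {
            Checkpoint c = load(ckpt);
            lastDone = c.epoch;
            model = c.model;
        }
        for (int epoch = lastDone + 1; epoch <= maxEpochs; epoch++) {
            if (epoch == failAtEpoch) throw new IllegalStateException("simulated failure in epoch " + epoch);
            for (int i = 0; i < model.length; i++) model[i] += 1.0; // stand-in for one epoch of updates
            if (shouldCheckpoint(epoch, freq)) save(ckpt, epoch, model);
        }
        return model;
    }

    public static void main(String[] args) throws IOException {
        Path ckpt = Files.createTempDirectory("ckpt").resolve("model.ckpt");
        try {
            train(ckpt, "EPOCH10", 30, 25); // crashes in epoch 25; the last checkpoint is epoch 20
        } catch (IllegalStateException e) {
            System.out.println("worker failed: " + e.getMessage());
        }
        System.out.println("restarting from epoch " + load(ckpt).epoch);
        System.out.println(Arrays.toString(train(ckpt, "EPOCH10", 30, -1))); // resumes at epoch 21
    }
}
```

Running main reports a failure in epoch 25 and a restart from the epoch-20 checkpoint, then prints [30.0, 30.0]. Epochs 21 to 24 are recomputed after the restart. That recomputation is the trade-off of a coarser interval such as EPOCH10: fewer file writes, but more lost work per failure.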
--
This message was sent by Atlassian JIRA
(v7.6.3#76005)