[
https://issues.apache.org/jira/browse/FLINK-1730?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15863650#comment-15863650
]
Kate Eri edited comment on FLINK-1730 at 2/13/17 5:36 PM:
----------------------------------------------------------
Hello [~StephanEwen], hello [~fhueske].
We would like to implement this ticket to enable [integration of Flink with
SystemML|https://github.com/apache/incubator-systemml/pull/119#issuecomment-222059794].
Considering the previous discussion, we would like to design this feature first
and discuss it here.
First of all, I would like to double-check the main points of this issue:
1. A DataSet should be persisted and cached in memory, or spilled to disk if
memory is insufficient, so that the job does not fail.
2. Persisted data sets should be used for failover recovery. If done at
the network stack level, persistence could cover this requirement.
3. Data should be shared across jobs. Operators are expected to return
their memory when a job is done; otherwise the memory leaks. There is
no way to free that memory once the job has finished without releasing it.
4. Because of GPU/CPU -> off-heap/heap memory management, persistence in
the cache is required to support GPUs.
Did I get these right, or is something missing?
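To make points 1 and 3 concrete, here is a minimal sketch in plain Java. It is not Flink API and not a proposed design: the class name {{SpillingCache}}, its methods, and the byte-array representation of a data set are all assumptions made for illustration. It only shows the intended behavior: cache in memory while the budget allows, spill to disk instead of failing, and release everything explicitly when the owner is done.

```java
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.ArrayList;
import java.util.HashMap;
import java.util.Map;

/** Hypothetical sketch (not a Flink class): caches data sets in memory up to a budget, spills the rest to disk. */
final class SpillingCache implements AutoCloseable {
    private final long memoryBudgetBytes;
    private long usedBytes = 0;
    private final Map<String, byte[]> inMemory = new HashMap<>();
    private final Map<String, Path> onDisk = new HashMap<>();

    SpillingCache(long memoryBudgetBytes) {
        this.memoryBudgetBytes = memoryBudgetBytes;
    }

    /** Point 1: keep the data in memory if it fits, otherwise spill to a temp file rather than fail. */
    void persist(String id, byte[] data) {
        release(id); // drop any earlier copy stored under the same id
        if (usedBytes + data.length <= memoryBudgetBytes) {
            inMemory.put(id, data.clone());
            usedBytes += data.length;
            return;
        }
        try {
            Path file = Files.createTempFile("persisted-", ".bin");
            Files.write(file, data);
            onDisk.put(id, file);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    /** Reads a persisted data set back, e.g. from a later job; returns null if the id is unknown. */
    byte[] read(String id) {
        byte[] cached = inMemory.get(id);
        if (cached != null) {
            return cached.clone();
        }
        Path file = onDisk.get(id);
        if (file == null) {
            return null;
        }
        try {
            return Files.readAllBytes(file);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    boolean isSpilled(String id) {
        return onDisk.containsKey(id);
    }

    long usedMemoryBytes() {
        return usedBytes;
    }

    /** Point 3: memory and disk space are returned explicitly; nothing is freed implicitly. */
    void release(String id) {
        byte[] cached = inMemory.remove(id);
        if (cached != null) {
            usedBytes -= cached.length;
        }
        Path file = onDisk.remove(id);
        if (file != null) {
            try {
                Files.deleteIfExists(file);
            } catch (IOException e) {
                throw new UncheckedIOException(e);
            }
        }
    }

    /** Releases every persisted data set. */
    @Override
    public void close() {
        for (String id : new ArrayList<>(inMemory.keySet())) {
            release(id);
        }
        for (String id : new ArrayList<>(onDisk.keySet())) {
            release(id);
        }
    }
}
```

In this sketch the cache outlives a single job, and whoever owns it must call {{close()}}. That matches the concern in point 3: if nobody releases the data, it stays allocated.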
> Add a FlinkTools.persist style method to the Data Set.
> ------------------------------------------------------
>
> Key: FLINK-1730
> URL: https://issues.apache.org/jira/browse/FLINK-1730
> Project: Flink
> Issue Type: New Feature
> Components: DataSet API
> Reporter: Stephan Ewen
> Assignee: Evgeny Kincharov
>
> I think this is an operation that will be needed more prominently: defining a
> point where one long logical program is broken into different executions.