[jira] [Comment Edited] (FLINK-1730) Add a FlinkTools.persist style method to the Data Set.

Kate Eri (JIRA) Mon, 13 Feb 2017 09:38:16 -0800

    [ https://issues.apache.org/jira/browse/FLINK-1730?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15863650#comment-15863650 ]


Kate Eri edited comment on FLINK-1730 at 2/13/17 5:36 PM:
----------------------------------------------------------

Hello [~StephanEwen], hello [~fhueske].
We would like to implement this ticket to enable [integration of Flink with 
SystemML|https://github.com/apache/incubator-systemml/pull/119#issuecomment-222059794].
Considering the previous discussion, we would like to design this feature first 
and discuss it here.

First of all, I would like to double-check the main points of this issue:
1.      A data set should be persisted and cached in memory, or spilled to disk 
if memory is insufficient, to avoid job failure. 
2.      Persisted data sets should be usable for failover recovery. If 
persistence is implemented at the network stack level, it could cover this 
requirement. 
3.      Data should be shared across jobs. Operators are expected to return 
their memory when a job is done; otherwise there is a memory leak. Once a job 
has finished without releasing its memory, there is no way to free it.
4.      Because of the GPU/CPU -> off-heap/heap memory management, persistence 
in a cache is required to support GPUs.

Did I get these right, or is something missing? 
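
To make points 1 and 3 concrete, here is a minimal, library-neutral sketch in 
Python. It is not Flink code and does not reflect any existing Flink API: the 
class name PersistedDataSet, the memory_budget_bytes parameter, and the 
release() method are all hypothetical and only illustrate "keep in memory if 
it fits, otherwise spill to disk, and release explicitly".

```python
import os
import pickle
import tempfile


class PersistedDataSet:
    """Hypothetical holder for a materialized data set (not a Flink class)."""

    def __init__(self, records, memory_budget_bytes):
        payload = pickle.dumps(list(records))
        self._released = False
        if len(payload) <= memory_budget_bytes:
            # Fits in the budget: keep the serialized records in memory.
            self._in_memory = payload
            self._spill_path = None
        else:
            # Too large: spill to a temporary file instead of failing.
            fd, self._spill_path = tempfile.mkstemp(suffix=".spill")
            with os.fdopen(fd, "wb") as f:
                f.write(payload)
            self._in_memory = None

    @property
    def spilled(self):
        """True if the records currently live in a spill file on disk."""
        return self._spill_path is not None

    def read(self):
        """Return the records as a list, from memory or from the spill file."""
        if self._released:
            raise ValueError("data set has already been released")
        if self._in_memory is not None:
            return pickle.loads(self._in_memory)
        with open(self._spill_path, "rb") as f:
            return pickle.loads(f.read())

    def release(self):
        """Drop the in-memory copy and delete the spill file, if any."""
        self._in_memory = None
        if self._spill_path is not None and os.path.exists(self._spill_path):
            os.remove(self._spill_path)
        self._spill_path = None
        self._released = True
```

In this sketch the owner of the data set has to call release() explicitly, 
which mirrors the concern in point 3: if nobody calls it, the memory or the 
spill file is never reclaimed.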



was (Author: kateri):
Hello [~StephanEwen], hello [~fhueske].
We would like to implement this ticket to enable integration of Flink with 
SystemML.
Considering the previous discussion, we would like to design this feature first 
and discuss it here.

First of all, I would like to double-check the main points of this issue:
1.      A data set should be persisted and cached in memory, or spilled to disk 
if memory is insufficient, to avoid job failure. 
2.      Persisted data sets should be usable for failover recovery. If 
persistence is implemented at the network stack level, it could cover this 
requirement. 
3.      Data should be shared across jobs. Operators are expected to return 
their memory when a job is done; otherwise there is a memory leak. Once a job 
has finished without releasing its memory, there is no way to free it.
4.      Because of the GPU/CPU -> off-heap/heap memory management, persistence 
in a cache is required to support GPUs.

Did I get these right, or is something missing? 


> Add a FlinkTools.persist style method to the Data Set.
> ------------------------------------------------------
>
>                 Key: FLINK-1730
>                 URL: https://issues.apache.org/jira/browse/FLINK-1730
>             Project: Flink
>          Issue Type: New Feature
>          Components: DataSet API
>            Reporter: Stephan Ewen
>            Assignee: Evgeny Kincharov
>
> I think this is an operation that will be needed more prominently: defining a 
> point where one long logical program is broken into different executions.



--
This message was sent by Atlassian JIRA
(v6.3.15#6346)
