[ 
https://issues.apache.org/jira/browse/FLINK-1730?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14736587#comment-14736587
 ] 

ASF GitHub Bot commented on FLINK-1730:
---------------------------------------

Github user StephanEwen commented on the pull request:

    https://github.com/apache/flink/pull/1083#issuecomment-138861709
  
    I don't think this really works. It violates so many assumptions, like the 
fact that memory is available after a job ends. The accounting for that depends 
on task slots - a free slot must have the necessary memory.
    
    When you call collect twice, the whole program gets re-executed and some 
parts are simply discarded for the sake of using the persisted data. That makes 
sure the results are the same, but does not save any work. For that, you need 
backtracking through the graph to check whether results are already available.
    
    Which brings us to the point:
    
    Making the system's data stream layer aware of this (on both the 
TaskManager and the JobManager side) solves two issues with one approach: caching 
for reuse in incrementally constructed programs, and caching for recovery. Both 
cases simply need persistent streams and backtracking in the execution graph.
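
    As a rough illustration of the backtracking idea only (not Flink code): 
the sketch below walks from a sink back towards the sources and stops at any 
node whose result is already persisted, so only the remaining nodes are 
scheduled. The names BacktrackingSketch, Node and planExecution are 
hypothetical, and a plain set of node ids stands in for whatever bookkeeping of 
persistent streams the JobManager would actually hold.

```java
import java.util.Arrays;
import java.util.Collections;
import java.util.HashSet;
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Set;

// Hypothetical sketch; none of these types exist in Flink.
public class BacktrackingSketch {

    // A node of a simplified execution graph, identified by a string id.
    static final class Node {
        final String id;
        final List<Node> inputs;

        Node(String id, Node... inputs) {
            this.id = id;
            this.inputs = Arrays.asList(inputs);
        }
    }

    // Returns the ids of the nodes that still have to run to produce the sink,
    // in an order where every node comes after its inputs.
    static Set<String> planExecution(Node sink, Set<String> persisted) {
        Set<String> toRun = new LinkedHashSet<>();
        visit(sink, persisted, toRun, new HashSet<>());
        return toRun;
    }

    private static void visit(Node node, Set<String> persisted,
                              Set<String> toRun, Set<String> seen) {
        if (!seen.add(node.id)) {
            return; // already handled via another path
        }
        if (persisted.contains(node.id)) {
            return; // result is available: stop backtracking here
        }
        for (Node input : node.inputs) {
            visit(input, persisted, toRun, seen);
        }
        toRun.add(node.id);
    }

    public static void main(String[] args) {
        Node source = new Node("source");
        Node map = new Node("map", source);
        Node reduce = new Node("reduce", map);
        Node collect = new Node("collect", reduce);

        // Nothing persisted: the whole chain has to run.
        System.out.println(planExecution(collect, Collections.emptySet()));
        // "reduce" persisted: only the final step has to run.
        System.out.println(planExecution(collect, Collections.singleton("reduce")));
    }
}
```

    In this sketch the second call schedules only the collect step, which is 
the work saving that re-executing the whole program does not give you.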
    
    Sorry that you made this effort and it cannot be merged now, but I do not 
understand why you keep ignoring the discussions, inputs, and the request to 
plan and describe such highly involved features before writing them.


> Add a FlinkTools.persist style method to the Data Set.
> ------------------------------------------------------
>
>                 Key: FLINK-1730
>                 URL: https://issues.apache.org/jira/browse/FLINK-1730
>             Project: Flink
>          Issue Type: New Feature
>            Reporter: Stephan Ewen
>            Priority: Minor
>
> I think this is an operation that will be needed more prominently. Defining a 
> point where one long logical program is broken into different executions.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
