[GitHub] flink pull request: [FLINK-1730]Persist operator on Data Sets

fhueske Wed, 09 Sep 2015 03:02:37 -0700

Github user fhueske commented on the pull request:

    https://github.com/apache/flink/pull/1083#issuecomment-138861596
  
    You are certainly right that there should be an API call to explicitly 
persist data in memory (or transparently on disk if memory is short) and later 
access this data (within the same job or another job). However, this feature can 
be implemented in different ways, for example on the network stack level or on 
the operator level. Even if one implementation looks straightforward, it can have 
severe limitations and implications for the behavior of the system. That is why 
such features should be discussed before taking action, even if they look easy 
to do.
    
    Doing it on the operator level has several shortcomings:
    - Persisted data sets cannot be used for recovery. If done on the network 
stack level, basically the same code can be used for both persistence and recovery.
    - Data cannot (easily) be shared across jobs. Operators are expected to 
return their memory when a job is done; otherwise, the memory leaks. There is no 
way to free that memory once a job has finished without releasing it.
    
    @uce, @StephanEwen, you are more familiar with this feature. Did I miss 
something?


