[ https://issues.apache.org/jira/browse/FLINK-1730?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14711113#comment-14711113 ]
Sachin Goel commented on FLINK-1730:
------------------------------------
Yes. Going through the entire pact task logic, I noticed all of that as well. I
was almost surprised at how well it could support this functionality.
One of my ideas is to implement two specific gates: an input gate that sits
directly on top of the memory manager, and an output gate whose records are
written to memory instead of being transferred over the network.
This way, the pact task can simply create one of these two gates and add it to
the existing gates, depending on whether the results are already available in
the cache. After that, it's just a matter of initializing the {{NoOpDriver}}.
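As a rough illustration of the two-gate idea, here is a minimal Java sketch. {{CachedInputGate}}, {{CachingOutputGate}} and {{PersistPoint}} are hypothetical names, not Flink runtime classes; gates are modeled as plain iterators and lists kept in a {{HashMap}}, ignoring memory segments, serialization and parallelism. On a cache hit the task only forwards records (the role the {{NoOpDriver}} would play); on a miss it runs the upstream as usual and tees every record into the extra caching output gate.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.Iterator;
import java.util.List;
import java.util.Map;
import java.util.function.Supplier;

// Hypothetical sketch: none of these types are Flink runtime classes.

/** Input gate that replays a cached result from local memory instead of the network. */
final class CachedInputGate<T> implements Iterator<T> {
    private final Iterator<T> records;
    CachedInputGate(List<T> cached) { this.records = cached.iterator(); }
    @Override public boolean hasNext() { return records.hasNext(); }
    @Override public T next() { return records.next(); }
}

/** Output gate whose records are written into memory rather than shipped over the network. */
final class CachingOutputGate<T> {
    private final List<T> buffer = new ArrayList<>();
    void emit(T record) { buffer.add(record); }
    List<T> contents() { return buffer; }
}

/** Decides, per result id, whether a task reads from the cache or computes and fills it. */
final class PersistPoint<T> {
    private final Map<String, List<T>> cache = new HashMap<>();

    List<T> read(String resultId, Supplier<Iterator<T>> upstream) {
        List<T> out = new ArrayList<>();
        List<T> cached = cache.get(resultId);
        if (cached != null) {
            // Cache hit: attach the in-memory input gate and only forward records (NoOpDriver-like).
            CachedInputGate<T> in = new CachedInputGate<>(cached);
            while (in.hasNext()) out.add(in.next());
            return out;
        }
        // Cache miss: run the upstream, but also write every record to the caching output gate.
        CachingOutputGate<T> tee = new CachingOutputGate<>();
        Iterator<T> it = upstream.get();
        while (it.hasNext()) {
            T record = it.next();
            tee.emit(record);
            out.add(record);
        }
        cache.put(resultId, tee.contents());
        return out;
    }
}
```

Calling {{read}} twice with the same result id runs the upstream only once; the second call replays the cached records.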
Further, although I'm not entirely sure about this, the memory manager can
itself spill data to disk when needed, right? If so, we don't need to implement
any in-memory-cum-disk storage at all. It's already there.
The relevant storage on the memory manager will hold locks keyed by task name
and index, so that a cached result is not cleared until every task reading it
has finished. To clear out the cache storage, we could perhaps follow an LRU
scheme.
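The locking and eviction policy could look roughly like the following Java sketch. It assumes cached results are addressed by a hypothetical string id, that each reader is identified by task name plus subtask index, and that sizes are passed in as byte counts rather than measured from memory segments; {{PinnedLruCache}} is an illustrative name, not a Flink class. It uses {{java.util.LinkedHashMap}} in access order to get LRU iteration and skips any entry that still has readers pinned.

```java
import java.util.HashSet;
import java.util.Iterator;
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.Set;

/** Hypothetical LRU cache of persisted results; pinned entries are never evicted. */
final class PinnedLruCache<V> {
    private static final class Slot<T> {
        final T value;
        final long sizeBytes;
        final Set<String> readers = new HashSet<>(); // "taskName#subtaskIndex" of active readers
        Slot(T value, long sizeBytes) { this.value = value; this.sizeBytes = sizeBytes; }
    }

    private final long capacityBytes;
    private long usedBytes = 0;
    // accessOrder = true: iteration runs from least to most recently accessed entry.
    private final LinkedHashMap<String, Slot<V>> slots = new LinkedHashMap<>(16, 0.75f, true);

    PinnedLruCache(long capacityBytes) { this.capacityBytes = capacityBytes; }

    private static String readerKey(String taskName, int subtaskIndex) {
        return taskName + "#" + subtaskIndex;
    }

    /** Stores a result; persisted results are immutable, so a repeated put is ignored. */
    synchronized void put(String resultId, V value, long sizeBytes) {
        if (slots.containsKey(resultId)) return;
        slots.put(resultId, new Slot<>(value, sizeBytes));
        usedBytes += sizeBytes;
        evictIfNeeded(); // may evict the new entry itself if everything older is pinned
    }

    /** Pins the result for one reader and returns it, or null if it is not cached. */
    synchronized V acquire(String resultId, String taskName, int subtaskIndex) {
        Slot<V> slot = slots.get(resultId); // get() also marks the entry as most recently used
        if (slot == null) return null;
        slot.readers.add(readerKey(taskName, subtaskIndex));
        return slot.value;
    }

    /** Drops the reader's pin (this also counts as a use); the entry is evictable once unpinned. */
    synchronized void release(String resultId, String taskName, int subtaskIndex) {
        Slot<V> slot = slots.get(resultId);
        if (slot != null && slot.readers.remove(readerKey(taskName, subtaskIndex))) {
            evictIfNeeded();
        }
    }

    synchronized boolean contains(String resultId) { return slots.containsKey(resultId); }

    /** Evicts least recently used, unpinned entries until usage fits the budget. */
    private void evictIfNeeded() {
        Iterator<Map.Entry<String, Slot<V>>> it = slots.entrySet().iterator();
        while (usedBytes > capacityBytes && it.hasNext()) {
            Slot<V> slot = it.next().getValue();
            if (slot.readers.isEmpty()) { // never evict data a task is still reading
                usedBytes -= slot.sizeBytes;
                it.remove();
            }
        }
    }
}
```

If every entry is pinned, the cache temporarily exceeds its budget rather than dropping data a task is still consuming; eviction resumes on the next {{put}} or {{release}}.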
> Add a FlinkTools.persist style method to the Data Set.
> ------------------------------------------------------
>
> Key: FLINK-1730
> URL: https://issues.apache.org/jira/browse/FLINK-1730
> Project: Flink
> Issue Type: New Feature
> Reporter: Stephan Ewen
> Priority: Minor
>
> I think this is an operation that will be needed more prominently: defining
> a point where one long logical program is broken into different executions.