[
https://issues.apache.org/jira/browse/FLINK-2239?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14593295#comment-14593295
]
Till Rohrmann commented on FLINK-2239:
--------------------------------------
I agree.
> print() on DataSet: stream results and print incrementally
> ----------------------------------------------------------
>
> Key: FLINK-2239
> URL: https://issues.apache.org/jira/browse/FLINK-2239
> Project: Flink
> Issue Type: Improvement
> Components: Distributed Runtime
> Affects Versions: 0.9
> Reporter: Maximilian Michels
> Fix For: 0.10
>
>
> Users find it counter-intuitive that {{print()}} on a DataSet internally
> calls {{collect()}} and fully materializes the set. This leads to
> out-of-memory errors on the client. It also leaves users with the
> impression that Flink cannot handle large amounts of data and that it
> fails frequently.
> Improving this situation requires some major architectural changes in
> Flink. The easiest solution would probably be to transfer the data from the
> job manager to the client via the {{BlobManager}}. Alternatively, the client
> could connect directly to the task managers and fetch the results.
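A minimal sketch, in plain Java and without any Flink APIs, of the difference the issue describes. It contrasts collect()-style printing, which holds the whole result on the client before printing anything, with incremental printing, which prints each chunk as it arrives. The chunk iterator is a hypothetical stand-in for results fetched piece by piece (e.g. via the job manager or directly from task managers). `IncrementalPrint`, `chunkedResults`, `printViaCollect` and `printIncrementally` are illustrative names, not Flink classes.

```java
import java.io.PrintStream;
import java.util.ArrayList;
import java.util.Iterator;
import java.util.List;
import java.util.NoSuchElementException;

public class IncrementalPrint {

    /** Hypothetical result source: yields the result in chunks, as a remote fetch might. */
    static Iterator<List<String>> chunkedResults(int totalRecords, int chunkSize) {
        return new Iterator<List<String>>() {
            private int next = 0;

            @Override
            public boolean hasNext() {
                return next < totalRecords;
            }

            @Override
            public List<String> next() {
                if (!hasNext()) {
                    throw new NoSuchElementException();
                }
                int end = Math.min(next + chunkSize, totalRecords);
                List<String> chunk = new ArrayList<>(end - next);
                for (int i = next; i < end; i++) {
                    chunk.add("record-" + i);
                }
                next = end;
                return chunk;
            }
        };
    }

    /** collect()-style: materializes every record on the client, then prints. Returns peak records held. */
    static int printViaCollect(Iterator<List<String>> chunks, PrintStream out) {
        List<String> all = new ArrayList<>();
        while (chunks.hasNext()) {
            all.addAll(chunks.next());
        }
        for (String record : all) {
            out.println(record);
        }
        return all.size();
    }

    /** Streaming: prints each chunk as it arrives and then drops it. Returns peak records held. */
    static int printIncrementally(Iterator<List<String>> chunks, PrintStream out) {
        int peak = 0;
        while (chunks.hasNext()) {
            List<String> chunk = chunks.next();
            peak = Math.max(peak, chunk.size());
            for (String record : chunk) {
                out.println(record);
            }
        }
        return peak;
    }

    public static void main(String[] args) {
        int collectPeak = printViaCollect(chunkedResults(10, 3), System.out);
        int streamPeak = printIncrementally(chunkedResults(10, 3), System.out);
        System.out.println("collect peak=" + collectPeak + ", streaming peak=" + streamPeak);
    }
}
```

Both variants print the same lines, but client memory in the collect()-style variant grows with the full result size, while in the streaming variant it is bounded by the chunk size.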
--
This message was sent by Atlassian JIRA
(v6.3.4#6332)