[
https://issues.apache.org/jira/browse/FLINK-2239?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14593150#comment-14593150
]
Stephan Ewen commented on FLINK-2239:
-------------------------------------
There is code in contrib that collects data from a stream. It works if the
setup allows the client to connect to the task managers and if the ports that
the client assumes are actually usable on the task managers.
It is not a totally robust solution (that is why it is not on the data stream
directly), but it may be a starting point.
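
To make the goal concrete, here is a minimal sketch in plain Java of printing
records one at a time as they arrive, instead of collecting them into a list
first. It is not the contrib code and uses no Flink API: the class name
IncrementalPrintSketch, the method printIncrementally, and the lazily generated
range standing in for results coming from the cluster are all assumptions made
for illustration.

```java
import java.io.PrintWriter;
import java.util.Iterator;
import java.util.stream.IntStream;

public class IncrementalPrintSketch {

    // Prints each record as soon as the iterator yields it and returns the
    // number of records printed. No list of results is ever built, so client
    // memory use does not grow with the size of the result.
    static <T> long printIncrementally(Iterator<T> results, PrintWriter out) {
        long count = 0;
        while (results.hasNext()) {
            out.println(results.next());
            count++;
        }
        out.flush();
        return count;
    }

    public static void main(String[] args) {
        // Hypothetical stand-in for results streamed from the cluster:
        // a range whose elements are produced lazily, one per next() call.
        Iterator<Integer> results = IntStream.range(0, 5).boxed().iterator();
        long printed = printIncrementally(results, new PrintWriter(System.out, true));
        System.out.println("printed " + printed + " records");
    }
}
```

In a real setup the iterator would have to be backed by a network connection,
which is where the connectivity and port assumptions mentioned above come in.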
> print() on DataSet: stream results and print incrementally
> ----------------------------------------------------------
>
> Key: FLINK-2239
> URL: https://issues.apache.org/jira/browse/FLINK-2239
> Project: Flink
> Issue Type: Improvement
> Components: Distributed Runtime
> Affects Versions: 0.9
> Reporter: Maximilian Michels
> Fix For: 0.10
>
>
> Users find it counter-intuitive that {{print()}} on a DataSet internally
> calls {{collect()}} and fully materializes the set. This leads to
> out-of-memory errors on the client. It also leaves users with the feeling
> that Flink cannot handle large amounts of data and that it fails frequently.
> Improving this situation requires some major architectural changes in
> Flink. The easiest solution would probably be to transfer the data from the
> job manager to the client via the {{BlobManager}}. Alternatively, the client
> could directly connect to the task managers and fetch the results.