[jira] [Commented] (FLINK-2239) print() on DataSet: stream results and print incrementally

Stephan Ewen (JIRA) Fri, 19 Jun 2015 00:44:38 -0700

    [ 
https://issues.apache.org/jira/browse/FLINK-2239?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14593150#comment-14593150
 ]


Stephan Ewen commented on FLINK-2239:
-------------------------------------

There is code in contrib that collects data from a stream. It works if the 
setup allows the client to connect to the task managers and if the ports that 
the client assumes are actually usable on the task managers.

It is not a totally robust solution (that is why it is not on the data stream 
directly), but it may be a starting point.

> print() on DataSet: stream results and print incrementally
> ----------------------------------------------------------
>
>                 Key: FLINK-2239
>                 URL: https://issues.apache.org/jira/browse/FLINK-2239
>             Project: Flink
>          Issue Type: Improvement
>          Components: Distributed Runtime
>    Affects Versions: 0.9
>            Reporter: Maximilian Michels
>             Fix For: 0.10
>
>
> Users find it counter-intuitive that {{print()}} on a DataSet internally 
> calls {{collect()}} and fully materializes the set. This leads to 
> out-of-memory errors on the client. It also leaves users with the feeling 
> that Flink cannot handle large amounts of data and that it fails frequently.
> Improving this situation requires some major architectural changes in 
> Flink. The easiest solution would probably be to transfer the data from the 
> job manager to the client via the {{BlobManager}}. Alternatively, the client 
> could directly connect to the task managers and fetch the results.
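
The difference between materializing a result on the client and printing it
incrementally can be sketched in plain Java. This is only an illustration under
stated assumptions, not Flink code: the class IncrementalPrintSketch, its
methods, and the lazy iterator standing in for a distributed result are all
hypothetical, and no Flink API is used.

```java
import java.util.ArrayList;
import java.util.Iterator;
import java.util.List;
import java.util.NoSuchElementException;
import java.util.function.Consumer;

// Hypothetical sketch: contrasts collect-then-print with incremental printing.
final class IncrementalPrintSketch {

    // Stand-in for a distributed result: lazily yields the records 0..count-1.
    public static Iterator<Long> records(final long count) {
        return new Iterator<Long>() {
            private long next = 0;

            @Override
            public boolean hasNext() {
                return next < count;
            }

            @Override
            public Long next() {
                if (!hasNext()) {
                    throw new NoSuchElementException();
                }
                return next++;
            }
        };
    }

    // collect()-style: buffer every record on the client first, then emit.
    // Client memory grows with the size of the result.
    public static long printMaterialized(Iterator<Long> source, Consumer<Long> out) {
        List<Long> all = new ArrayList<>();
        while (source.hasNext()) {
            all.add(source.next());
        }
        for (Long record : all) {
            out.accept(record);
        }
        return all.size();
    }

    // Incremental: emit each record as it arrives and keep nothing afterwards.
    // Client memory stays constant regardless of the size of the result.
    public static long printIncrementally(Iterator<Long> source, Consumer<Long> out) {
        long emitted = 0;
        while (source.hasNext()) {
            out.accept(source.next());
            emitted++;
        }
        return emitted;
    }

    public static void main(String[] args) {
        // Both calls emit the same records; only the buffering differs.
        printMaterialized(records(5), System.out::println);
        printIncrementally(records(5), System.out::println);
    }
}
```

Both methods emit the same records in the same order. The sketch leaves out
the hard part named in the description, which is how the records reach the
client in the first place.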



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
