[jira] [Commented] (FLINK-2239) print() on DataSet: stream results and print incrementally

Till Rohrmann (JIRA) Fri, 19 Jun 2015 01:05:42 -0700

    [ 
https://issues.apache.org/jira/browse/FLINK-2239?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14593166#comment-14593166
 ]


Till Rohrmann commented on FLINK-2239:
--------------------------------------

Yes, Johannes worked on using the {{BlobManager}} for the {{collect}} method. 
However, he stopped working on it because accessing the {{BlobManager}} 
was too complicated with our current design. He made some general suggestions 
on how to improve the situation, which I'll share with you on the mailing list.

> print() on DataSet: stream results and print incrementally
> ----------------------------------------------------------
>
>                 Key: FLINK-2239
>                 URL: https://issues.apache.org/jira/browse/FLINK-2239
>             Project: Flink
>          Issue Type: Improvement
>          Components: Distributed Runtime
>    Affects Versions: 0.9
>            Reporter: Maximilian Michels
>             Fix For: 0.10
>
>
> Users find it counter-intuitive that {{print()}} on a DataSet internally 
> calls {{collect()}} and fully materializes the set. This leads to 
> out-of-memory errors on the client. It also leaves users with the feeling 
> that Flink cannot handle large amounts of data and that it fails frequently.
> Improving this situation requires some major architectural changes in 
> Flink. The easiest solution would probably be to transfer the data from the 
> job manager to the client via the {{BlobManager}}. Alternatively, the client 
> could connect directly to the task managers and fetch the results. 
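
To illustrate the memory difference the description refers to, here is a 
minimal, self-contained sketch in plain Java. It does not use Flink's API or 
the {{BlobManager}}: the class {{PrintSketch}}, the methods {{results}}, 
{{printMaterialized}} and {{printIncrementally}}, and the {{batchSize}} 
parameter are all hypothetical names made up for this example. It only 
contrasts buffering every record before printing with printing in bounded 
batches as records arrive.

```java
import java.util.ArrayList;
import java.util.Iterator;
import java.util.List;
import java.util.function.Consumer;

// Hypothetical illustration only; none of these names exist in Flink.
final class PrintSketch {

    // Stand-in for a result that arrives record by record (assumption:
    // a real client would fetch records over the network instead).
    static Iterator<Integer> results(final int n) {
        return new Iterator<Integer>() {
            private int i = 0;
            @Override public boolean hasNext() { return i < n; }
            @Override public Integer next() { return i++; }
        };
    }

    // Collect-then-print: every record is buffered before the first one is
    // emitted. Returns the peak number of buffered records.
    static int printMaterialized(Iterator<Integer> it, Consumer<Integer> sink) {
        List<Integer> all = new ArrayList<>();
        while (it.hasNext()) {
            all.add(it.next());
        }
        all.forEach(sink);
        return all.size();
    }

    // Incremental print: at most batchSize records are buffered at a time.
    // Returns the peak number of buffered records.
    static int printIncrementally(Iterator<Integer> it, int batchSize,
                                  Consumer<Integer> sink) {
        List<Integer> batch = new ArrayList<>(batchSize);
        int peak = 0;
        while (it.hasNext()) {
            batch.add(it.next());
            // Flush when the batch is full or the input is exhausted.
            if (batch.size() == batchSize || !it.hasNext()) {
                peak = Math.max(peak, batch.size());
                batch.forEach(sink);
                batch.clear();
            }
        }
        return peak;
    }

    public static void main(String[] args) {
        int full = printMaterialized(results(10), System.out::println);
        int bounded = printIncrementally(results(10), 3, System.out::println);
        System.out.println("peak buffered: " + full + " vs " + bounded);
    }
}
```

In the first variant the client-side buffer grows with the size of the 
result, which is the behaviour the issue describes for {{print()}}; in the 
second it is capped by the batch size regardless of how large the result is.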



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
