[ 
https://issues.apache.org/jira/browse/FLINK-2239?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14601159#comment-14601159
 ] 

Stephan Ewen commented on FLINK-2239:
-------------------------------------

There is pending work to support larger results for {{collect()}} by letting 
them go through the BLOB manager. That is still limited by client memory, 
though.

The concern about direct connections between client and workers is that they 
fail in many enterprise setups due to firewalls. We have seen multiple 
installations with "edge servers", where the client can communicate with the 
master, but not with the workers.

I like the idea of {{iterate()}}. Would be a bit of an effort, but seems like a 
clean solution.
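
To make the {{iterate()}} idea a bit more concrete, here is a rough sketch of 
the client side. It is not Flink API: {{BatchSource}}, {{BatchedIterator}} and 
{{printIncrementally}} are hypothetical names, and the in-memory list in 
{{main}} only stands in for results that the master would serve. The 
assumption is that the client pulls bounded batches through the master, so it 
never needs a direct connection to the workers and never holds more than one 
batch in memory.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.Iterator;
import java.util.List;
import java.util.NoSuchElementException;

/** Hypothetical sketch, not Flink API: results are pulled in bounded batches. */
class IncrementalPrintSketch {

    /** Hypothetical stand-in for a master-side endpoint serving one batch per request. */
    interface BatchSource<T> {
        /** Returns up to maxSize records starting at offset; an empty list means no more data. */
        List<T> fetch(long offset, int maxSize);
    }

    /** Iterator that keeps at most one batch in client memory at a time. */
    static final class BatchedIterator<T> implements Iterator<T> {
        private final BatchSource<T> source;
        private final int batchSize;
        private long offset = 0;
        private Iterator<T> current = Collections.emptyIterator();
        private boolean exhausted = false;

        BatchedIterator(BatchSource<T> source, int batchSize) {
            if (batchSize <= 0) {
                throw new IllegalArgumentException("batchSize must be positive");
            }
            this.source = source;
            this.batchSize = batchSize;
        }

        @Override
        public boolean hasNext() {
            // Fetch the next batch only once the current one is fully consumed.
            while (!current.hasNext() && !exhausted) {
                List<T> batch = source.fetch(offset, batchSize);
                if (batch.isEmpty()) {
                    exhausted = true;
                } else {
                    offset += batch.size();
                    current = batch.iterator();
                }
            }
            return current.hasNext();
        }

        @Override
        public T next() {
            if (!hasNext()) {
                throw new NoSuchElementException();
            }
            return current.next();
        }
    }

    /** Prints each record as it arrives and returns the number of records printed. */
    static <T> long printIncrementally(BatchSource<T> source, int batchSize) {
        long count = 0;
        Iterator<T> it = new BatchedIterator<>(source, batchSize);
        while (it.hasNext()) {
            System.out.println(it.next());
            count++;
        }
        return count;
    }

    public static void main(String[] args) {
        // In-memory stand-in for the remote result set (assumption for the demo).
        List<Integer> data = new ArrayList<>();
        for (int i = 0; i < 10; i++) {
            data.add(i);
        }
        BatchSource<Integer> source = (offset, max) -> {
            int from = (int) Math.min(offset, data.size());
            int to = Math.min(from + max, data.size());
            return new ArrayList<>(data.subList(from, to));
        };
        printIncrementally(source, 4);
    }
}
```

With a scheme like this, client memory is bounded by the batch size rather 
than by the result size, in contrast to {{collect()}}. How the master would 
buffer or forward the batches is left open here.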

> print() on DataSet: stream results and print incrementally
> ----------------------------------------------------------
>
>                 Key: FLINK-2239
>                 URL: https://issues.apache.org/jira/browse/FLINK-2239
>             Project: Flink
>          Issue Type: Improvement
>          Components: Distributed Runtime
>    Affects Versions: 0.9
>            Reporter: Maximilian Michels
>             Fix For: 0.10
>
>
> Users find it counter-intuitive that {{print()}} on a DataSet internally 
> calls {{collect()}} and fully materializes the set. This leads to 
> out-of-memory errors on the client. It also leaves users with the feeling 
> that Flink cannot handle large amounts of data and that it fails frequently.
> Improving this situation requires some major architectural changes in 
> Flink. The easiest solution would probably be to transfer the data from the 
> job manager to the client via the {{BlobManager}}. Alternatively, the client 
> could directly connect to the task managers and fetch the results. 



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
