[jira] [Commented] (FLINK-2239) print() on DataSet: stream results and print incrementally

Maximilian Michels (JIRA) Fri, 19 Jun 2015 01:51:12 -0700

    [ 
https://issues.apache.org/jira/browse/FLINK-2239?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14593211#comment-14593211
 ]


Maximilian Michels commented on FLINK-2239:
-------------------------------------------

I was thinking: since we already have a working solution, how about improving 
it so that the client checks whether it can connect to the task managers 
directly and, if it cannot, falls back to the current solution?

IMHO the best solution is to stream data from the task managers directly. 
Having the job manager as a proxy is not an optimal solution. 
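
To make the fallback idea concrete, here is a rough sketch of the client-side 
check. All names in it (ResultFetchStrategy, Mode, choose, isReachable) are 
made up for illustration and are not existing Flink classes or APIs. It also 
assumes the client already knows the task manager addresses, which is not 
necessarily the case today. The check is a plain TCP connect with a timeout; 
the real data channel would need more than that.

```java
import java.io.IOException;
import java.net.InetSocketAddress;
import java.net.Socket;
import java.util.List;

// Hypothetical helper, not part of Flink: picks how the client fetches results.
final class ResultFetchStrategy {

    enum Mode { DIRECT_FROM_TASK_MANAGERS, VIA_JOB_MANAGER }

    // Returns true if a TCP connection to the address opens within the timeout.
    static boolean isReachable(InetSocketAddress address, int timeoutMillis) {
        try (Socket socket = new Socket()) {
            socket.connect(address, timeoutMillis);
            return true;
        } catch (IOException | RuntimeException e) {
            // Unresolvable host, refused connection, timeout, bad argument, ...
            return false;
        }
    }

    // Stream directly only if every task manager is reachable from the client;
    // otherwise keep the current path through the job manager.
    static Mode choose(List<InetSocketAddress> taskManagers, int timeoutMillis) {
        if (taskManagers.isEmpty()) {
            return Mode.VIA_JOB_MANAGER;
        }
        for (InetSocketAddress taskManager : taskManagers) {
            if (!isReachable(taskManager, timeoutMillis)) {
                return Mode.VIA_JOB_MANAGER;
            }
        }
        return Mode.DIRECT_FROM_TASK_MANAGERS;
    }
}
```

Requiring all task managers to be reachable keeps the sketch simple. A mixed 
mode (direct where possible, proxy for the rest) would also be conceivable, but 
it adds complexity on the client.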

> print() on DataSet: stream results and print incrementally
> ----------------------------------------------------------
>
>                 Key: FLINK-2239
>                 URL: https://issues.apache.org/jira/browse/FLINK-2239
>             Project: Flink
>          Issue Type: Improvement
>          Components: Distributed Runtime
>    Affects Versions: 0.9
>            Reporter: Maximilian Michels
>             Fix For: 0.10
>
>
> Users find it counter-intuitive that {{print()}} on a DataSet internally 
> calls {{collect()}} and fully materializes the set. This leads to 
> out-of-memory errors on the client. It also leaves users with the feeling 
> that Flink cannot handle large amounts of data and that it fails frequently.
> Improving this situation requires some major architectural changes in 
> Flink. The easiest solution would probably be to transfer the data from the 
> job manager to the client via the {{BlobManager}}. Alternatively, the client 
> could directly connect to the task managers and fetch the results. 



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
