I would also execute the sinks immediately. I think its a corner case because the sinks are usually the last thing in a plan and all print() or collect() statements are earlier in the plan.
print() should go to the client command line, yes. On Mon, Jan 19, 2015 at 1:42 AM, Stephan Ewen <se...@apache.org> wrote: > Hi there! > > With the upcoming more interactive extensions to the API (operations that > go back to the client from a program and need to be eagerly evaluated) we > need to define how different actions should behave. > > Currently, nothing gets executed until the "env.execute()" call is made. > That allows to produce multiple data sources at the same time, which is a > good feature. > > For certain operations, like the "count()" and "collect()" functions added > in https://github.com/apache/flink/pull/210 , we need to trigger execution > immediately. > > The open question is, how should this behave in connection with already > defined data sinks: > > 1) Should all yet defined data sinks be executed as well? > 2) Should only that immediate operation be executed and the data sinks be > pending till a call to "env.execute()" > > I am somewhat leaning towards the first option right now, because I think > that executing them later may force re-execution of larger parts of the > plan. > > In addition: I think that the "print()" commands should go to the client > command line. In that sense, they would behave like > "collect().foreach(print)" > > > Greetings, > Stephan >