Re: API behavior with data sinks (lazy) and eager operations

Robert Metzger Mon, 19 Jan 2015 01:59:55 -0800

I would also execute the sinks immediately. I think it's a corner case
because the sinks are usually the last thing in a plan, and any print() or
collect() statements come earlier in the plan.


print() should go to the client command line, yes.

On Mon, Jan 19, 2015 at 1:42 AM, Stephan Ewen <se...@apache.org> wrote:

> Hi there!
>
> With the upcoming more interactive extensions to the API (operations that
> go back to the client from a program and need to be eagerly evaluated) we
> need to define how different actions should behave.
>
> Currently, nothing gets executed until the "env.execute()" call is made.
> That makes it possible to produce multiple data sources at the same time,
> which is a good feature.
>
> For certain operations, like the "count()" and "collect()" functions added
> in https://github.com/apache/flink/pull/210 , we need to trigger execution
> immediately.
>
> The open question is how this should behave in connection with data sinks
> that have already been defined:
>
> 1) Should all data sinks defined so far be executed as well?
> 2) Should only that immediate operation be executed, with the data sinks
> left pending until a call to "env.execute()"?
>
> I am somewhat leaning towards the first option right now, because I think
> that executing them later may force re-execution of larger parts of the
> plan.
>
> In addition: I think that the "print()" commands should go to the client
> command line. In that sense, they would behave like
> "collect().foreach(print)"
>
>
> Greetings,
> Stephan
>
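
To make the two options from the quoted message concrete, here is a small toy
model in Python. It is only a sketch and does not use Flink's API: ToyEnv,
ToyDataSet, write(), the eager_runs_sinks flag and the evaluations counter are
all hypothetical names invented for illustration. It shows that option 1 runs
the pending sinks together with the eager count(), while option 2 leaves them
pending and evaluates the plan a second time on execute().

```python
class ToyEnv:
    """Hypothetical stand-in for an execution environment (not Flink's API)."""

    def __init__(self, eager_runs_sinks):
        # True models option 1, False models option 2.
        self.eager_runs_sinks = eager_runs_sinks
        self.pending_sinks = []  # (name, dataset) pairs defined but not yet run
        self.outputs = {}        # sink name -> rows written
        self.evaluations = 0     # how often the plan was actually evaluated

    def source(self, data):
        return ToyDataSet(self, data)

    def _flush_sinks(self, shared=None):
        # Run all pending sinks, reusing rows already computed in this job.
        shared = shared or {}
        for name, ds in self.pending_sinks:
            rows = shared.get(id(ds))
            if rows is None:
                rows = ds._compute()
            self.outputs[name] = rows
        self.pending_sinks = []

    def execute(self):
        # Lazy trigger: runs whatever sinks are still pending.
        self._flush_sinks()


class ToyDataSet:
    def __init__(self, env, data):
        self.env = env
        self.data = list(data)

    def _compute(self):
        # Stands in for evaluating the plan that produces this data set.
        self.env.evaluations += 1
        return list(self.data)

    def write(self, name):
        # Lazy sink: only registered here, nothing runs yet.
        self.env.pending_sinks.append((name, self))

    def count(self):
        # Eager operation: has to trigger execution immediately.
        rows = self._compute()
        if self.env.eager_runs_sinks:
            # Option 1: pending sinks run as part of the same job.
            self.env._flush_sinks(shared={id(self): rows})
        return len(rows)


def run(eager_runs_sinks):
    env = ToyEnv(eager_runs_sinks)
    ds = env.source([1, 2, 3])
    ds.write("out")
    n = ds.count()
    sink_done_after_count = "out" in env.outputs
    env.execute()
    return n, sink_done_after_count, env.evaluations


print(run(True))   # option 1
print(run(False))  # option 2
```

In this toy model, option 1 evaluates the plan once and option 2 evaluates it
twice, which is the re-execution concern raised in the quoted message. A real
optimizer may share or cache work differently, so the counts are illustrative
only.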
