I think this is the 3rd discussion about this ;-) AFAIK, the consensus in previous discussions was to do it exactly like collect() and print to the client.
The only open question was how to deal with the break in the API. Right now, programs contain an execute() call after the print(), which would then throw an exception because there is nothing left to execute that was not already part of the print().

On Tue, Apr 28, 2015 at 10:18 AM, Aljoscha Krettek <aljos...@apache.org> wrote:
> Hi Folks,
> right now .print() on DataSet creates a DataSink that prints to the
> local stdout of a TaskManager. This is not very helpful when running
> in a distributed environment, especially when using something like an
> interactive Scala Shell in a cluster.
>
> I propose to change print() to use collect() internally and therefore
> eagerly execute without requiring env.execute().
>
> What do you think?
>
> Aljoscha
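To make the API break concrete, here is a minimal toy model in plain Java. It is NOT the Flink API: `ToyEnvironment`, `ToyDataSet`, `addSink()` and the exception message are hypothetical names chosen for illustration. It only assumes the semantics described in this thread: collect() eagerly runs the job and ships the result to the client, print() is built on collect(), and execute() fails when no new sinks are pending.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Hypothetical stand-in for an execution environment (not Flink's class).
class ToyEnvironment {
    // Sinks registered since the last execution; they run lazily.
    private final List<Runnable> pendingSinks = new ArrayList<>();

    void addSink(Runnable sink) {
        pendingSinks.add(sink);
    }

    // Runs all pending sinks. If there are none, it throws, which is what an
    // old program's trailing execute() would hit after an eager print().
    void execute() {
        if (pendingSinks.isEmpty()) {
            throw new IllegalStateException("nothing to execute");
        }
        List<Runnable> toRun = new ArrayList<>(pendingSinks);
        pendingSinks.clear();
        toRun.forEach(Runnable::run);
    }
}

// Hypothetical stand-in for a DataSet (not Flink's class).
class ToyDataSet<T> {
    private final ToyEnvironment env;
    private final List<T> elements;

    ToyDataSet(ToyEnvironment env, List<T> elements) {
        this.env = env;
        this.elements = elements;
    }

    // Eager: registers a sink that gathers the data, runs the job right away,
    // and returns the result to the client.
    List<T> collect() {
        List<T> result = new ArrayList<>();
        env.addSink(() -> result.addAll(elements));
        env.execute();
        return result;
    }

    // Proposed print(): collect() to the client, then print on the client.
    void print() {
        for (T element : collect()) {
            System.out.println(element);
        }
    }
}

class PrintViaCollectDemo {
    public static void main(String[] args) {
        ToyEnvironment env = new ToyEnvironment();
        ToyDataSet<Integer> data = new ToyDataSet<>(env, Arrays.asList(1, 2, 3));
        data.print(); // prints 1, 2, 3 on the client, no execute() needed
        try {
            env.execute(); // the call existing programs still contain
        } catch (IllegalStateException e) {
            System.out.println("execute() after print() failed: " + e.getMessage());
        }
    }
}
```

In this model the trailing execute() is the only thing that breaks; making execute() a no-op when nothing is pending would be one way to stay compatible, at the cost of silently hiding programs that genuinely forgot to define a sink.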