[ https://issues.apache.org/jira/browse/FLINK-2092?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14572402#comment-14572402 ]
ASF GitHub Bot commented on FLINK-2092:
---------------------------------------
Github user uce commented on a diff in the pull request:
https://github.com/apache/flink/pull/774#discussion_r31703637
--- Diff: docs/apis/programming_guide.md ---
@@ -394,26 +382,66 @@ def write(outputFormat: FileOutputFormat[T],
writeMode: WriteMode = WriteMode.NO_OVERWRITE)
def print()
-{% endhighlight %}
-The last method is only useful for developing/debugging on a local machine,
-it will output the contents of the DataSet to standard output. (Note that in
-a cluster, the result goes to the standard out stream of the cluster nodes and ends
-up in the *.out* files of the workers).
-The first two do as the name suggests, the third one can be used to specify a
-custom data output format. Please refer
-to [Data Sinks](#data-sinks) for more information on writing to files and also
-about custom data output formats.
-
-Once you specified the complete program you need to call `execute` on
-the `ExecutionEnvironment`. This will either execute on your local
-machine or submit your program for execution on a cluster, depending on
-how you created the execution environment.
+def collect()
+{% endhighlight %}
</div>
</div>
+The first two methods (`writeAsText()` and `writeAsCsv()`) do as their names suggest; the third one,
+`write()`, can be used to specify a custom data output format. Please refer to [Data Sinks](#data-sinks)
+for more information on writing to files and on custom data output formats.
+
+The `print()` method is useful for developing and debugging. It outputs the contents of the DataSet
+to the standard output of the JVM that started the Flink execution. **NOTE** The behavior of `print()`
+changed with Flink 0.9.x. Previously, it printed to the standard output of the workers (ending up in
+their *.out* files); it now sends the DataSet results to the client and prints them there.
+
+`collect()` retrieves the DataSet from the cluster into the local JVM. The `collect()` method
+returns a `List` containing the elements.
+
+Both `print()` and `collect()` will trigger the execution of the program.
+
+
+**NOTE** `print()` and `collect()` transfer the data from the cluster to the client. Currently,
+the amount of data you can retrieve with `collect()` is limited by Flink's RPC system; collecting
+DataSets larger than 10 MB is not advised.
+
+
+Once you have specified the complete program, you need to **trigger the program execution**. You can
+either call `execute()` directly on the `ExecutionEnvironment` or trigger the execution implicitly with
+`collect()` or `print()`.
+Depending on the type of `ExecutionEnvironment`, the program is executed on your local machine or
+submitted for execution on a cluster.
+
+Note that you cannot call both `print()` (or `collect()`) and `execute()` at the end of a program,
+since `print()` and `collect()` already trigger the execution.
+
+The `execute()` method returns a `JobExecutionResult`, which includes execution times and
+accumulator results. `print()` and `collect()` do not return this result, but it can be
+accessed with the `getLastJobExecutionResult()` method of the `ExecutionEnvironment`.
+
+
+[Back to top](#top)
+
+
+DataSet abstraction
+---------------
+
+The batch processing APIs of Flink are centered around the `DataSet` abstraction. A `DataSet` is only
+an abstract representation of a set of data that can contain duplicates.
+
+Also note that Flink does not always physically create (materialize) each DataSet at runtime. This
+depends on the runtime used, the configuration, and optimizer decisions.
+
+The Flink runtime usually does not materialize DataSets because it uses a streaming runtime model.
--- End diff --
We could formulate this more positively: "The Flink runtime does not need to always materialize..."
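The following is a rough analogy for the pipelined, non-materializing execution mentioned in the diff. It uses plain Scala iterators rather than Flink, and `PipelineDemo` and its methods are hypothetical names. With the lazy `Iterator`, each element passes through the map stage and goes straight into the sink, so the intermediate result is never held as a whole. The strict `List.map` variant builds the complete intermediate collection first. This only sketches the idea and makes no claim about how Flink schedules its operators.

```scala
import scala.collection.mutable.ArrayBuffer

// Analogy only (Scala iterators, not Flink): a pipelined chain pushes each
// element through every stage before touching the next one, so the output
// of the "map" stage is never stored as a complete collection.
object PipelineDemo {
  // Pipelined: Iterator.map is lazy, so map and sink interleave per element.
  def tracePipelined(input: Seq[Int]): List[String] = {
    val log = ArrayBuffer.empty[String]
    input.iterator
      .map { x => log += s"map $x"; x * 10 }
      .foreach(y => log += s"sink $y")
    log.toList
  }

  // Materialized: the whole intermediate List exists before the sink runs.
  def traceMaterialized(input: Seq[Int]): List[String] = {
    val log = ArrayBuffer.empty[String]
    val intermediate = input.toList.map { x => log += s"map $x"; x * 10 }
    intermediate.foreach(y => log += s"sink $y")
    log.toList
  }

  def main(args: Array[String]): Unit = {
    println(tracePipelined(Seq(1, 2)))    // List(map 1, sink 10, map 2, sink 20)
    println(traceMaterialized(Seq(1, 2))) // List(map 1, map 2, sink 10, sink 20)
  }
}
```

The interleaved trace of the pipelined version is the point: at no time does a full "map" output exist, which is the property the sentence in the diff attributes to a streaming runtime model.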
> Document (new) behavior of print() and execute()
> ------------------------------------------------
>
> Key: FLINK-2092
> URL: https://issues.apache.org/jira/browse/FLINK-2092
> Project: Flink
> Issue Type: Task
> Components: Documentation
> Affects Versions: 0.9
> Reporter: Robert Metzger
> Assignee: Robert Metzger
> Priority: Blocker
>
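The following is a toy model of the execution semantics described in the diff. It is written in plain Scala so it runs without Flink. `ToyEnvironment`, `addSink`, `collect`, `execute` and `getLastJobSinkCount` are hypothetical stand-ins, not the Flink API. The model assumes that lazy sinks are only registered. The eager `collect` registers a sink and triggers execution immediately. A later `execute()` then fails because nothing new is pending. This is one way to read the rule that `print()` (or `collect()`) and `execute()` cannot both be called at the end of a program.

```scala
import scala.collection.mutable.ArrayBuffer

// Toy model (NOT the Flink API): ToyEnvironment and its methods are
// hypothetical stand-ins used only to illustrate the described semantics.
class ToyEnvironment {
  // Sinks defined since the last execution; adding one runs nothing.
  private val pendingSinks = ArrayBuffer.empty[() => Unit]
  // Stand-in for a JobExecutionResult: number of sinks the last job ran.
  private var lastJobSinkCount: Option[Int] = None

  // Lazy sink (compare writeAsText()): only added to the plan.
  def addSink(sink: () => Unit): Unit = pendingSinks += sink

  // Runs all pending sinks as one job; fails if nothing new was defined.
  def execute(): Int = {
    if (pendingSinks.isEmpty)
      throw new IllegalStateException("no new sinks defined since the last execution")
    pendingSinks.foreach(sink => sink())
    val count = pendingSinks.size
    pendingSinks.clear()
    lastJobSinkCount = Some(count)
    count
  }

  // Eager sink (compare collect()): registers a sink that copies the data
  // to the "client" and triggers execution immediately.
  def collect[T](dataSet: () => Seq[T]): Seq[T] = {
    val client = ArrayBuffer.empty[T]
    addSink(() => { client ++= dataSet(); () })
    execute()
    client.toList
  }

  // Compare getLastJobExecutionResult(): result of the most recent job.
  def getLastJobSinkCount: Option[Int] = lastJobSinkCount
}

object ToyEnvironmentDemo {
  def main(args: Array[String]): Unit = {
    val env = new ToyEnvironment
    // collect() runs the job right away and returns the elements.
    println(env.collect(() => Seq(1, 2, 3))) // List(1, 2, 3)
    println(env.getLastJobSinkCount)         // Some(1)
    // execute() afterwards fails: collect() already ran the pending plan.
    try env.execute()
    catch { case e: IllegalStateException => println(s"execute() failed: ${e.getMessage}") }
  }
}
```

In this model, `collect` runs every pending sink, not only its own. That matches the described behavior that `collect()` triggers the execution of the program.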
--
This message was sent by Atlassian JIRA
(v6.3.4#6332)