[jira] [Commented] (FLINK-2092) Document (new) behavior of print() and execute()

ASF GitHub Bot (JIRA) Thu, 04 Jun 2015 02:02:14 -0700

    [ https://issues.apache.org/jira/browse/FLINK-2092?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14572402#comment-14572402 ]


ASF GitHub Bot commented on FLINK-2092:
---------------------------------------

Github user uce commented on a diff in the pull request:

    https://github.com/apache/flink/pull/774#discussion_r31703637
  
    --- Diff: docs/apis/programming_guide.md ---
    @@ -394,26 +382,66 @@ def write(outputFormat: FileOutputFormat[T],
         writeMode: WriteMode = WriteMode.NO_OVERWRITE)
     
     def print()
    -{% endhighlight %}
     
    -The last method is only useful for developing/debugging on a local machine,
    -it will output the contents of the DataSet to standard output. (Note that in
    -a cluster, the result goes to the standard out stream of the cluster nodes and ends
    -up in the *.out* files of the workers).
    -The first two do as the name suggests, the third one can be used to specify a
    -custom data output format. Please refer
    -to [Data Sinks](#data-sinks) for more information on writing to files and also
    -about custom data output formats.
    -
    -Once you specified the complete program you need to call `execute` on
    -the `ExecutionEnvironment`. This will either execute on your local
    -machine or submit your program for execution on a cluster, depending on
    -how you created the execution environment.
    +def collect()
    +{% endhighlight %}
     
     </div>
     </div>
     
     
    +The first two methods (`writeAsText()` and `writeAsCsv()`) do as the name suggests, the third one
    +can be used to specify a custom data output format. Please refer to [Data Sinks](#data-sinks) for
    +more information on writing to files and also about custom data output formats.
    +
    +The `print()` method is useful for developing/debugging. It will output the contents of the DataSet
    +to standard output (on the JVM starting the Flink execution). **NOTE** The behavior of the `print()`
    +method changed with Flink 0.9.x. Before it was printing to the log file of the workers, now its
    +sending the DataSet results to the client and printing the results there.
    +
    +`collect()` allows to retrieve the DataSet from the cluster to the local JVM. The `collect()` method
    +will return a `List` containing the elements.
    +
    +Both `print()` and `collect()` will trigger the execution of the program.
    +
    +
    +**NOTE** `print()` and `collect()` retrieve the data from the cluster to the client. Currently,
    +the data sizes you can retrieve with `collect()` are limited due to our RPC system. It is not advised
    +to collect DataSets larger than 10MBs.
    +
    +
    +Once you specified the complete program you need to **trigger the program execution**. You can call
    +`execute()` directly on the `ExecutionEnviroment` or you implicitly trigger the execution with
    +`collect()` or `print()`.
    +Depending on the type of the `ExecutionEnvironment` the execution will be triggered on your local
    +machine or submit your program for execution on a cluster.
    +
    +Note that you can not call both `print()` (or `collect()`) and `execute()` at the end of program.
    +
    +The `execute()` method is returning the `JobExecutionResult`, including execution times and
    +accumulator results. `print()` and `collect()` are not returning the result, but it can be
    +accessed from the `getLastJobExecutionResult()` method.
    +
    +
    +[Back to top](#top)
    +
    +
    +DataSet abstraction
    +---------------
    +
    +The batch processing APIs of Flink are centered around the `DataSet` abstraction. A `DataSet` is only
    +an abstract representation of a set of data that can contain duplicates.
    +
    +Also note that Flink is not always physically creating (materializing) each DataSet at runtime. This
    +depends on the used runtime, the configuration and optimizer decisions.
    +
    +The Flink runtime is usually not materializing the DataSets because it is using a streaming runtime model.
    --- End diff --
    
    We could formulate this more positively: The Flink runtime does not need to always materialize...


> Document (new) behavior of print() and execute()
> ------------------------------------------------
>
>                 Key: FLINK-2092
>                 URL: https://issues.apache.org/jira/browse/FLINK-2092
>             Project: Flink
>          Issue Type: Task
>          Components: Documentation
>    Affects Versions: 0.9
>            Reporter: Robert Metzger
>            Assignee: Robert Metzger
>            Priority: Blocker
>




--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
