[ https://issues.apache.org/jira/browse/FLINK-2363?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14627805#comment-14627805 ]

ASF GitHub Bot commented on FLINK-2363:
---------------------------------------

Github user fhueske commented on a diff in the pull request:

    https://github.com/apache/flink/pull/913#discussion_r34662306
  
    --- Diff: docs/internals/through_stack.md ---
    @@ -0,0 +1,181 @@
    +---
    +title:  "From Program to Result: A Dive through Stack and Execution"
    +---
    +<!--
    +Licensed to the Apache Software Foundation (ASF) under one
    +or more contributor license agreements.  See the NOTICE file
    +distributed with this work for additional information
    +regarding copyright ownership.  The ASF licenses this file
    +to you under the Apache License, Version 2.0 (the
    +"License"); you may not use this file except in compliance
    +with the License.  You may obtain a copy of the License at
    +
    +  http://www.apache.org/licenses/LICENSE-2.0
    +
    +Unless required by applicable law or agreed to in writing,
    +software distributed under the License is distributed on an
    +"AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
    +KIND, either express or implied.  See the License for the
    +specific language governing permissions and limitations
    +under the License.
    +-->
    +
    +This page explains what happens when you execute a Flink program (batch or streaming).
    +It covers the complete life cycle, from <em>client-side pre-flight</em>, to <em>JobManager</em>,
    +and <em>TaskManager</em>.
    +
    +Please refer to the [Overview Page](../index.html) for a high-level overview of the processes and the stack.
    +
    +* This will be replaced by the TOC
    +{:toc}
    +
    +
    +## Batch API: DataSets to JobGraph
    +
    +## Streaming API: DataStreams to JobGraph
    +
    +## Client: Submitting a Job, Receiving Statistics & Results
    +
    +
    +
    +## JobManager: Receiving and Starting a Job
    +
    +parallel operator instance
    +
    +execution attempts
    +
    +
    +## TaskManager: Running Tasks
    +
    +The *{% gh_link flink-runtime/src/main/scala/org/apache/flink/runtime/taskmanager/TaskManager.scala "TaskManager" %}*
    +is the worker process in Flink. It runs the *{% gh_link flink-runtime/src/main/java/org/apache/flink/runtime/taskmanager/Task.java "Tasks" %}*,
    +which execute a parallel operator instance.
    +
    +The TaskManager itself is an Akka actor and has many utility components (a sketch of these services follows the list), such as:
    +
    + - The *{% gh_link flink-runtime/src/main/java/org/apache/flink/runtime/io/network/NetworkEnvironment.java "NetworkEnvironment" %}*, which takes care of all data exchanges (streamed and batched) between TaskManagers. Cached data sets in Flink are also cached streams, so the network environment is responsible for them as well.
    +
    + - The *{% gh_link flink-runtime/src/main/java/org/apache/flink/runtime/memorymanager/MemoryManager.java "MemoryManager" %}*, which governs the memory for sorting, hashing, and in-operator data caching.
    +
    + - The *{% gh_link flink-runtime/src/main/java/org/apache/flink/runtime/io/disk/iomanager/IOManager.java "I/O Manager" %}*, which provides the means to read and write data on disk, for example when sorts or hash tables need to spill data that does not fit into memory.
    +
    + - The *{% gh_link flink-runtime/src/main/java/org/apache/flink/runtime/execution/librarycache/LibraryCacheManager.java "Library Cache" %}*, which gives access to JAR files needed by tasks.
    +
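    +The following is a minimal, hypothetical sketch of how a worker could wire such services together. The interface and class names are illustrative only, not the actual Flink runtime classes:
    +
    +```java
    +// Hypothetical sketch: illustrates the services a TaskManager aggregates for
    +// its tasks; these are NOT the real Flink classes.
    +public final class WorkerServicesSketch {
    +
    +    /** Exchanges streamed and batched data with other workers. */
    +    interface NetworkService {}
    +
    +    /** Hands out managed memory for sorting, hashing, and data caching. */
    +    interface MemoryService {}
    +
    +    /** Reads and writes spilled data on disk. */
    +    interface DiskIOService {}
    +
    +    /** Resolves and caches the JAR files that tasks need. */
    +    interface LibraryCacheService {}
    +
    +    private final NetworkService network;
    +    private final MemoryService memory;
    +    private final DiskIOService disk;
    +    private final LibraryCacheService libraries;
    +
    +    public WorkerServicesSketch(NetworkService network, MemoryService memory,
    +                                DiskIOService disk, LibraryCacheService libraries) {
    +        this.network = network;
    +        this.memory = memory;
    +        this.disk = disk;
    +        this.libraries = libraries;
    +    }
    +}
    +```
    +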
    +When the TaskManager runs tasks, it does not know anything about the task's role in the dataflow. The TaskManager only knows the task itself and the streams that the task interacts with.
    +Any form of cross-task coordination must go through the JobManager.
    +
    +The execution of a Task begins when the TaskManager receives the *SubmitTask* message. The message contains the
    +*{% gh_link flink-runtime/src/main/java/org/apache/flink/runtime/deployment/TaskDeploymentDescriptor.java "TaskDeploymentDescriptor" %}*. This descriptor defines everything
    +a task needs to know (a simplified sketch follows the list):
    +
    + - The unique task {% gh_link flink-runtime/src/main/java/org/apache/flink/runtime/executiongraph/ExecutionAttemptID.java "Execution ID" %}.
    + - The name of the executable code class of the task (batch operator, stream operator, iterative operator, ...)
    + - The IDs of the streams that the task reads. If not all streams are ready yet, some are set to "pending".
    + - The IDs of the streams that the task produces.
    + - The configuration of the executable code. This contains the actual operator type (mapper, join, stream window, ...) and the user code, as a serialized closure (MapFunction, JoinFunction, ...)
    + - The hashes of the libraries (JAR files) that the code needs.
    + - ...
    +
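    +As a rough illustration, a simplified descriptor could look like the sketch below. This is a hypothetical class written for this page only; the real TaskDeploymentDescriptor has more fields and different names:
    +
    +```java
    +import java.util.List;
    +
    +// Hypothetical, simplified deployment descriptor; NOT the actual Flink class.
    +final class TaskDeploymentDescriptorSketch {
    +    final String executionAttemptId;       // unique ID of this execution attempt
    +    final String invokableClassName;       // executable code class (batch / stream / iterative operator)
    +    final List<String> inputStreamIds;     // streams the task reads (some may still be "pending")
    +    final List<String> outputStreamIds;    // streams the task produces
    +    final byte[] serializedOperatorConfig; // operator type and serialized user function
    +    final List<String> requiredJarHashes;  // hashes of the JAR files the code needs
    +
    +    TaskDeploymentDescriptorSketch(String executionAttemptId, String invokableClassName,
    +                                   List<String> inputStreamIds, List<String> outputStreamIds,
    +                                   byte[] serializedOperatorConfig, List<String> requiredJarHashes) {
    +        this.executionAttemptId = executionAttemptId;
    +        this.invokableClassName = invokableClassName;
    +        this.inputStreamIds = inputStreamIds;
    +        this.outputStreamIds = outputStreamIds;
    +        this.serializedOperatorConfig = serializedOperatorConfig;
    +        this.requiredJarHashes = requiredJarHashes;
    +    }
    +}
    +```
    +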
    +The TaskManager creates the *Task object* from the deployment descriptor, registers the task internally under its ID, and spawns a thread that will execute the task.
    +After that, the TaskManager is done, and the Task runs by itself, reporting its status to the TaskManager as it progresses or fails.
    +
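    +A minimal sketch of this hand-off, with invented names (the real Task and TaskManager do considerably more, e.g., status reporting and cleanup):
    +
    +```java
    +import java.util.Map;
    +import java.util.UUID;
    +import java.util.concurrent.ConcurrentHashMap;
    +
    +// Hypothetical sketch of "submit task": create, register, spawn a thread.
    +final class SubmitTaskSketch {
    +
    +    /** Simplified stand-in for the Task object. */
    +    static final class SimpleTask implements Runnable {
    +        final UUID executionId;
    +        SimpleTask(UUID executionId) { this.executionId = executionId; }
    +        @Override public void run() {
    +            // run the operator code and report status changes back to the worker
    +        }
    +    }
    +
    +    private final Map<UUID, SimpleTask> tasks = new ConcurrentHashMap<>();
    +
    +    void submitTask(UUID executionId) {
    +        SimpleTask task = new SimpleTask(executionId);   // 1) create the task object
    +        tasks.put(executionId, task);                    // 2) register it under its execution ID
    +        new Thread(task, "Task " + executionId).start(); // 3) from here on, the task runs by itself
    +    }
    +}
    +```
    +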
    +The Task executes in the following stages (a simplified sketch follows the list):
    +
    + 1. It takes the hashes of the required JAR files and makes sure these files are in the library cache. If they are not yet there, the library cache downloads them from the JobManager's BLOB server. This operation may take a while, if the JAR files are very large (many libraries). Note that this only needs to be done for the first Task of a program on each TaskManager. All successive Tasks should find the JAR files cached.
    +
    + 2. It creates a classloader from these JAR files. The classloader is called the *user-code classloader* and is used to resolve all classes that can be user-defined.
    +
    + 3. It registers its consumed and produced data streams (input and output) at the network environment. This reserves resources for the Task by creating the buffer pools that are used for data exchange.
    +
    + 4. If the task uses state from a checkpoint (for example, a streaming task that is restarted after a failure), it restores this state. This may involve fetching the state from remote storage, depending on where the state was stored.
    +
    + 5. The Task switches to *"RUNNING"* and notifies the TaskManager of that progress. The TaskManager in turn sends a {% gh_link /flink-runtime/src/main/scala/org/apache/flink/runtime/messages/TaskMessages.scala#L142 "UpdateTaskExecutionState" %} actor message to the JobManager, to notify it of the progress.
    +
    + 6. The Task invokes the executable code that was configured. This code is usually generic and only differentiates between coarse classes of operations (e.g., {% gh_link flink-runtime/src/main/java/org/apache/flink/runtime/operators/RegularPactTask.java "Plain Batch Task" %}, {% gh_link flink-staging/flink-streaming/flink-streaming-core/src/main/java/org/apache/flink/streaming/runtime/tasks/StreamTask.java "Streaming Operator" %}, or {% gh_link flink-runtime/src/main/java/org/apache/flink/runtime/iterative/task/IterationIntermediatePactTask.java "Iterative Batch Task" %}). Internally, these generic operators instantiate the specific operator code (e.g., map task, join task, window reducer, ...) and the user functions, and execute them.
    +
    + 7. If the Task was deployed before all iof its inputs were available 
(early deployment), the Task receives updates on those newly available streams.
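    +
    +The sketch below runs through these stages in code. All names are illustrative; this is not the actual Flink runtime API, and it assumes, for simplicity, that the invokable class implements `Runnable`:
    +
    +```java
    +import java.net.URL;
    +import java.net.URLClassLoader;
    +import java.util.Collections;
    +import java.util.List;
    +
    +// Hypothetical sketch of the task life cycle described above.
    +final class TaskLifecycleSketch {
    +
    +    void run(List<String> jarHashes, String invokableClassName) throws Exception {
    +        // 1) make sure the required JAR files are available locally,
    +        //    downloading them from the JobManager's BLOB server if necessary
    +        List<URL> jars = resolveJars(jarHashes);
    +
    +        // 2) build the user-code classloader from those JAR files
    +        ClassLoader userCodeClassLoader =
    +                new URLClassLoader(jars.toArray(new URL[0]), getClass().getClassLoader());
    +
    +        // 3) register input and output streams, reserving network buffer pools
    +        registerStreams();
    +
    +        // 4) restore checkpointed state, if this is a restarted streaming task
    +        restoreStateIfAny();
    +
    +        // 5) switch to RUNNING and notify the worker, which forwards the update
    +        notifyRunning();
    +
    +        // 6) instantiate the configured executable code via the user-code
    +        //    classloader and invoke it
    +        Class<?> invokableClass = Class.forName(invokableClassName, true, userCodeClassLoader);
    +        Runnable invokable = (Runnable) invokableClass.getDeclaredConstructor().newInstance();
    +        invokable.run();
    +
    +        // 7) updates about inputs that became available after early deployment
    +        //    arrive asynchronously while the task is running
    +    }
    +
    +    private List<URL> resolveJars(List<String> hashes) { return Collections.emptyList(); }
    +    private void registerStreams() {}
    +    private void restoreStateIfAny() {}
    +    private void notifyRunning() {}
    +}
    +```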
    --- End diff --
    
    "iof" -> "of"


> Add an end-to-end overview of program execution in Flink to the docs
> --------------------------------------------------------------------
>
>                 Key: FLINK-2363
>                 URL: https://issues.apache.org/jira/browse/FLINK-2363
>             Project: Flink
>          Issue Type: Improvement
>          Components: Documentation
>            Reporter: Stephan Ewen
>




--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
