[
https://issues.apache.org/jira/browse/FLINK-1157?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14230155#comment-14230155
]
ASF GitHub Bot commented on FLINK-1157:
---------------------------------------
Github user uce commented on a diff in the pull request:
https://github.com/apache/incubator-flink/pull/246#discussion_r21106379
--- Diff: docs/config.md ---
@@ -266,3 +272,79 @@ So if `yarn.am.rpc.port` is configured to `10245` and the session's application
- `yarn.am.rpc.port`: The port that is being opened by the Application Master (AM) to let the YARN client connect for an RPC service. (DEFAULT: Port 10245)
+
+
+## Background
+
+### Configuring the Network Buffers
+
+Network buffers are a critical resource for the communication layers. They are
+used to buffer records before transmission over a network, and to buffer
+incoming data before dissecting it into records and handing them to the
+application. A sufficient number of network buffers is critical for good
+throughput.
+
+In general, configure the task manager to have enough buffers that each
+logical network connection you expect to be open at the same time has a
+dedicated buffer. A logical network connection exists for each point-to-point
+exchange of data over the network, which typically happens at repartitioning
+or broadcasting steps. In those steps, each parallel task inside the
+TaskManager has to be able to talk to all other parallel tasks. Hence, the
+required number of buffers on a task manager is *total-degree-of-parallelism*
+(number of targets) \* *intra-node-parallelism* (number of sources in one
+task manager) \* *n*. Here, *n* is a constant that defines how many
+repartitioning/broadcasting steps you expect to be active at the same time.
+
+Since the *intra-node-parallelism* is typically the number of cores, and more
+than 4 repartitioning or broadcasting channels are rarely active in parallel,
+it frequently boils down to *\#cores\^2\^* \* *\#machines* \* 4. For example,
+to support a cluster of 20 8-core machines, you should use roughly 5000
+network buffers for optimal throughput.
+
+Each network buffer has a size of 64 KiBytes by default. In the above example,
+the system would allocate roughly 300 MiBytes for network buffers.
+
+The number and size of network buffers can be configured with the following
+parameters:
+
+- `taskmanager.network.numberOfBuffers`, and
+- `taskmanager.network.bufferSizeInBytes`.
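
For illustration only, here is a minimal `flink-conf.yaml` sketch matching the
20-machine, 8-core example above; the values are the rough estimates derived
there, not recommended defaults:

```yaml
# roughly 5000 buffers, from #cores^2 * #machines * 4 for 20 machines with 8 cores each
taskmanager.network.numberOfBuffers: 5000
# 64 KiBytes per buffer (the default size), written out explicitly in bytes
taskmanager.network.bufferSizeInBytes: 65536
```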
+
+### Configuring Temporary I/O Directories
+
+Although Flink aims to process as much data in main memory as possible,
+it is not uncommon that more data needs to be processed than fits into the
+available memory. Flink's runtime is designed to write temporary data to disk
+to handle these situations.
+
+The `taskmanager.tmp.dirs` parameter specifies a list of directories into
+which Flink writes temporary files. The paths of the directories need to be
+separated by ':' (colon character). Flink will concurrently write (or
+read) one temporary file to (from) each configured directory. This way,
+temporary I/O can be evenly distributed over multiple independent I/O devices
+such as hard disks to improve performance. To leverage fast I/O devices (e.g.,
+SSD, RAID, NAS), it is possible to specify a directory multiple times.
+
+If the `taskmanager.tmp.dirs` parameter is not explicitly specified,
+Flink writes temporary data to the temporary directory of the operating
+system, such as */tmp* on Linux systems.
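
As an illustrative sketch, a `flink-conf.yaml` entry that spreads temporary
files over three disks; the mount points are hypothetical, and the SSD is
listed twice to give the faster device a larger share of the temporary files:

```yaml
# three hypothetical mount points, with the SSD specified twice
taskmanager.tmp.dirs: /data1/tmp:/data2/tmp:/mnt/ssd/tmp:/mnt/ssd/tmp
```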
+
+
+### Configuring TaskManager Processing Slots
+
+A processing slot allows Flink to execute a distributed DataSet
+transformation, such as a data source or a map transformation.
+
+Each Flink TaskManager provides processing slots in the cluster. The number
+of slots is typically proportional to the number of available CPU cores
+__of each__ TaskManager. As a general recommendation, the number of available
+CPU cores is a good default for `taskmanager.numberOfTaskSlots`.
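
As a sketch following the recommendation above, and assuming TaskManagers with
8 CPU cores as in the earlier example:

```yaml
# one processing slot per available CPU core on an 8-core TaskManager
taskmanager.numberOfTaskSlots: 8
```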
+
+When starting a Flink application, users can supply the default number of
+slots to use for that job. On the command line, this value is set with the
+`-p` flag (for parallelism). In addition, it is possible to
+[set the number of slots in the programming APIs](programming_guide.html#parallel-execution)
+for the whole application and individual operators.
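
For illustration, a submission using the `-p` flag described above; the jar
path is a placeholder and the exact invocation may differ between Flink
versions:

```bash
# run a job with a default parallelism of 20 (placeholder jar path)
./bin/flink run -p 20 ./path/to/your-job.jar
```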
+
+Flink currently schedules an application to slots by "filling" them up.
+If the cluster has 20 machines with 2 slots each (40 slots in total) but the
+application is running with a parallelism of 20, only 10 machines are
+processing data.
--- End diff --
are processing => will process
> Document TaskManager Slots
> --------------------------
>
> Key: FLINK-1157
> URL: https://issues.apache.org/jira/browse/FLINK-1157
> Project: Flink
> Issue Type: Improvement
> Components: Documentation
> Reporter: Robert Metzger
> Assignee: Robert Metzger
>
> Slots are not explained in the documentation.
--
This message was sent by Atlassian JIRA
(v6.3.4#6332)