[GitHub] [flink] pnowojski commented on a change in pull request #16988: [FLINK-23458][docs] Added the network buffer documentation along wit…

GitBox Tue, 31 Aug 2021 01:38:13 -0700


pnowojski commented on a change in pull request #16988:
URL: https://github.com/apache/flink/pull/16988#discussion_r699022660




##########
File path: docs/content/docs/deployment/network_buffer.md
##########
@@ -0,0 +1,125 @@
+---
+title: "Network Tuning"
+weight: 100
+type: docs
+aliases:
+  - /deployment/network_buffer.html
+---
+<!--
+Licensed to the Apache Software Foundation (ASF) under one
+or more contributor license agreements.  See the NOTICE file
+distributed with this work for additional information
+regarding copyright ownership.  The ASF licenses this file
+to you under the Apache License, Version 2.0 (the
+"License"); you may not use this file except in compliance
+with the License.  You may obtain a copy of the License at
+
+  http://www.apache.org/licenses/LICENSE-2.0
+
+Unless required by applicable law or agreed to in writing,
+software distributed under the License is distributed on an
+"AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
+KIND, either express or implied.  See the License for the
+specific language governing permissions and limitations
+under the License.
+-->
+
+# Network buffer
+
+## Overview
+
+Each record in flink is sent to the next subtask not individually but 
compounded in Network buffer,
+the smallest unit for communication between subtasks. Also, in order to keep 
consistent high throughput,
+Flink uses the network buffer queues (so called in-flight data) both on the 
output as well as on the input side. 
+In the result each subtask have an input queue waiting for the consumption and 
an output queue
+waiting for sending to the next subtask. Having a larger amount of the 
in-flight data means Flink can provide a
+higher throughput that's more resilient to small hiccups in the pipeline but 
it has negative effect for the
+checkpoint time. The long checkpoint time issue can be caused by many things, 
one of those is checkpoint barriers

Review comment:
       nit: break a paragraph before "The long checkpoint time"?

##########
File path: docs/content/docs/deployment/network_buffer.md
##########
@@ -0,0 +1,125 @@
+---
+title: "Network Tuning"
+weight: 100
+type: docs
+aliases:
+  - /deployment/network_buffer.html
+---
+
+# Network buffer
+
+## Overview
+
+Each record in flink is sent to the next subtask not individually but 
compounded in Network buffer,
+the smallest unit for communication between subtasks. Also, in order to keep 
consistent high throughput,
+Flink uses the network buffer queues (so called in-flight data) both on the 
output as well as on the input side. 
+In the result each subtask have an input queue waiting for the consumption and 
an output queue
+waiting for sending to the next subtask. Having a larger amount of the 
in-flight data means Flink can provide a
+higher throughput that's more resilient to small hiccups in the pipeline but 
it has negative effect for the
+checkpoint time. The long checkpoint time issue can be caused by many things, 
one of those is checkpoint barriers
+propagation time. Checkpoint in Flink can finish only once all subtask 
receives all injected checkpoint
+barriers. In aligned checkpoints([see]{{< ref 
"docs/concepts/stateful-stream-processing" >}}#checkpointing)
+those checkpoint barriers are traveling throughout the job graph long along
+the network buffers and the larger amount of in-flight data the longer the 
checkpoint barrier propagation
+time. In unaligned checkpoints(([see]{{< ref 
"docs/concepts/stateful-stream-processing" >}}#unaligned-checkpointing)) on the 
other hand, the more in-flight data, the larger the checkpoint size as
+all of the captured in-flight data has to be persisted as part of the 
checkpoint.
+
+## Buffer debloat
+
+Historically the only way to configure the amount of in-flight data was to 
specify both amount and the size
+of the buffers. However ideal values for those numbers are hard to pick, as 
they are different for every
+deployment. The buffer debloating mechanism added in Flink 1.14 attempts to 
address this issue.
+It tries to automatically adjust the amount of in-flight data in order to a 
reasonable values.
+More precisely, the buffer debloating calculate the maximum possible throughput
+(the maximum throughput which would be if the subtask was always busy)
+for the subtask and adjusts the amount of in-flight data in such a way that 
the time for consumption of those in-flight data will be equal to the 
configured value.
+
+The most useful settings:

Review comment:
       ```suggestion
   The most important settings:
   ```

##########
File path: docs/content/docs/deployment/network_buffer.md
##########
@@ -0,0 +1,125 @@
+---
+title: "Network Tuning"
+weight: 100
+type: docs
+aliases:
+  - /deployment/network_buffer.html
+---
+
+# Network buffer
+
+## Overview
+
+Each record in flink is sent to the next subtask not individually but 
compounded in Network buffer,
+the smallest unit for communication between subtasks. Also, in order to keep 
consistent high throughput,
+Flink uses the network buffer queues (so called in-flight data) both on the 
output as well as on the input side. 
+In the result each subtask have an input queue waiting for the consumption and 
an output queue
+waiting for sending to the next subtask. Having a larger amount of the 
in-flight data means Flink can provide a
+higher throughput that's more resilient to small hiccups in the pipeline but 
it has negative effect for the
+checkpoint time. The long checkpoint time issue can be caused by many things, 
one of those is checkpoint barriers
+propagation time. Checkpoint in Flink can finish only once all subtask 
receives all injected checkpoint
+barriers. In aligned checkpoints([see]{{< ref 
"docs/concepts/stateful-stream-processing" >}}#checkpointing)
+those checkpoint barriers are traveling throughout the job graph long along
+the network buffers and the larger amount of in-flight data the longer the 
checkpoint barrier propagation
+time. In unaligned checkpoints(([see]{{< ref 
"docs/concepts/stateful-stream-processing" >}}#unaligned-checkpointing)) on the 
other hand, the more in-flight data, the larger the checkpoint size as
+all of the captured in-flight data has to be persisted as part of the 
checkpoint.
+
+## Buffer debloat
+
+Historically the only way to configure the amount of in-flight data was to 
specify both amount and the size
+of the buffers. However ideal values for those numbers are hard to pick, as 
they are different for every
+deployment. The buffer debloating mechanism added in Flink 1.14 attempts to 
address this issue.
+It tries to automatically adjust the amount of in-flight data in order to a 
reasonable values.
+More precisely, the buffer debloating calculate the maximum possible throughput
+(the maximum throughput which would be if the subtask was always busy)
+for the subtask and adjusts the amount of in-flight data in such a way that 
the time for consumption of those in-flight data will be equal to the 
configured value.
+
+The most useful settings:
+* The buffer debloat can be enabled by setting the property 
`taskmanager.network.memory.buffer-debloat.enabled` to `true`. 
+* Desirable time of the consumption in-flight data can be configured by 
setting `taskmanager.network.memory.buffer-debloat.target` to `duration`.

Review comment:
       ```suggestion
   * The targeted time to consume the in-flight data can be configured by setting `taskmanager.network.memory.buffer-debloat.target` to `duration`.
   ```
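
For reference, a rough sketch of the arithmetic behind the debloat target as the quoted paragraph describes it: the amount of in-flight data is chosen so that, at the measured throughput, consuming it takes about the configured target time. This is not Flink's implementation; the function names, the bytes-per-second unit, and the sample numbers are assumptions made purely for illustration.

```python
def target_in_flight_bytes(throughput_bytes_per_sec: float, target_seconds: float) -> int:
    """Amount of in-flight data that takes roughly `target_seconds` to consume.

    Hypothetical helper: it only mirrors the relation stated in the docs
    (in-flight data = throughput * target time), not Flink's internal code.
    """
    if throughput_bytes_per_sec < 0 or target_seconds < 0:
        raise ValueError("throughput and target must be non-negative")
    return int(throughput_bytes_per_sec * target_seconds)


def estimated_time_to_consume_seconds(in_flight_bytes: int, throughput_bytes_per_sec: float) -> float:
    """Inverse view: how long the buffered data would take to drain."""
    if throughput_bytes_per_sec <= 0:
        # With no measured throughput the data never drains in this model.
        return float("inf")
    return in_flight_bytes / throughput_bytes_per_sec


# Illustrative numbers only: a subtask draining 10 MB/s with a 1 second target.
print(target_in_flight_bytes(10_000_000, 1.0))
print(estimated_time_to_consume_seconds(1_000_000, 2_000_000))
```

If the measured throughput is a poor predictor of the future throughput, the computed amount is off in one of the two directions the docs list (too little buffered data, or too much).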

##########
File path: docs/content/docs/deployment/network_buffer.md
##########
@@ -0,0 +1,125 @@
+---
+title: "Network Tuning"
+weight: 100
+type: docs
+aliases:
+  - /deployment/network_buffer.html
+---
+
+# Network buffer
+
+## Overview
+
+Each record in flink is sent to the next subtask not individually but 
compounded in Network buffer,
+the smallest unit for communication between subtasks. Also, in order to keep 
consistent high throughput,
+Flink uses the network buffer queues (so called in-flight data) both on the 
output as well as on the input side. 
+In the result each subtask have an input queue waiting for the consumption and 
an output queue
+waiting for sending to the next subtask. Having a larger amount of the 
in-flight data means Flink can provide a
+higher throughput that's more resilient to small hiccups in the pipeline but 
it has negative effect for the
+checkpoint time. The long checkpoint time issue can be caused by many things, 
one of those is checkpoint barriers
+propagation time. Checkpoint in Flink can finish only once all subtask 
receives all injected checkpoint
+barriers. In aligned checkpoints([see]{{< ref 
"docs/concepts/stateful-stream-processing" >}}#checkpointing)
+those checkpoint barriers are traveling throughout the job graph long along
+the network buffers and the larger amount of in-flight data the longer the 
checkpoint barrier propagation
+time. In unaligned checkpoints(([see]{{< ref 
"docs/concepts/stateful-stream-processing" >}}#unaligned-checkpointing)) on the 
other hand, the more in-flight data, the larger the checkpoint size as
+all of the captured in-flight data has to be persisted as part of the 
checkpoint.
+
+## Buffer debloat
+
+Historically the only way to configure the amount of in-flight data was to 
specify both amount and the size
+of the buffers. However ideal values for those numbers are hard to pick, as 
they are different for every
+deployment. The buffer debloating mechanism added in Flink 1.14 attempts to 
address this issue.
+It tries to automatically adjust the amount of in-flight data in order to a 
reasonable values.
+More precisely, the buffer debloating calculate the maximum possible throughput
+(the maximum throughput which would be if the subtask was always busy)
+for the subtask and adjusts the amount of in-flight data in such a way that 
the time for consumption of those in-flight data will be equal to the 
configured value.
+
+The most useful settings:
+* The buffer debloat can be enabled by setting the property 
`taskmanager.network.memory.buffer-debloat.enabled` to `true`. 
+* Desirable time of the consumption in-flight data can be configured by 
setting `taskmanager.network.memory.buffer-debloat.target` to `duration`.
+  The default value of the debloat target should be good enough in most cases.
+
+Buffer debloating in Flink works by measuring past throguhput to predict 
future time to consume the remaining
+in-flight data. If those predictions are incorrect, the debloating mechanism 
can fail in one of the two ways:
+* There won't be enough buffered data to provide full throughput
+* There will be too many buffered in-flight data and the aligned checkpoint 
barriers propagation time or the unaligned checkpoint size will suffer.
+
+Hence, if you have a varying load in your job, for example a sudden spikes of 
incoming records, or periodically
+firing windowed aggregations or joins, you might need to adjust the following 
settings:
+
+* `taskmanager.network.memory.buffer-debloat.period` - the minimum time 
between buffer size recalculation.
+The shorter the period, the faster reaction time of the debloating mechanism, 
but a higher CPU overhead for the necessary calculations.
+* `taskmanager.network.memory.buffer-debloat.samples` - Adjust the number of 
samples over which throughput measurements are averaged out.
+The frequency of the collected samples can be adjusted via 
`taskmanager.network.memory.buffer-debloat.period`.
+The fewer samples, the faster reaction time of the debloating mechanism, but a 
higher chance of a sudden spike or drop of the throughput to cause the buffer 
debloating to miscalculate the best amount of the in-flight data.
+* `taskmanager.network.memory.buffer-debloat.threshold-percentages` - The 
optimization which prevents 
+the frequent buffer size change if the new size is not so different compared 
to the old one.
+
+See the [Configuration]({{< ref "docs/deployment/config" 
>}}#full-taskmanageroptions) documentation for details and additional 
parameters.
+
+The metrics([see]({{< ref "docs/ops/metrics" >}}#io)) which can help to 
observe the current buffer size:
+* `estimatedTimeToConsumerBuffersMs` - the total time to consume data from all 
input channels.
+* `debloatedBufferSize` - the current buffer size.
+
+## Network buffer lifecycle
+Logically, Flink has several local buffer pools one for output stream and one 
for each input gate. 
+Each of that pools is limited to at most 
+```
+#channels * taskmanager.network.memory.buffers-per-channel + 
taskmanager.network.memory.floating-buffers-per-gate
+```
+
+The size of the buffer can be configured by setting 
`taskmanager.memory.segment-size`.
+
+### Input network buffers
+Buffers in the input channel are divided into exclusive and floating buffers.
+Flink attempts to acquire the configured amount of the exclusive buffers in 
the initialization phase for each channel.
+Exclusive buffers can be used only by one particular channel.
+A channel can request additional floating buffers from a buffer pool shared 
across all channels belonging to the given input gate.
+Flink treats the configured amount of exclusive and floating buffers as only a 
recommended values.
+If there are not enough buffers available on the input side, Flink will be 
able to make a progress with zero exclusive buffers and a single floating 
buffer.

Review comment:
       Hmm, maybe this is not correct after all?
   
   
https://github.com/apache/flink/blob/master/flink-runtime/src/main/java/org/apache/flink/runtime/io/network/partition/consumer/RemoteInputChannel.java#L160
   
   It looks like ALL of the exclusive buffers are required after all? Only 
floating buffers are recommended?
   
   @wsry can you help us verify how this paragraph should be phrased?
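
However the exclusive-buffer question is resolved, the upper limit quoted in the "Network buffer lifecycle" section is plain arithmetic and can be sanity-checked with a small sketch. The function names and the sample values (4 channels, 2 buffers per channel, 8 floating buffers per gate, 32 KiB segments) are assumptions for illustration, not values taken from this PR.

```python
def max_pool_buffers(num_channels: int,
                     buffers_per_channel: int,
                     floating_buffers_per_gate: int) -> int:
    """Upper limit of a local buffer pool, following the quoted formula:

    #channels * taskmanager.network.memory.buffers-per-channel
        + taskmanager.network.memory.floating-buffers-per-gate
    """
    if min(num_channels, buffers_per_channel, floating_buffers_per_gate) < 0:
        raise ValueError("all arguments must be non-negative")
    return num_channels * buffers_per_channel + floating_buffers_per_gate


def max_pool_bytes(num_channels: int,
                   buffers_per_channel: int,
                   floating_buffers_per_gate: int,
                   segment_size_bytes: int) -> int:
    """Same limit expressed in bytes, one buffer being one memory segment
    whose size is set by taskmanager.memory.segment-size."""
    buffers = max_pool_buffers(num_channels, buffers_per_channel, floating_buffers_per_gate)
    return buffers * segment_size_bytes


# Example values only (assumed, not from the PR).
print(max_pool_buffers(4, 2, 8))
print(max_pool_bytes(4, 2, 8, 32 * 1024))
```

This only covers the configured maximum; how many of those buffers are strictly required versus merely recommended is exactly the open question above.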

##########
File path: docs/content/docs/deployment/network_buffer.md
##########
@@ -0,0 +1,125 @@
+---
+title: "Network Tuning"
+weight: 100
+type: docs
+aliases:
+  - /deployment/network_buffer.html
+---
+
+# Network buffer
+
+## Overview
+
+Each record in flink is sent to the next subtask not individually but 
compounded in Network buffer,
+the smallest unit for communication between subtasks. Also, in order to keep 
consistent high throughput,
+Flink uses the network buffer queues (so called in-flight data) both on the 
output as well as on the input side. 
+In the result each subtask have an input queue waiting for the consumption and 
an output queue
+waiting for sending to the next subtask. Having a larger amount of the 
in-flight data means Flink can provide a
+higher throughput that's more resilient to small hiccups in the pipeline but 
it has negative effect for the
+checkpoint time. The long checkpoint time issue can be caused by many things, 
one of those is checkpoint barriers
+propagation time. Checkpoint in Flink can finish only once all subtask 
receives all injected checkpoint
+barriers. In aligned checkpoints([see]{{< ref 
"docs/concepts/stateful-stream-processing" >}}#checkpointing)
+those checkpoint barriers are traveling throughout the job graph long along
+the network buffers and the larger amount of in-flight data the longer the 
checkpoint barrier propagation
+time. In unaligned checkpoints(([see]{{< ref 
"docs/concepts/stateful-stream-processing" >}}#unaligned-checkpointing)) on the 
other hand, the more in-flight data, the larger the checkpoint size as
+all of the captured in-flight data has to be persisted as part of the 
checkpoint.
+
+## Buffer debloat
+
+Historically the only way to configure the amount of in-flight data was to 
specify both amount and the size
+of the buffers. However ideal values for those numbers are hard to pick, as 
they are different for every
+deployment. The buffer debloating mechanism added in Flink 1.14 attempts to 
address this issue.
+It tries to automatically adjust the amount of in-flight data in order to a 
reasonable values.
+More precisely, the buffer debloating calculate the maximum possible throughput
+(the maximum throughput which would be if the subtask was always busy)
+for the subtask and adjusts the amount of in-flight data in such a way that 
the time for consumption of those in-flight data will be equal to the 
configured value.
+
+The most useful settings:
+* The buffer debloat can be enabled by setting the property 
`taskmanager.network.memory.buffer-debloat.enabled` to `true`. 
+* Desirable time of the consumption in-flight data can be configured by 
setting `taskmanager.network.memory.buffer-debloat.target` to `duration`.
+  The default value of the debloat target should be good enough in most cases.
+
+Buffer debloating in Flink works by measuring past throguhput to predict 
future time to consume the remaining
+in-flight data. If those predictions are incorrect, the debloating mechanism 
can fail in one of the two ways:
+* There won't be enough buffered data to provide full throughput
+* There will be too many buffered in-flight data and the aligned checkpoint 
barriers propagation time or the unaligned checkpoint size will suffer.
+
+Hence, if you have a varying load in your job, for example a sudden spikes of 
incoming records, or periodically
+firing windowed aggregations or joins, you might need to adjust the following 
settings:
+
+* `taskmanager.network.memory.buffer-debloat.period` - the minimum time 
between buffer size recalculation.
+The shorter the period, the faster reaction time of the debloating mechanism, 
but a higher CPU overhead for the necessary calculations.
+* `taskmanager.network.memory.buffer-debloat.samples` - Adjust the number of 
samples over which throughput measurements are averaged out.
+The frequency of the collected samples can be adjusted via 
`taskmanager.network.memory.buffer-debloat.period`.
+The fewer samples, the faster reaction time of the debloating mechanism, but a 
higher chance of a sudden spike or drop of the throughput to cause the buffer 
debloating to miscalculate the best amount of the in-flight data.
+* `taskmanager.network.memory.buffer-debloat.threshold-percentages` - The 
optimization which prevents 
+the frequent buffer size change if the new size is not so different compared 
to the old one.
+
+See the [Configuration]({{< ref "docs/deployment/config" 
>}}#full-taskmanageroptions) documentation for details and additional 
parameters.
+
+The metrics([see]({{< ref "docs/ops/metrics" >}}#io)) which can help to 
observe the current buffer size:
+* `estimatedTimeToConsumerBuffersMs` - the total time to consume data from all 
input channels.
+* `debloatedBufferSize` - the current buffer size.
+
+## Network buffer lifecycle
+Logically, Flink has several local buffer pools one for output stream and one 
for each input gate. 
+Each of that pools is limited to at most 
+```
+#channels * taskmanager.network.memory.buffers-per-channel + 
taskmanager.network.memory.floating-buffers-per-gate
+```
+
+The size of the buffer can be configured by setting 
`taskmanager.memory.segment-size`.
+
+### Input network buffers
+Buffers in the input channel are divided into exclusive and floating buffers.
+Flink attempts to acquire the configured amount of the exclusive buffers in 
the initialization phase for each channel.
+Exclusive buffers can be used only by one particular channel.
+A channel can request additional floating buffers from a buffer pool shared 
across all channels belonging to the given input gate.
+Flink treats the configured amount of exclusive and floating buffers as only a 
recommended values.
+If there are not enough buffers available on the input side, Flink will be 
able to make a progress with zero exclusive buffers and a single floating 
buffer.
+
+### Output network buffers
+Unlike the input buffer pool, the output buffers pool have only one type of 
buffers which it shares all among all subpartitions.
+In order to avoid the excessive data skew, the number of buffers for each 
subpartition is limited by `taskmanager.network.memory.max-buffers-per-channel`.
+
+Similarly, as on the input side, the configured amount of the exclusive 
buffers and floating buffers is treated only
+as the recommended values. If there are not enough buffers available, Flink 
will be able to make a progress with
+only a single exclusive buffer per output subpartition and zero floating 
buffers.
+
+## Performance 
+### Selecting the buffer size
+As was described before, the buffer collects records in order to optimize 
network overhead for sending 
+the data portion to the next subtask. The too-small buffer size especially 
less than the one record 
+doesn't make sense because the overhead is still pretty big and anyway the 
next subtask should receive 
+all parts of record before the consuming it(as result it leads to low 
throughput). At the same time, 
+the big buffer size leads to high memory usage, huge checkpoint data(for 
unaligned checkpoint), 
+the big checkpoint time(for aligned checkpoint). Also, the big buffer size 
+doesn't make sense with small `execution.buffer-timeout` because in this case 
all(or almost all) 

Review comment:
       ```suggestion
   do not make sense with small `execution.buffer-timeout` because in this case all (or almost all)
   ```

##########
File path: docs/content/docs/deployment/network_buffer.md
##########
@@ -0,0 +1,125 @@
+---
+title: "Network Tuning"
+weight: 100
+type: docs
+aliases:
+  - /deployment/network_buffer.html
+---
+
+# Network buffer
+
+## Overview
+
+Each record in flink is sent to the next subtask not individually but 
compounded in Network buffer,
+the smallest unit for communication between subtasks. Also, in order to keep 
consistent high throughput,
+Flink uses the network buffer queues (so called in-flight data) both on the 
output as well as on the input side. 
+In the result each subtask have an input queue waiting for the consumption and 
an output queue
+waiting for sending to the next subtask. Having a larger amount of the 
in-flight data means Flink can provide a
+higher throughput that's more resilient to small hiccups in the pipeline but 
it has negative effect for the
+checkpoint time. The long checkpoint time issue can be caused by many things, 
one of those is checkpoint barriers
+propagation time. Checkpoint in Flink can finish only once all subtask 
receives all injected checkpoint
+barriers. In aligned checkpoints([see]{{< ref 
"docs/concepts/stateful-stream-processing" >}}#checkpointing)
+those checkpoint barriers are traveling throughout the job graph long along

Review comment:
       ```suggestion
   those checkpoint barriers are traveling throughout the job graph along
   ```

##########
File path: docs/content/docs/deployment/network_buffer.md
##########
@@ -0,0 +1,125 @@
+---
+title: "Network Tuning"
+weight: 100
+type: docs
+aliases:
+  - /deployment/network_buffer.html
+---
+
+# Network buffer
+
+## Overview
+
+Each record in flink is sent to the next subtask not individually but 
compounded in Network buffer,
+the smallest unit for communication between subtasks. Also, in order to keep 
consistent high throughput,
+Flink uses the network buffer queues (so called in-flight data) both on the 
output as well as on the input side. 
+In the result each subtask have an input queue waiting for the consumption and 
an output queue
+waiting for sending to the next subtask. Having a larger amount of the 
in-flight data means Flink can provide a
+higher throughput that's more resilient to small hiccups in the pipeline but 
it has negative effect for the
+checkpoint time. The long checkpoint time issue can be caused by many things, 
one of those is checkpoint barriers
+propagation time. Checkpoint in Flink can finish only once all subtask 
receives all injected checkpoint
+barriers. In aligned checkpoints([see]{{< ref 
"docs/concepts/stateful-stream-processing" >}}#checkpointing)
+those checkpoint barriers are traveling throughout the job graph long along
+the network buffers and the larger amount of in-flight data the longer the 
checkpoint barrier propagation
+time. In unaligned checkpoints(([see]{{< ref 
"docs/concepts/stateful-stream-processing" >}}#unaligned-checkpointing)) on the 
other hand, the more in-flight data, the larger the checkpoint size as
+all of the captured in-flight data has to be persisted as part of the 
checkpoint.
+
+## Buffer debloat
+
+Historically the only way to configure the amount of in-flight data was to 
specify both amount and the size
+of the buffers. However ideal values for those numbers are hard to pick, as 
they are different for every
+deployment. The buffer debloating mechanism added in Flink 1.14 attempts to 
address this issue.
+It tries to automatically adjust the amount of in-flight data in order to a 
reasonable values.
+More precisely, the buffer debloating calculate the maximum possible throughput
+(the maximum throughput which would be if the subtask was always busy)
+for the subtask and adjusts the amount of in-flight data in such a way that 
the time for consumption of those in-flight data will be equal to the 
configured value.
+
+The most useful settings:
+* The buffer debloat can be enabled by setting the property 
`taskmanager.network.memory.buffer-debloat.enabled` to `true`. 
+* Desirable time of the consumption in-flight data can be configured by 
setting `taskmanager.network.memory.buffer-debloat.target` to `duration`.
+  The default value of the debloat target should be good enough in most cases.
+
+Buffer debloating in Flink works by measuring past throguhput to predict 
future time to consume the remaining
+in-flight data. If those predictions are incorrect, the debloating mechanism 
can fail in one of the two ways:
+* There won't be enough buffered data to provide full throughput
+* There will be too many buffered in-flight data and the aligned checkpoint 
barriers propagation time or the unaligned checkpoint size will suffer.
+
+Hence, if you have a varying load in your job, for example a sudden spikes of 
incoming records, or periodically
+firing windowed aggregations or joins, you might need to adjust the following 
settings:
+
+* `taskmanager.network.memory.buffer-debloat.period` - the minimum time 
between buffer size recalculation.
+The shorter the period, the faster reaction time of the debloating mechanism, 
but a higher CPU overhead for the necessary calculations.
+* `taskmanager.network.memory.buffer-debloat.samples` - Adjust the number of 
samples over which throughput measurements are averaged out.
+The frequency of the collected samples can be adjusted via 
`taskmanager.network.memory.buffer-debloat.period`.
+The fewer samples, the faster reaction time of the debloating mechanism, but a 
higher chance of a sudden spike or drop of the throughput to cause the buffer 
debloating to miscalculate the best amount of the in-flight data.
+* `taskmanager.network.memory.buffer-debloat.threshold-percentages` - The 
optimization which prevents 
+the frequent buffer size change if the new size is not so different compared 
to the old one.
+
+See the [Configuration]({{< ref "docs/deployment/config" 
>}}#full-taskmanageroptions) documentation for details and additional 
parameters.
+
+The following metrics ([see]({{< ref "docs/ops/metrics" >}}#io)) can help to observe the current buffer size:
+* `estimatedTimeToConsumerBuffersMs` - the total time to consume data from all input channels.
+* `debloatedBufferSize` - the current buffer size.
+
+## Network buffer lifecycle
+Logically, Flink has several local buffer pools: one for the output stream and one for each input gate.
+Each of those pools is limited to at most
+```
+#channels * taskmanager.network.memory.buffers-per-channel + taskmanager.network.memory.floating-buffers-per-gate
+```
+
+The size of the buffer can be configured by setting 
`taskmanager.memory.segment-size`.
+
+### Input network buffers
+Buffers in the input channel are divided into exclusive and floating buffers.
+Flink attempts to acquire the configured number of exclusive buffers for each channel during the initialization phase.
+Exclusive buffers can be used only by one particular channel.
+A channel can request additional floating buffers from a buffer pool shared across all channels belonging to the given input gate.
+Flink treats the configured numbers of exclusive and floating buffers only as recommended values.
+If there are not enough buffers available on the input side, Flink can still make progress with zero exclusive buffers and a single floating buffer.
+
+### Output network buffers
+Unlike the input buffer pool, the output buffer pool has only one type of buffer, which it shares among all subpartitions.
+In order to avoid excessive data skew, the number of buffers for each subpartition is limited by `taskmanager.network.memory.max-buffers-per-channel`.
+
+Similarly to the input side, the configured numbers of exclusive and floating buffers are treated only
+as recommended values. If there are not enough buffers available, Flink can still make progress with
+only a single exclusive buffer per output subpartition and zero floating buffers.
+
+## Performance 
+### Selecting the buffer size
+As was described before, the buffer collects records in order to optimize 
network overhead for sending 
+the data portion to the next subtask. The too-small buffer size especially 
less than the one record 
+doesn't make sense because the overhead is still pretty big and anyway the 
next subtask should receive 
+all parts of record before the consuming it(as result it leads to low 
throughput). At the same time, 
+the big buffer size leads to high memory usage, huge checkpoint data(for 
unaligned checkpoint), 

Review comment:
       ```suggestion
   too large buffer size leads to high memory usage, huge checkpoint data (for 
unaligned checkpoint), 
   ```

##########
File path: docs/content/docs/deployment/network_buffer.md
##########
@@ -0,0 +1,125 @@
+## Performance 
+### Selecting the buffer size
+As was described before, the buffer collects records in order to optimize 
network overhead for sending 
+the data portion to the next subtask. The too-small buffer size especially 
less than the one record 
+doesn't make sense because the overhead is still pretty big and anyway the 
next subtask should receive 
+all parts of record before the consuming it(as result it leads to low 
throughput). At the same time, 

Review comment:
       ```suggestion
   all parts of record before the consuming it (as result it can lead to a low 
throughput). At the same time, 
   ```
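
To make the point about buffers smaller than a single record concrete, here is a minimal sketch. It assumes a simplified model in which a serialized record is split across consecutive buffers with no per-record overhead and starts at a buffer boundary. The function name is hypothetical and is not part of Flink.

```python
def buffers_needed(record_bytes: int, buffer_bytes: int) -> int:
    """Number of buffers one record occupies if it starts at a buffer boundary.

    Simplified illustration only: real serialization adds framing overhead
    and records usually do not start exactly at a buffer boundary.
    """
    if record_bytes <= 0 or buffer_bytes <= 0:
        raise ValueError("sizes must be positive")
    # Integer ceiling division without floating point.
    return -(-record_bytes // buffer_bytes)
```

In this model a 1000-byte record with 256-byte buffers spans four buffers, and the receiving subtask has to wait for all four before it can consume the record.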

##########
File path: docs/content/docs/deployment/network_buffer.md
##########
@@ -0,0 +1,125 @@
+Hence, if you have a varying load in your job, for example a sudden spikes of 
incoming records, or periodically
+firing windowed aggregations or joins, you might need to adjust the following 
settings:
+
+* `taskmanager.network.memory.buffer-debloat.period` - the minimum time 
between buffer size recalculation.

Review comment:
       ```suggestion
   * `taskmanager.network.memory.buffer-debloat.period` - The minimum time 
between buffer size recalculation.
   ```
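
As an illustration of how `period`, `samples`, `target` and `threshold-percentages` interact, the following is a toy model rather than Flink's actual implementation. It assumes a plain moving average over the last N throughput samples. The class, its parameters, and the clamping bounds `min_size` and `max_size` are hypothetical names introduced for this sketch.

```python
from collections import deque


class DebloatSketch:
    """Toy model of buffer debloating; NOT Flink's real algorithm."""

    def __init__(self, target_ms, samples, threshold_percent,
                 min_size, max_size, start_size):
        self.target_ms = target_ms                # desired time to consume in-flight data
        self.window = deque(maxlen=samples)       # last N throughput samples
        self.threshold_percent = threshold_percent
        self.min_size = min_size                  # assumed lower bound for a buffer
        self.max_size = max_size                  # assumed upper bound for a buffer
        self.buffer_size = start_size

    def on_period(self, bytes_per_ms, buffers_in_use):
        """Called once per period; returns the (possibly unchanged) buffer size."""
        self.window.append(bytes_per_ms)
        average = sum(self.window) / len(self.window)
        # Amount of in-flight data that can be consumed within the target time.
        desired_total = average * self.target_ms
        proposed = int(desired_total / max(buffers_in_use, 1))
        proposed = max(self.min_size, min(self.max_size, proposed))
        # Skip small changes so the buffer size is not adjusted constantly.
        change_percent = abs(proposed - self.buffer_size) * 100 / self.buffer_size
        if change_percent >= self.threshold_percent:
            self.buffer_size = proposed
        return self.buffer_size
```

In this model, fewer samples make the average follow a spike more quickly, and a larger threshold suppresses more of the small adjustments.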

##########
File path: docs/content/docs/deployment/network_buffer.md
##########
@@ -0,0 +1,125 @@
+## Performance 
+### Selecting the buffer size
+As was described before, the buffer collects records in order to optimize 
network overhead for sending 
+the data portion to the next subtask. The too-small buffer size especially 
less than the one record 
+doesn't make sense because the overhead is still pretty big and anyway the 
next subtask should receive 
+all parts of record before the consuming it(as result it leads to low 
throughput). At the same time, 
+the big buffer size leads to high memory usage, huge checkpoint data(for 
unaligned checkpoint), 
+the big checkpoint time(for aligned checkpoint). Also, the big buffer size 

Review comment:
       ```suggestion
   long checkpoint time (for aligned checkpoint). Also, larger buffer sizes 
   ```

##########
File path: docs/content/docs/deployment/network_buffer.md
##########
@@ -0,0 +1,125 @@
+## Performance 
+### Selecting the buffer size
+As was described before, the buffer collects records in order to optimize 
network overhead for sending 
+the data portion to the next subtask. The too-small buffer size especially 
less than the one record 
+doesn't make sense because the overhead is still pretty big and anyway the 
next subtask should receive 
+all parts of record before the consuming it(as result it leads to low 
throughput). At the same time, 
+the big buffer size leads to high memory usage, huge checkpoint data(for 
unaligned checkpoint), 
+the big checkpoint time(for aligned checkpoint). Also, the big buffer size 
+doesn't make sense with a small `execution.buffer-timeout`, because in this case all (or almost all)
+sent buffers would be only partially full, which means that the allocated memory won't be utilized properly.
+
+### Selecting the buffer count
+The number of buffers is mostly configured by 
`taskmanager.network.memory.buffers-per-channel`, 
`taskmanager.network.memory.floating-buffers-per-gate`.

Review comment:
       ```suggestion
   The number of buffers is configured by 
`taskmanager.network.memory.buffers-per-channel`, 
`taskmanager.network.memory.floating-buffers-per-gate`.
   ```
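
For a sense of scale, the per-pool limit quoted in the "Network buffer lifecycle" section can be evaluated directly. This is a minimal sketch with hypothetical function names. The numbers in the usage note are example values chosen for illustration, not values read from any deployment.

```python
def max_buffers_per_pool(channels: int,
                         buffers_per_channel: int,
                         floating_buffers_per_gate: int) -> int:
    """Upper bound on buffers in one local pool, following the documented formula:
    #channels * buffers-per-channel + floating-buffers-per-gate."""
    return channels * buffers_per_channel + floating_buffers_per_gate


def max_pool_bytes(channels: int,
                   buffers_per_channel: int,
                   floating_buffers_per_gate: int,
                   segment_size_bytes: int) -> int:
    """Worst-case memory of one pool when every buffer has the full segment size."""
    buffers = max_buffers_per_pool(channels, buffers_per_channel,
                                   floating_buffers_per_gate)
    return buffers * segment_size_bytes
```

For example, 100 channels with 2 buffers per channel, 8 floating buffers and 32768-byte segments give at most 208 buffers, or 6,815,744 bytes, for that pool.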

##########
File path: docs/content/docs/deployment/network_buffer.md
##########
@@ -0,0 +1,125 @@
+---
+title: "Network Tuning"
+weight: 100
+type: docs
+aliases:
+  - /deployment/network_buffer.html
+---
+
+# Network buffer
+
+## Overview
+
+Each record in flink is sent to the next subtask not individually but 
compounded in Network buffer,
+the smallest unit for communication between subtasks. Also, in order to keep 
consistent high throughput,
+Flink uses the network buffer queues (so called in-flight data) both on the 
output as well as on the input side. 
+In the result each subtask have an input queue waiting for the consumption and 
an output queue
+waiting for sending to the next subtask. Having a larger amount of the 
in-flight data means Flink can provide a
+higher throughput that's more resilient to small hiccups in the pipeline but 
it has negative effect for the
+checkpoint time. The long checkpoint time issue can be caused by many things, 
one of those is checkpoint barriers
+propagation time. Checkpoint in Flink can finish only once all subtask 
receives all injected checkpoint
+barriers. In aligned checkpoints([see]{{< ref 
"docs/concepts/stateful-stream-processing" >}}#checkpointing)
+those checkpoint barriers are traveling throughout the job graph long along
+the network buffers and the larger amount of in-flight data the longer the 
checkpoint barrier propagation
+time. In unaligned checkpoints(([see]{{< ref 
"docs/concepts/stateful-stream-processing" >}}#unaligned-checkpointing)) on the 
other hand, the more in-flight data, the larger the checkpoint size as
+all of the captured in-flight data has to be persisted as part of the 
checkpoint.
+
+## Buffer debloat
+
+Historically the only way to configure the amount of in-flight data was to 
specify both amount and the size
+of the buffers. However ideal values for those numbers are hard to pick, as 
they are different for every
+deployment. The buffer debloating mechanism added in Flink 1.14 attempts to 
address this issue.
+It tries to automatically adjust the amount of in-flight data in order to a 
reasonable values.
+More precisely, the buffer debloating calculate the maximum possible throughput
+(the maximum throughput which would be if the subtask was always busy)
+for the subtask and adjusts the amount of in-flight data in such a way that 
the time for consumption of those in-flight data will be equal to the 
configured value.
+
+The most useful settings:
+* The buffer debloat can be enabled by setting the property 
`taskmanager.network.memory.buffer-debloat.enabled` to `true`. 
+* Desirable time of the consumption in-flight data can be configured by 
setting `taskmanager.network.memory.buffer-debloat.target` to `duration`.
+  The default value of the debloat target should be good enough in most cases.
+
+Buffer debloating in Flink works by measuring past throguhput to predict 
future time to consume the remaining
+in-flight data. If those predictions are incorrect, the debloating mechanism 
can fail in one of the two ways:
+* There won't be enough buffered data to provide full throughput
+* There will be too many buffered in-flight data and the aligned checkpoint 
barriers propagation time or the unaligned checkpoint size will suffer.
+
+Hence, if you have a varying load in your job, for example a sudden spikes of 
incoming records, or periodically
+firing windowed aggregations or joins, you might need to adjust the following 
settings:
+
+* `taskmanager.network.memory.buffer-debloat.period` - the minimum time 
between buffer size recalculation.
+The shorter the period, the faster reaction time of the debloating mechanism, 
but a higher CPU overhead for the necessary calculations.
+* `taskmanager.network.memory.buffer-debloat.samples` - Adjust the number of 
samples over which throughput measurements are averaged out.
+The frequency of the collected samples can be adjusted via 
`taskmanager.network.memory.buffer-debloat.period`.
+The fewer samples, the faster reaction time of the debloating mechanism, but a 
higher chance of a sudden spike or drop of the throughput to cause the buffer 
debloating to miscalculate the best amount of the in-flight data.
+* `taskmanager.network.memory.buffer-debloat.threshold-percentages` - The 
optimization which prevents 
+the frequent buffer size change if the new size is not so different compared 
to the old one.
+
+See the [Configuration]({{< ref "docs/deployment/config" 
>}}#full-taskmanageroptions) documentation for details and additional 
parameters.
+
+The metrics([see]({{< ref "docs/ops/metrics" >}}#io)) which can help to 
observe the current buffer size:
+* `estimatedTimeToConsumerBuffersMs` - the total time to consume data from all 
input channels.
+* `debloatedBufferSize` - the current buffer size.
+
+## Network buffer lifecycle
+Logically, Flink has several local buffer pools one for output stream and one 
for each input gate. 
+Each of that pools is limited to at most 
+```
+#channels * taskmanager.network.memory.buffers-per-channel + 
taskmanager.network.memory.floating-buffers-per-gate
+```
+
+The size of the buffer can be configured by setting 
`taskmanager.memory.segment-size`.
+
+### Input network buffers
+Buffers in the input channel are divided into exclusive and floating buffers.
+Flink attempts to acquire the configured amount of the exclusive buffers in 
the initialization phase for each channel.
+Exclusive buffers can be used only by one particular channel.
+A channel can request additional floating buffers from a buffer pool shared 
across all channels belonging to the given input gate.
+Flink treats the configured amount of exclusive and floating buffers as only a 
recommended values.
+If there are not enough buffers available on the input side, Flink will be 
able to make a progress with zero exclusive buffers and a single floating 
buffer.
+
+### Output network buffers
+Unlike the input buffer pool, the output buffers pool have only one type of 
buffers which it shares all among all subpartitions.
+In order to avoid the excessive data skew, the number of buffers for each 
subpartition is limited by `taskmanager.network.memory.max-buffers-per-channel`.
+
+Similarly, as on the input side, the configured amount of the exclusive 
buffers and floating buffers is treated only
+as the recommended values. If there are not enough buffers available, Flink 
will be able to make a progress with
+only a single exclusive buffer per output subpartition and zero floating 
buffers.
+
+## Performance 
+### Selecting the buffer size
+As was described before, the buffer collects records in order to optimize 
network overhead for sending 
+the data portion to the next subtask. The too-small buffer size especially 
less than the one record 
+doesn't make sense because the overhead is still pretty big and anyway the 
next subtask should receive 
+all parts of record before the consuming it(as result it leads to low 
throughput). At the same time, 
+the big buffer size leads to high memory usage, huge checkpoint data(for 
unaligned checkpoint), 
+the big checkpoint time(for aligned checkpoint). Also, the big buffer size 
+doesn't make sense with small `execution.buffer-timeout` because in this case 
all(or almost all) 
+sending buffer would be only partially full which means that the allocated 
memory won't be utilized properly.

Review comment:
       ```suggestion
   flushed buffers would be sent only partially filled, which means that the 
allocated memory won't be utilized efficiently.
   ```

##########
File path: docs/content/docs/deployment/network_buffer.md
##########
@@ -0,0 +1,125 @@
+---
+title: "Network Tuning"
+weight: 100
+type: docs
+aliases:
+  - /deployment/network_buffer.html
+---
+<!--
+Licensed to the Apache Software Foundation (ASF) under one
+or more contributor license agreements.  See the NOTICE file
+distributed with this work for additional information
+regarding copyright ownership.  The ASF licenses this file
+to you under the Apache License, Version 2.0 (the
+"License"); you may not use this file except in compliance
+with the License.  You may obtain a copy of the License at
+
+  http://www.apache.org/licenses/LICENSE-2.0
+
+Unless required by applicable law or agreed to in writing,
+software distributed under the License is distributed on an
+"AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
+KIND, either express or implied.  See the License for the
+specific language governing permissions and limitations
+under the License.
+-->
+
+# Network buffer
+
+## Overview
+
+Each record in flink is sent to the next subtask not individually but 
compounded in Network buffer,
+the smallest unit for communication between subtasks. Also, in order to keep 
consistent high throughput,
+Flink uses the network buffer queues (so called in-flight data) both on the 
output as well as on the input side. 
+In the result each subtask have an input queue waiting for the consumption and 
an output queue
+waiting for sending to the next subtask. Having a larger amount of the 
in-flight data means Flink can provide a
+higher throughput that's more resilient to small hiccups in the pipeline but 
it has negative effect for the
+checkpoint time. The long checkpoint time issue can be caused by many things, 
one of those is checkpoint barriers
+propagation time. Checkpoint in Flink can finish only once all subtask 
receives all injected checkpoint
+barriers. In aligned checkpoints([see]{{< ref 
"docs/concepts/stateful-stream-processing" >}}#checkpointing)
+those checkpoint barriers are traveling throughout the job graph long along
+the network buffers and the larger amount of in-flight data the longer the 
checkpoint barrier propagation
+time. In unaligned checkpoints(([see]{{< ref 
"docs/concepts/stateful-stream-processing" >}}#unaligned-checkpointing)) on the 
other hand, the more in-flight data, the larger the checkpoint size as
+all of the captured in-flight data has to be persisted as part of the 
checkpoint.
+
+## Buffer debloat
+
+Historically the only way to configure the amount of in-flight data was to 
specify both amount and the size
+of the buffers. However ideal values for those numbers are hard to pick, as 
they are different for every
+deployment. The buffer debloating mechanism added in Flink 1.14 attempts to 
address this issue.
+It tries to automatically adjust the amount of in-flight data in order to a 
reasonable values.

Review comment:
       ```suggestion
   It tries to automatically adjust the amount of in-flight data to reasonable 
values.
   ```

##########
File path: docs/content/docs/deployment/network_buffer.md
##########
@@ -0,0 +1,125 @@
+---
+title: "Network Tuning"
+weight: 100
+type: docs
+aliases:
+  - /deployment/network_buffer.html

Review comment:
       rename the file to `network_mem_tuning.md`? (to be consistent with 
`mem_tuning.md`).

##########
File path: docs/content/docs/deployment/network_buffer.md
##########
@@ -0,0 +1,125 @@
+---
+title: "Network Tuning"
+weight: 100
+type: docs
+aliases:
+  - /deployment/network_buffer.html
+---
+<!--
+Licensed to the Apache Software Foundation (ASF) under one
+or more contributor license agreements.  See the NOTICE file
+distributed with this work for additional information
+regarding copyright ownership.  The ASF licenses this file
+to you under the Apache License, Version 2.0 (the
+"License"); you may not use this file except in compliance
+with the License.  You may obtain a copy of the License at
+
+  http://www.apache.org/licenses/LICENSE-2.0
+
+Unless required by applicable law or agreed to in writing,
+software distributed under the License is distributed on an
+"AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
+KIND, either express or implied.  See the License for the
+specific language governing permissions and limitations
+under the License.
+-->
+
+# Network buffer
+
+## Overview
+
+Each record in flink is sent to the next subtask not individually but 
compounded in Network buffer,
+the smallest unit for communication between subtasks. Also, in order to keep 
consistent high throughput,
+Flink uses the network buffer queues (so called in-flight data) both on the 
output as well as on the input side. 
+In the result each subtask have an input queue waiting for the consumption and 
an output queue
+waiting for sending to the next subtask. Having a larger amount of the 
in-flight data means Flink can provide a
+higher throughput that's more resilient to small hiccups in the pipeline but 
it has negative effect for the
+checkpoint time. The long checkpoint time issue can be caused by many things, 
one of those is checkpoint barriers
+propagation time. Checkpoint in Flink can finish only once all subtask 
receives all injected checkpoint
+barriers. In aligned checkpoints([see]{{< ref 
"docs/concepts/stateful-stream-processing" >}}#checkpointing)
+those checkpoint barriers are traveling throughout the job graph long along
+the network buffers and the larger amount of in-flight data the longer the 
checkpoint barrier propagation
+time. In unaligned checkpoints(([see]{{< ref 
"docs/concepts/stateful-stream-processing" >}}#unaligned-checkpointing)) on the 
other hand, the more in-flight data, the larger the checkpoint size as
+all of the captured in-flight data has to be persisted as part of the 
checkpoint.
+
+## Buffer debloat
+
+Historically the only way to configure the amount of in-flight data was to 
specify both amount and the size
+of the buffers. However ideal values for those numbers are hard to pick, as 
they are different for every
+deployment. The buffer debloating mechanism added in Flink 1.14 attempts to 
address this issue.
+It tries to automatically adjust the amount of in-flight data in order to a 
reasonable values.
+More precisely, the buffer debloating calculate the maximum possible throughput
+(the maximum throughput which would be if the subtask was always busy)
+for the subtask and adjusts the amount of in-flight data in such a way that 
the time for consumption of those in-flight data will be equal to the 
configured value.
+
+The most useful settings:
+* The buffer debloat can be enabled by setting the property 
`taskmanager.network.memory.buffer-debloat.enabled` to `true`. 
+* Desirable time of the consumption in-flight data can be configured by 
setting `taskmanager.network.memory.buffer-debloat.target` to `duration`.
+  The default value of the debloat target should be good enough in most cases.
+
+Buffer debloating in Flink works by measuring past throguhput to predict 
future time to consume the remaining
+in-flight data. If those predictions are incorrect, the debloating mechanism 
can fail in one of the two ways:
+* There won't be enough buffered data to provide full throughput
+* There will be too many buffered in-flight data and the aligned checkpoint 
barriers propagation time or the unaligned checkpoint size will suffer.
+
+Hence, if you have a varying load in your job, for example a sudden spikes of 
incoming records, or periodically
+firing windowed aggregations or joins, you might need to adjust the following 
settings:
+
+* `taskmanager.network.memory.buffer-debloat.period` - the minimum time 
between buffer size recalculation.
+The shorter the period, the faster reaction time of the debloating mechanism, 
but a higher CPU overhead for the necessary calculations.
+* `taskmanager.network.memory.buffer-debloat.samples` - Adjust the number of 
samples over which throughput measurements are averaged out.
+The frequency of the collected samples can be adjusted via 
`taskmanager.network.memory.buffer-debloat.period`.
+The fewer samples, the faster reaction time of the debloating mechanism, but a 
higher chance of a sudden spike or drop of the throughput to cause the buffer 
debloating to miscalculate the best amount of the in-flight data.
+* `taskmanager.network.memory.buffer-debloat.threshold-percentages` - The 
optimization which prevents 
+the frequent buffer size change if the new size is not so different compared 
to the old one.
+
+See the [Configuration]({{< ref "docs/deployment/config" 
>}}#full-taskmanageroptions) documentation for details and additional 
parameters.
+
+The metrics([see]({{< ref "docs/ops/metrics" >}}#io)) which can help to 
observe the current buffer size:
+* `estimatedTimeToConsumerBuffersMs` - the total time to consume data from all 
input channels.
+* `debloatedBufferSize` - the current buffer size.
+
+## Network buffer lifecycle
+Logically, Flink has several local buffer pools one for output stream and one 
for each input gate. 
+Each of that pools is limited to at most 
+```
+#channels * taskmanager.network.memory.buffers-per-channel + 
taskmanager.network.memory.floating-buffers-per-gate
+```
+
+The size of the buffer can be configured by setting 
`taskmanager.memory.segment-size`.
+
+### Input network buffers
+Buffers in the input channel are divided into exclusive and floating buffers.
+Flink attempts to acquire the configured amount of the exclusive buffers in 
the initialization phase for each channel.
+Exclusive buffers can be used only by one particular channel.
+A channel can request additional floating buffers from a buffer pool shared 
across all channels belonging to the given input gate.
+Flink treats the configured amount of exclusive and floating buffers as only a 
recommended values.
+If there are not enough buffers available on the input side, Flink will be 
able to make a progress with zero exclusive buffers and a single floating 
buffer.
+
+### Output network buffers
+Unlike the input buffer pool, the output buffers pool have only one type of 
buffers which it shares all among all subpartitions.
+In order to avoid the excessive data skew, the number of buffers for each 
subpartition is limited by `taskmanager.network.memory.max-buffers-per-channel`.
+
+Similarly, as on the input side, the configured amount of the exclusive 
buffers and floating buffers is treated only
+as the recommended values. If there are not enough buffers available, Flink 
will be able to make a progress with
+only a single exclusive buffer per output subpartition and zero floating 
buffers.
+
+## Performance 
+### Selecting the buffer size
+As was described before, the buffer collects records in order to optimize 
network overhead for sending 
+the data portion to the next subtask. The too-small buffer size especially 
less than the one record 
+doesn't make sense because the overhead is still pretty big and anyway the 
next subtask should receive 
+all parts of record before the consuming it(as result it leads to low 
throughput). At the same time, 
+the big buffer size leads to high memory usage, huge checkpoint data(for 
unaligned checkpoint), 
+the big checkpoint time(for aligned checkpoint). Also, the big buffer size 
+doesn't make sense with small `execution.buffer-timeout` because in this case 
all(or almost all) 
+sending buffer would be only partially full which means that the allocated 
memory won't be utilized properly.
+
+### Selecting the buffer count
+The number of buffers is mostly configured by 
`taskmanager.network.memory.buffers-per-channel`, 
`taskmanager.network.memory.floating-buffers-per-gate`.
+The first one is important for the input channel and allows to have the 
exclusive buffers 
+for the channel that influence positively on throughput but negatively on 
memory usage. At the same time,
+`floating-buffers-per-gate` uses the memory in a more optimal way but it can 
lead to throughput degradation in case of data skew.

Review comment:
       ```suggestion
   For best throughput, in most cases we recommend sticking with the default 
values for the number of exclusive and floating buffers. If the amount of 
in-flight data is causing problems, enabling [buffer debloating](#buffer-debloat) 
is again the easiest and recommended way to deal with them. However, if for 
some reason this doesn't work for you, here are some things to consider when 
tuning the number of network buffers.
   
   You should adjust the number of buffers according to your expected 
throughput (in `bytes/second`). Assigning credits and sending buffers takes 
some time (around two round-trip messages between two nodes). This latency 
depends on your network, but for simplicity we can assume `1ms` in a healthy 
local network. Next, take the [buffer 
size](https://ci.apache.org/projects/flink/flink-docs-master/docs/deployment/config/#taskmanager-memory-segment-size).
 With those three numbers (buffer round trip time, buffer size and expected 
throughput), the number of buffers required to sustain this throughput follows 
from this simple formula:
   number_of_buffers = expected_throughput * buffer_roundtrip / buffer_size
   For example, with expected throughput of `320MB/s`, round trip latency of 
`1ms` and the default memory segment size:
   number_of_buffers = 320MB/s * 1ms / 32KB = 10
   This is the number of actively used buffers that we need to achieve the 
expected throughput with the given round trip latency. 
   
   The purpose of floating buffers is to handle data skew. Ideally, the 
number of floating buffers (default 8) plus the exclusive buffers (default 2) 
that belong to a channel should be able to saturate the network throughput. 
However, sometimes this is either not feasible or not even necessary. Keep in 
mind that it is very rare for only a single channel among all subtasks on your 
task manager to be in use.
   
   The exclusive buffers, on the other hand, are there to provide smooth 
throughput: while one buffer is in transit, the other is being filled up. 
However, in larger setups the number of exclusive buffers is the dominant 
factor in the amount of in-flight data Flink uses. So in case of back 
pressure in low-throughput setups, it might be a good idea to reduce the number 
of [exclusive 
buffers](https://ci.apache.org/projects/flink/flink-docs-master/docs/deployment/config/#taskmanager-network-memory-buffers-per-channel).
   
   ### Final words
   
   To sum up, tuning the network memory configuration in Flink should boil 
down to just enabling the buffer debloating mechanism. If for some reason this 
doesn't work, you can try to tune buffer debloating, or disable it and 
manually configure the memory segment size and the number of buffers. For this 
second scenario we recommend two settings:
   1. The default values, for maximum throughput.
   2. A reduced memory segment size and/or number of exclusive buffers, to 
speed up checkpointing or reduce the memory consumption of the network stack.
   ```
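
The formula in the suggestion above can be checked with a few lines of Python. This is only a sketch of the arithmetic, not anything taken from Flink itself: the function name `required_buffers` is made up for illustration, and reading `MB` and `KB` as binary units (1024-based) is an assumption. With those units the estimate comes out slightly above the rounded value of 10 quoted in the suggestion.

```python
KIB = 1024          # assumption: KB in the suggestion is read as 1024 bytes
MIB = 1024 * KIB    # assumption: MB is read as 1024 * 1024 bytes


def required_buffers(throughput_bytes_per_s: float,
                     round_trip_s: float,
                     buffer_size_bytes: int) -> float:
    """number_of_buffers = expected_throughput * buffer_roundtrip / buffer_size"""
    return throughput_bytes_per_s * round_trip_s / buffer_size_bytes


# Values from the suggestion: 320MB/s, 1ms round trip, 32KB memory segments.
estimate = required_buffers(320 * MIB, 0.001, 32 * KIB)

# Defaults quoted in the suggestion: 2 exclusive buffers per channel
# plus 8 floating buffers per gate.
default_buffers = 2 + 8

print(f"estimated buffers needed: {estimate:.2f}")
print(f"default exclusive + floating buffers: {default_buffers}")
```

The estimate lands close to the sum of the default exclusive and floating buffers, which matches the remark in the suggestion that those defaults together should ideally be able to saturate the network throughput of a single channel.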

##########
File path: docs/content/docs/deployment/network_buffer.md
##########
@@ -0,0 +1,125 @@
+---
+title: "Network Tuning"
+weight: 100
+type: docs
+aliases:
+  - /deployment/network_buffer.html
+---
+<!--
+Licensed to the Apache Software Foundation (ASF) under one
+or more contributor license agreements.  See the NOTICE file
+distributed with this work for additional information
+regarding copyright ownership.  The ASF licenses this file
+to you under the Apache License, Version 2.0 (the
+"License"); you may not use this file except in compliance
+with the License.  You may obtain a copy of the License at
+
+  http://www.apache.org/licenses/LICENSE-2.0
+
+Unless required by applicable law or agreed to in writing,
+software distributed under the License is distributed on an
+"AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
+KIND, either express or implied.  See the License for the
+specific language governing permissions and limitations
+under the License.
+-->
+
+# Network buffer
+
+## Overview
+
+Each record in flink is sent to the next subtask not individually but 
compounded in Network buffer,
+the smallest unit for communication between subtasks. Also, in order to keep 
consistent high throughput,
+Flink uses the network buffer queues (so called in-flight data) both on the 
output as well as on the input side. 
+In the result each subtask have an input queue waiting for the consumption and 
an output queue
+waiting for sending to the next subtask. Having a larger amount of the 
in-flight data means Flink can provide a
+higher throughput that's more resilient to small hiccups in the pipeline but 
it has negative effect for the
+checkpoint time. The long checkpoint time issue can be caused by many things, 
one of those is checkpoint barriers
+propagation time. Checkpoint in Flink can finish only once all subtask 
receives all injected checkpoint
+barriers. In aligned checkpoints([see]{{< ref 
"docs/concepts/stateful-stream-processing" >}}#checkpointing)
+those checkpoint barriers are traveling throughout the job graph long along
+the network buffers and the larger amount of in-flight data the longer the 
checkpoint barrier propagation
+time. In unaligned checkpoints(([see]{{< ref 
"docs/concepts/stateful-stream-processing" >}}#unaligned-checkpointing)) on the 
other hand, the more in-flight data, the larger the checkpoint size as
+all of the captured in-flight data has to be persisted as part of the 
checkpoint.
+
+## Buffer debloat
+
+Historically the only way to configure the amount of in-flight data was to 
specify both amount and the size
+of the buffers. However ideal values for those numbers are hard to pick, as 
they are different for every
+deployment. The buffer debloating mechanism added in Flink 1.14 attempts to 
address this issue.
+It tries to automatically adjust the amount of in-flight data in order to a 
reasonable values.
+More precisely, the buffer debloating calculate the maximum possible throughput
+(the maximum throughput which would be if the subtask was always busy)
+for the subtask and adjusts the amount of in-flight data in such a way that 
the time for consumption of those in-flight data will be equal to the 
configured value.
+
+The most useful settings:
+* The buffer debloat can be enabled by setting the property 
`taskmanager.network.memory.buffer-debloat.enabled` to `true`. 
+* Desirable time of the consumption in-flight data can be configured by 
setting `taskmanager.network.memory.buffer-debloat.target` to `duration`.
+  The default value of the debloat target should be good enough in most cases.
+
+Buffer debloating in Flink works by measuring past throguhput to predict 
future time to consume the remaining
+in-flight data. If those predictions are incorrect, the debloating mechanism 
can fail in one of the two ways:
+* There won't be enough buffered data to provide full throughput
+* There will be too many buffered in-flight data and the aligned checkpoint 
barriers propagation time or the unaligned checkpoint size will suffer.
+
+Hence, if you have a varying load in your job, for example a sudden spikes of 
incoming records, or periodically
+firing windowed aggregations or joins, you might need to adjust the following 
settings:
+
+* `taskmanager.network.memory.buffer-debloat.period` - the minimum time 
between buffer size recalculation.
+The shorter the period, the faster reaction time of the debloating mechanism, 
but a higher CPU overhead for the necessary calculations.
+* `taskmanager.network.memory.buffer-debloat.samples` - Adjust the number of 
samples over which throughput measurements are averaged out.
+The frequency of the collected samples can be adjusted via 
`taskmanager.network.memory.buffer-debloat.period`.
+The fewer samples, the faster reaction time of the debloating mechanism, but a 
higher chance of a sudden spike or drop of the throughput to cause the buffer 
debloating to miscalculate the best amount of the in-flight data.
+* `taskmanager.network.memory.buffer-debloat.threshold-percentages` - The 
optimization which prevents 
+the frequent buffer size change if the new size is not so different compared 
to the old one.
+
+See the [Configuration]({{< ref "docs/deployment/config" 
>}}#full-taskmanageroptions) documentation for details and additional 
parameters.
+
+The metrics([see]({{< ref "docs/ops/metrics" >}}#io)) which can help to 
observe the current buffer size:
+* `estimatedTimeToConsumerBuffersMs` - the total time to consume data from all 
input channels.
+* `debloatedBufferSize` - the current buffer size.
+
+## Network buffer lifecycle
+Logically, Flink has several local buffer pools one for output stream and one 
for each input gate. 
+Each of that pools is limited to at most 
+```
+#channels * taskmanager.network.memory.buffers-per-channel + 
taskmanager.network.memory.floating-buffers-per-gate
+```
+
+The size of the buffer can be configured by setting 
`taskmanager.memory.segment-size`.
+
+### Input network buffers
+Buffers in the input channel are divided into exclusive and floating buffers.
+Flink attempts to acquire the configured amount of the exclusive buffers in 
the initialization phase for each channel.
+Exclusive buffers can be used only by one particular channel.
+A channel can request additional floating buffers from a buffer pool shared 
across all channels belonging to the given input gate.
+Flink treats the configured amount of exclusive and floating buffers as only a 
recommended values.
+If there are not enough buffers available on the input side, Flink will be 
able to make a progress with zero exclusive buffers and a single floating 
buffer.
+
+### Output network buffers
+Unlike the input buffer pool, the output buffers pool have only one type of 
buffers which it shares all among all subpartitions.
+In order to avoid the excessive data skew, the number of buffers for each 
subpartition is limited by `taskmanager.network.memory.max-buffers-per-channel`.
+
+Similarly, as on the input side, the configured amount of the exclusive 
buffers and floating buffers is treated only
+as the recommended values. If there are not enough buffers available, Flink 
will be able to make a progress with
+only a single exclusive buffer per output subpartition and zero floating 
buffers.
+
+## Performance 
+### Selecting the buffer size
+As was described before, the buffer collects records in order to optimize 
network overhead for sending 
+the data portion to the next subtask. The too-small buffer size especially 
less than the one record 
+doesn't make sense because the overhead is still pretty big and anyway the 
next subtask should receive 
+all parts of record before the consuming it(as result it leads to low 
throughput). At the same time, 
+the big buffer size leads to high memory usage, huge checkpoint data(for 
unaligned checkpoint), 
+the big checkpoint time(for aligned checkpoint). Also, the big buffer size 
+doesn't make sense with small `execution.buffer-timeout` because in this case 
all(or almost all) 
+sending buffer would be only partially full which means that the allocated 
memory won't be utilized properly.
+
+### Selecting the buffer count
+The number of buffers is mostly configured by 
`taskmanager.network.memory.buffers-per-channel`, 
`taskmanager.network.memory.floating-buffers-per-gate`.
+The first one is important for the input channel and allows to have the 
exclusive buffers 
+for the channel that influence positively on throughput but negatively on 
memory usage. At the same time,
+`floating-buffers-per-gate` uses the memory in a more optimal way but it can 
lead to throughput degradation in case of data skew.

Review comment:
       Plus adjust formatting/fix links?

##########
File path: docs/content/docs/deployment/network_buffer.md
##########
@@ -0,0 +1,125 @@
+---
+title: "Network Tuning"
+weight: 100
+type: docs
+aliases:
+  - /deployment/network_buffer.html
+---
+<!--
+Licensed to the Apache Software Foundation (ASF) under one
+or more contributor license agreements.  See the NOTICE file
+distributed with this work for additional information
+regarding copyright ownership.  The ASF licenses this file
+to you under the Apache License, Version 2.0 (the
+"License"); you may not use this file except in compliance
+with the License.  You may obtain a copy of the License at
+
+  http://www.apache.org/licenses/LICENSE-2.0
+
+Unless required by applicable law or agreed to in writing,
+software distributed under the License is distributed on an
+"AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
+KIND, either express or implied.  See the License for the
+specific language governing permissions and limitations
+under the License.
+-->
+
+# Network buffer

Review comment:
       ```suggestion
   # Network memory tuning guide
   ```




