[GitHub] [flink] 1996fanrui commented on a diff in pull request #20038: [FLINK-26762][docs] Document overdraft buffers

GitBox Tue, 21 Jun 2022 05:05:45 -0700


1996fanrui commented on code in PR #20038:
URL: https://github.com/apache/flink/pull/20038#discussion_r902508106



##########
docs/content/docs/deployment/memory/network_mem_tuning.md:
##########
@@ -120,6 +120,19 @@ In order to avoid excessive data skew, the number of 
buffers for each subpartiti
 
 Unlike the input buffer pool, the configured amount of exclusive buffers and 
floating buffers is only treated as recommended values. If there are not enough 
buffers available, Flink can make progress with only a single exclusive buffer 
per output subpartition and zero floating buffers.
 
+#### Overdraft buffers
+
+For each output subtask can also request up to 
`taskmanager.network.memory.max-overdraft-buffers-per-gate` (by default 5) 
extra overdraft buffers.
+Those buffers are only used, if despite presence of a backpressure, Flink can 
not stop producing more records to the output.
+This can happen in situations like:
+- Serializing very large records, that do not fit into a single network buffer.
+- Flat Map like operator, that produces many output records per single input 
record.
+- Operators that output many records either periodically or on a reaction to 
some event (for example `WindowOperator`).
+
+Without overdraft buffers in such situations Flink subtask thread would block 
on the backpressure, preventing for example unaligned checkpoints
+from being triggered. To mitigate this, the overdraft buffers concept has been 
added. Those buffers are strictly optional and Flink can
+make progress even if the Task Manager doesn't have any spare buffers in the 
global pool to be used as overdraft buffers.

Review Comment:
   As I understand it, a subtask can't request an overdraft buffer when the 
global pool is empty.
   
   So why does the text say `Flink can make progress even if the Task Manager 
doesn't have any spare buffers in the global pool to be used as overdraft 
buffers.`?
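
   To make the question concrete, here is a toy Python model of the behaviour as described in this comment. It is only a sketch and not Flink's implementation: the names `GlobalPool`, `LocalPool`, `try_take` and `request` are hypothetical, and the only values taken from the docs are the per-gate overdraft limit and its default of 5.

```python
class GlobalPool:
    """Toy stand-in for the Task Manager's global buffer pool (hypothetical)."""

    def __init__(self, free_buffers):
        self.free_buffers = free_buffers

    def try_take(self):
        # Hand out one buffer if any is left; never block.
        if self.free_buffers == 0:
            return False
        self.free_buffers -= 1
        return True


class LocalPool:
    """Toy stand-in for the buffer pool of one output gate (hypothetical)."""

    def __init__(self, global_pool, regular_buffers, max_overdraft=5):
        self.global_pool = global_pool
        self.regular_free = regular_buffers
        self.max_overdraft = max_overdraft  # mirrors max-overdraft-buffers-per-gate
        self.overdraft_in_use = 0

    def request(self):
        """Return 'regular', 'overdraft', or None if the caller has to wait."""
        if self.regular_free > 0:
            self.regular_free -= 1
            return "regular"
        # Overdraft is best effort: it needs a spare buffer in the global pool.
        if self.overdraft_in_use < self.max_overdraft and self.global_pool.try_take():
            self.overdraft_in_use += 1
            return "overdraft"
        return None


# Global pool is empty: after the regular buffer is gone, nothing is granted.
starved = LocalPool(GlobalPool(free_buffers=0), regular_buffers=1)
print(starved.request())
print(starved.request())
```

   In this model an empty global pool simply means that no overdraft buffer is granted, so the subtask behaves as it did before overdraft buffers existed. That may be what "strictly optional" is meant to convey, but the sentence could say so more directly.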



##########
docs/content/docs/deployment/memory/network_mem_tuning.md:
##########
@@ -120,6 +120,19 @@ In order to avoid excessive data skew, the number of 
buffers for each subpartiti
 
 Unlike the input buffer pool, the configured amount of exclusive buffers and 
floating buffers is only treated as recommended values. If there are not enough 
buffers available, Flink can make progress with only a single exclusive buffer 
per output subpartition and zero floating buffers.
 
+#### Overdraft buffers
+
+For each output subtask can also request up to 
`taskmanager.network.memory.max-overdraft-buffers-per-gate` (by default 5) 
extra overdraft buffers.
+Those buffers are only used, if despite presence of a backpressure, Flink can 
not stop producing more records to the output.
+This can happen in situations like:
+- Serializing very large records, that do not fit into a single network buffer.
+- Flat Map like operator, that produces many output records per single input 
record.
+- Operators that output many records either periodically or on a reaction to 
some event (for example `WindowOperator`).

Review Comment:
   some events



##########
docs/content/docs/deployment/memory/network_mem_tuning.md:
##########
@@ -120,6 +120,19 @@ In order to avoid excessive data skew, the number of 
buffers for each subpartiti
 
 Unlike the input buffer pool, the configured amount of exclusive buffers and 
floating buffers is only treated as recommended values. If there are not enough 
buffers available, Flink can make progress with only a single exclusive buffer 
per output subpartition and zero floating buffers.
 
+#### Overdraft buffers
+
+For each output subtask can also request up to 
`taskmanager.network.memory.max-overdraft-buffers-per-gate` (by default 5) 
extra overdraft buffers.
+Those buffers are only used, if despite presence of a backpressure, Flink can 
not stop producing more records to the output.
+This can happen in situations like:
+- Serializing very large records, that do not fit into a single network buffer.
+- Flat Map like operator, that produces many output records per single input 
record.
+- Operators that output many records either periodically or on a reaction to 
some event (for example `WindowOperator`).
+
+Without overdraft buffers in such situations Flink subtask thread would block 
on the backpressure, preventing for example unaligned checkpoints
+from being triggered. To mitigate this, the overdraft buffers concept has been 
added. Those buffers are strictly optional and Flink can

Review Comment:
   > preventing for example unaligned checkpoints from being triggered.
   
   Overdraft buffers can speed up triggering the unaligned checkpoint of a 
subtask, but they cannot speed up triggering the checkpoint of the Flink job. 
When we talk about triggering a checkpoint we usually mean the job level, but 
here it is the subtask level. So I think we should state `subtask level` 
explicitly.
   
   The current wording may confuse users. Is "trigger" the appropriate word 
here, or is there a better one? How can we let the user know that, with 
overdraft buffers, the subtask can start the unaligned checkpoint as soon as 
possible instead of staying blocked in requestMemory (or on backpressure)?



##########
docs/content/docs/deployment/memory/network_mem_tuning.md:
##########
@@ -120,6 +120,19 @@ In order to avoid excessive data skew, the number of 
buffers for each subpartiti
 
 Unlike the input buffer pool, the configured amount of exclusive buffers and 
floating buffers is only treated as recommended values. If there are not enough 
buffers available, Flink can make progress with only a single exclusive buffer 
per output subpartition and zero floating buffers.
 
+#### Overdraft buffers
+
+For each output subtask can also request up to 
`taskmanager.network.memory.max-overdraft-buffers-per-gate` (by default 5) 
extra overdraft buffers.
+Those buffers are only used, if despite presence of a backpressure, Flink can 
not stop producing more records to the output.

Review Comment:
   > These buffers are only used, if the subtask is backpressured by downstream 
subtasks and the subtask still needs to produce more records to the output.
   
   I prefer this wording. What do you think?
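
   A small Python sketch of the condition in the suggested sentence, using the "many output records per single input record" case from the list in the diff. It rests on simplifying assumptions that are not from Flink: each record occupies exactly one buffer, and `emit_records` with its parameters is a hypothetical helper.

```python
def emit_records(num_records, regular_free, global_free, max_overdraft=5):
    """Emit num_records output records for one input record.

    Returns (regular_used, overdraft_used, blocked_after), where blocked_after
    is the number of records emitted before the subtask would have to block,
    or None if all records were emitted.
    """
    regular_used = 0
    overdraft_used = 0
    for emitted in range(num_records):
        if regular_used < regular_free:
            # Not backpressured yet: regular buffers are used first.
            regular_used += 1
        elif overdraft_used < min(max_overdraft, global_free):
            # Backpressured, but more records still have to go out.
            overdraft_used += 1
        else:
            return regular_used, overdraft_used, emitted
    return regular_used, overdraft_used, None


print(emit_records(2, regular_free=4, global_free=10))   # (2, 0, None)
print(emit_records(6, regular_free=2, global_free=10))   # (2, 4, None)
print(emit_records(10, regular_free=2, global_free=10))  # (2, 5, 7)
```

   The first call never touches overdraft buffers because the subtask is not backpressured, the second finishes the input record only thanks to them, and the third still blocks once the per-gate limit is reached.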



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

