StefanRRichter opened a new pull request, #23402:
URL: https://github.com/apache/flink/pull/23402

   ## What is the purpose of the change
   
   Problem:
   Under back-pressure, buffer debloating shrinks the buffer size to 256 bytes.
   Such small buffers might not be enough to emit the processing results of a single record, so the task thread has to request new buffers and often blocks.
   That results in significant checkpoint delays (up to minutes instead of seconds).
   
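   Adding more overdraft buffers helps, but how many are needed depends on the job's degree of parallelism (DoP).
   Raising taskmanager.memory.min-segment-size from 256 bytes helps, but the required size depends on the multiplication factor of the operator (output records per input record).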
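
To make the size mismatch concrete, here is a small arithmetic sketch. It is not Flink code: the class name, the 500-byte record size, and the fan-out of 10 are hypothetical, and it ignores serialization overhead such as length prefixes. It only counts how many buffers the output of one input record would occupy at a debloated size of 256 bytes versus a 32 KiB segment (assumed here as the full segment size).

```java
// Illustrative arithmetic only; names and numbers are assumptions, not Flink internals.
final class DebloatBufferMath {

    /** Number of buffers of bufferSize bytes needed to hold serializedBytes (ceiling division). */
    static long buffersNeeded(long serializedBytes, int bufferSize) {
        if (bufferSize <= 0) {
            throw new IllegalArgumentException("bufferSize must be positive");
        }
        return (serializedBytes + bufferSize - 1) / bufferSize;
    }

    public static void main(String[] args) {
        int debloatedSize = 256;        // buffer size after debloating, as in the PR text
        int fullSegment = 32 * 1024;    // assumed full segment size
        long outputBytes = 10 * 500L;   // hypothetical: 1 input record -> 10 output records of 500 bytes

        System.out.println(buffersNeeded(outputBytes, debloatedSize)); // 20
        System.out.println(buffersNeeded(outputBytes, fullSegment));   // 1
    }
}
```

With these assumed numbers, a single input record needs 20 debloated buffers instead of 1, which is why the number of overdraft buffers or the minimum segment size that "helps" varies with the operator.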
   
   
   ## Brief change log
   
   Solution:
   
     - Ignore Buffer Debloater hints and extend the buffer if possible, when the hinted size would prevent an output record from being emitted fully AND this is the last available buffer.
     - Prevent the subsequent flush of the buffer so that more output records can be emitted (flatMap-like and join operators).
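
The rule described above can be sketched with a toy model. This is not the code in this PR: `ToyBuffer`, `append`, and the `isLastAvailableBuffer` flag are hypothetical names, and extending the limit all the way to the segment capacity is an assumption made so that subsequent records also fit.

```java
// Toy model of "expand the last buffer"; all names here are hypothetical, not Flink classes.
final class LastBufferExpansionSketch {

    static final class ToyBuffer {
        final int capacity; // physical size of the underlying memory segment
        int limit;          // writable limit, initially the debloater's hinted size
        int written;        // bytes written so far

        ToyBuffer(int capacity, int hintedSize) {
            this.capacity = capacity;
            this.limit = Math.min(hintedSize, capacity);
        }

        int remaining() {
            return limit - written;
        }
    }

    /**
     * Appends up to recordBytes to the buffer and returns how many bytes were written.
     * If the record does not fit under the hinted limit and no other buffer is available,
     * the hint is ignored and the limit is raised to the segment capacity.
     */
    static int append(ToyBuffer buffer, int recordBytes, boolean isLastAvailableBuffer) {
        if (recordBytes > buffer.remaining() && isLastAvailableBuffer) {
            buffer.limit = buffer.capacity; // assumption: expand to the full segment
        }
        int n = Math.min(recordBytes, buffer.remaining());
        buffer.written += n;
        return n;
    }

    public static void main(String[] args) {
        ToyBuffer notLast = new ToyBuffer(32 * 1024, 256);
        System.out.println(append(notLast, 1000, false)); // 256: hint respected, record is split

        ToyBuffer last = new ToyBuffer(32 * 1024, 256);
        System.out.println(append(last, 1000, true));     // 1000: hint ignored, record fits fully
        System.out.println(append(last, 500, true));      // 500: room left for further records
    }
}
```

In this model the expanded buffer stays open after the first record, which mirrors the second change: skipping the immediate flush lets a flatMap-like or join operator emit its remaining output records into the same buffer.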
   
   All code originally written by @rkhachatryan .
   
   ## Verifying this change
   
   
   This change added tests and can be verified as follows:
   
   Tested locally with FLIP-27 sources (Kafka).
   Added unit test ResultPartitionTest.testEmitRecordExpandsLastBuffer.
   
   ## Does this pull request potentially affect one of the following parts:
   
     - Dependencies (does it add or upgrade a dependency): (no)
     - The public API, i.e., is any changed class annotated with 
`@Public(Evolving)`: (no)
     - The serializers: (no)
     - The runtime per-record code paths (performance sensitive): (no)
     - Anything that affects deployment or recovery: JobManager (and its 
components), Checkpointing, Kubernetes/Yarn, ZooKeeper: (yes)
     - The S3 file system connector: (no)
   
   ## Documentation
   
     - Does this pull request introduce a new feature? (no)
     - If yes, how is the feature documented? (not applicable)
   


