[GitHub] [flink] wsry opened a new pull request #18505: [FLINK-25796][network] Avoid record copy for result partition of sort-shuffle if there are enough buffers for better performance

GitBox Tue, 25 Jan 2022 05:04:17 -0800


wsry opened a new pull request #18505:
URL: https://github.com/apache/flink/pull/18505



   ## What is the purpose of the change
   
   Currently, the sort-shuffle result partition incurs extra record copy 
overhead, introduced by clustering records by subpartition index. For small 
records, this overhead can cause a performance regression of as much as 20%. 
This ticket aims to solve the problem.
   
   In fact, a hash-based implementation is a natural way to achieve the goal 
of sorting records by subpartition index. However, it has some serious 
weaknesses. For example, when there are not enough buffers or the data is 
skewed, it can waste buffers and reduce compression efficiency, which can cause 
a performance regression.
   
   This ticket tries to solve the issue by dynamically switching between the 
two implementations: if there are enough buffers, the hash-based 
implementation is used; otherwise, the sort-based implementation is used.
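
   The switching rule described above can be sketched roughly as follows. This 
is only an illustration, not the code in this pull request: the class 
`ShuffleModeSketch`, the enum `Mode`, the method `chooseMode`, and the 
`minBuffersPerSubpartition` threshold are all hypothetical names, and the 
definition of "enough buffers" used here is an assumption.

```java
/**
 * Hypothetical sketch only: the class, enum, and method names below are
 * illustrative and are not taken from the Flink code base.
 */
public class ShuffleModeSketch {

    /** The two ways of grouping records by subpartition index. */
    enum Mode { HASH_BASED, SORT_BASED }

    /**
     * Picks an implementation from the number of available buffers.
     *
     * Assumption: "enough buffers" means at least minBuffersPerSubpartition
     * buffers for every subpartition. The real threshold is not stated in
     * the description.
     */
    static Mode chooseMode(
            int availableBuffers, int numSubpartitions, int minBuffersPerSubpartition) {
        if (availableBuffers < 0 || numSubpartitions <= 0 || minBuffersPerSubpartition <= 0) {
            throw new IllegalArgumentException("invalid buffer or subpartition count");
        }
        // Use long arithmetic so a large subpartition count cannot overflow.
        long required = (long) numSubpartitions * minBuffersPerSubpartition;
        return availableBuffers >= required ? Mode.HASH_BASED : Mode.SORT_BASED;
    }

    public static void main(String[] args) {
        System.out.println(chooseMode(512, 100, 2)); // prints HASH_BASED
        System.out.println(chooseMode(64, 100, 2));  // prints SORT_BASED
    }
}
```

   Under this assumed rule, the hash-based path is chosen only when every 
subpartition can hold its own buffers, and the sort-based path stays as the 
fallback when buffers are scarce.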
   
   ## Brief change log
   
     - Dynamically switch between the two implementations: if there are 
enough buffers, the hash-based implementation is used; otherwise, the 
sort-based implementation is used.
   
   
   ## Verifying this change
   
   This change adds tests, and existing tests also help to verify the 
change.
   
   ## Does this pull request potentially affect one of the following parts:
   
     - Dependencies (does it add or upgrade a dependency): (yes / **no**)
     - The public API, i.e., is any changed class annotated with 
`@Public(Evolving)`: (yes / **no**)
     - The serializers: (yes / **no** / don't know)
     - The runtime per-record code paths (performance sensitive): (yes / **no** 
/ don't know)
     - Anything that affects deployment or recovery: JobManager (and its 
components), Checkpointing, Kubernetes/Yarn, ZooKeeper: (yes / **no** / don't 
know)
     - The S3 file system connector: (yes / **no** / don't know)
   
   ## Documentation
   
     - Does this pull request introduce a new feature? (yes / **no**)
     - If yes, how is the feature documented? (**not applicable** / docs / 
JavaDocs / not documented)
   


