[I] [FEATURE] Split one huge event into multi small events to improve HDFS flush performance [incubator-uniffle]

via GitHub Sun, 10 Nov 2024 22:15:37 -0800


zuston opened a new issue, #2242:
URL: https://github.com/apache/incubator-uniffle/issues/2242


   ### Code of Conduct
   
   - [X] I agree to follow this project's [Code of Conduct](https://www.apache.org/foundation/policies/conduct)
   
   
   ### Search before asking
   
   - [X] I have searched in the [issues](https://github.com/apache/incubator-uniffle/issues?q=is%3Aissue) and found no similar issues.
   
   
   ### Describe the feature
   
   In the current codebase, data for a huge partition is kept in memory, as long as there is enough capacity, until the partition is marked as huge. Once it is marked as a huge partition, its data should be flushed to HDFS, if HDFS is specified.
   
   The first flush of such a huge partition can sometimes be very large, especially when the buffer capacity is large. This flush is slow because it is a single huge flush event, which cannot benefit from the concurrent HDFS partition writing mechanism.
   It also occupies memory until the flush finishes, which causes backpressure on the client.
   
   From this point of view, smaller flush events are better for shuffle-server throughput. Local disk IO, however, prefers large flush data buffers, so this is a trade-off.
   
   Either way, splitting the single huge flush event of a huge partition into multiple smaller events to improve write performance is useful.
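   A minimal sketch of the proposed split, assuming a flush event is simply a list of buffered blocks and that a per-event size limit (the hypothetical `maxEventBytes`) is configurable. `Block`, `FlushEvent`, and `HugeEventSplitter` are illustrative names, not Uniffle's actual classes. Blocks are grouped greedily in arrival order, so each resulting event stays within the limit (a single oversized block becomes its own event), and the events could then be handed to concurrent HDFS writers independently.

   ```java
   import java.util.ArrayList;
   import java.util.List;

   /**
    * Hypothetical sketch: splits the blocks of one huge-partition flush into
    * several smaller flush events, each holding at most maxEventBytes of data.
    * A single block larger than the limit is placed in an event by itself.
    * These names are illustrative only, not Uniffle's real classes.
    */
   public class HugeEventSplitter {

     /** Stand-in for a buffered shuffle block: id plus data length in bytes. */
     public record Block(long blockId, long length) {}

     /** Stand-in for a flush event: the blocks it carries and their total size. */
     public record FlushEvent(List<Block> blocks, long totalBytes) {}

     public static List<FlushEvent> split(List<Block> blocks, long maxEventBytes) {
       if (maxEventBytes <= 0) {
         throw new IllegalArgumentException("maxEventBytes must be positive");
       }
       List<FlushEvent> events = new ArrayList<>();
       List<Block> current = new ArrayList<>();
       long currentBytes = 0;
       for (Block block : blocks) {
         // Close the current event if adding this block would exceed the limit.
         if (!current.isEmpty() && currentBytes + block.length() > maxEventBytes) {
           events.add(new FlushEvent(List.copyOf(current), currentBytes));
           current = new ArrayList<>();
           currentBytes = 0;
         }
         current.add(block);
         currentBytes += block.length();
       }
       // Emit whatever is left as the final, possibly smaller, event.
       if (!current.isEmpty()) {
         events.add(new FlushEvent(List.copyOf(current), currentBytes));
       }
       return events;
     }

     public static void main(String[] args) {
       List<Block> blocks = List.of(
           new Block(1, 40), new Block(2, 50), new Block(3, 30),
           new Block(4, 120), new Block(5, 10));
       for (FlushEvent e : split(blocks, 100)) {
         System.out.println(e.blocks().size() + " blocks, " + e.totalBytes() + " bytes");
       }
     }
   }
   ```

   With a 100-byte limit, the five blocks in `main` become four events of 90, 30, 120, and 10 bytes. Because local disk IO prefers large buffers, one possible design is to apply such a limit only when the flush target is HDFS; that choice is an assumption, not something specified in this issue.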
   
   ### Motivation
   
   _No response_
   
   ### Describe the solution
   
   _No response_
   
   ### Additional context
   
   _No response_
   
   ### Are you willing to submit PR?
   
   - [ ] Yes I am willing to submit a PR!

