advancedxy opened a new issue, #405:
URL: https://github.com/apache/incubator-uniffle/issues/405

   ### Code of Conduct
   
   - [X] I agree to follow this project's [Code of 
Conduct](https://www.apache.org/foundation/policies/conduct)
   
   
   ### Search before asking
   
   - [X] I have searched in the 
[issues](https://github.com/apache/incubator-uniffle/issues?q=is%3Aissue) and 
found no similar issues.
   
   
   ### What would you like to be improved?
   
   ### Motivation
   Due to Uniffle's multi-storage design, it's crucial that the second storages 
perform well.
   ![RSS 
Arch](https://github.com/apache/incubator-uniffle/blob/master/docs/asset/rss_architecture.png?raw=true)
   
   Currently, the recommend deploy choices are `MEMORY_LOCALFILE`, 
`MEMORY_HDFS` and `MEMORY_LOCALFILE_HDFS`, and such deployments are used in 
production in `iQiyi`, `Didi`, `Tencent`, etc. 
   
   For `MEMORY_HDFS` deployment, it might be suitable for large shuffle. Due to 
HDFS' I/O model, it may not handle small shuffles very well. Therefore, 
`MEMORY_LOCALFILE` and `MEMORY_LOCALFILE_HDFS` are the de facto choices to 
cover most shuffle scenarios.
   
   The storage choice of `LOCALFILE` could be traditional HDD disks, or SSD or 
higher-speed NVMe SSDs.  The SSD storages are preferred. However due to its 
cost and hardware constraints, SSD cannot be deployed widely.
   
   Therefore, I'd like to propose some improvements to Uniffle's HDD 
deployment, make it more widely applicable.
   
   ### How should we improve?
   
   1. Reduce small I/Os of local storages:
       - The flush strategy of shuffle server could be improved, to wait for 
more shuffle data before flush to disk.
       - Leverages range partition to merge multiple partition's shuffle data 
in one file.
       - Defer flush of index file, which normally causes a small I/O.
   2. I/O capability report, the shuffle worker shall report the local 
storage's hard drive type: HDD, SSD, etc and the disk numbers.
   3. Better balancing strategy in coordinate and shuffle worker side:
       - coordinate side: partition size/throughput aware assignment to 
balancing workload in shuffle workers.
       - shuffle server side: distributes workload evenly to leverages multiple 
HDD disks. A new disk selection strategy should be deployed and I/O stats 
aware. cc @zuston 
   
   ### Are you willing to submit PR?
   
   - [X] Yes I am willing to submit a PR!


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: dev-unsubscr...@uniffle.apache.org.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org

Reply via email to