advancedxy opened a new issue, #405: URL: https://github.com/apache/incubator-uniffle/issues/405
### Code of Conduct - [X] I agree to follow this project's [Code of Conduct](https://www.apache.org/foundation/policies/conduct) ### Search before asking - [X] I have searched in the [issues](https://github.com/apache/incubator-uniffle/issues?q=is%3Aissue) and found no similar issues. ### What would you like to be improved? ### Motivation Due to Uniffle's multi-storage design, it's crucial that the second storages perform well.  Currently, the recommend deploy choices are `MEMORY_LOCALFILE`, `MEMORY_HDFS` and `MEMORY_LOCALFILE_HDFS`, and such deployments are used in production in `iQiyi`, `Didi`, `Tencent`, etc. For `MEMORY_HDFS` deployment, it might be suitable for large shuffle. Due to HDFS' I/O model, it may not handle small shuffles very well. Therefore, `MEMORY_LOCALFILE` and `MEMORY_LOCALFILE_HDFS` are the de facto choices to cover most shuffle scenarios. The storage choice of `LOCALFILE` could be traditional HDD disks, or SSD or higher-speed NVMe SSDs. The SSD storages are preferred. However due to its cost and hardware constraints, SSD cannot be deployed widely. Therefore, I'd like to propose some improvements to Uniffle's HDD deployment, make it more widely applicable. ### How should we improve? 1. Reduce small I/Os of local storages: - The flush strategy of shuffle server could be improved, to wait for more shuffle data before flush to disk. - Leverages range partition to merge multiple partition's shuffle data in one file. - Defer flush of index file, which normally causes a small I/O. 2. I/O capability report, the shuffle worker shall report the local storage's hard drive type: HDD, SSD, etc and the disk numbers. 3. Better balancing strategy in coordinate and shuffle worker side: - coordinate side: partition size/throughput aware assignment to balancing workload in shuffle workers. - shuffle server side: distributes workload evenly to leverages multiple HDD disks. A new disk selection strategy should be deployed and I/O stats aware. cc @zuston ### Are you willing to submit PR? - [X] Yes I am willing to submit a PR! -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: dev-unsubscr...@uniffle.apache.org.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org