[GitHub] [incubator-uniffle] advancedxy opened a new issue, #405: [Improvement] HDD friendly deployment

GitBox Mon, 12 Dec 2022 18:56:13 -0800


advancedxy opened a new issue, #405:
URL: https://github.com/apache/incubator-uniffle/issues/405

### Code of Conduct

- [X] I agree to follow this project's [Code of
Conduct](https://www.apache.org/foundation/policies/conduct)

### Search before asking

- [X] I have searched in the
[issues](https://github.com/apache/incubator-uniffle/issues?q=is%3Aissue) and
found no similar issues.

### What would you like to be improved?

### Motivation
Due to Uniffle's multi-storage design, it's crucial that the second storages
perform well.
![RSS
Arch](https://github.com/apache/incubator-uniffle/blob/master/docs/asset/rss_architecture.png?raw=true)

Currently, the recommend deploy choices are `MEMORY_LOCALFILE`,
`MEMORY_HDFS` and `MEMORY_LOCALFILE_HDFS`, and such deployments are used in
production in `iQiyi`, `Didi`, `Tencent`, etc.

For `MEMORY_HDFS` deployment, it might be suitable for large shuffle. Due to
HDFS' I/O model, it may not handle small shuffles very well. Therefore,
`MEMORY_LOCALFILE` and `MEMORY_LOCALFILE_HDFS` are the de facto choices to
cover most shuffle scenarios.

The storage choice of `LOCALFILE` could be traditional HDD disks, or SSD or
higher-speed NVMe SSDs. The SSD storages are preferred. However due to its
cost and hardware constraints, SSD cannot be deployed widely.

Therefore, I'd like to propose some improvements to Uniffle's HDD
deployment, make it more widely applicable.

### How should we improve?

1. Reduce small I/Os of local storages:
- The flush strategy of shuffle server could be improved, to wait for
more shuffle data before flush to disk.
- Leverages range partition to merge multiple partition's shuffle data
in one file.
- Defer flush of index file, which normally causes a small I/O.
2. I/O capability report, the shuffle worker shall report the local
storage's hard drive type: HDD, SSD, etc and the disk numbers.
3. Better balancing strategy in coordinate and shuffle worker side:
- coordinate side: partition size/throughput aware assignment to
balancing workload in shuffle workers.
- shuffle server side: distributes workload evenly to leverages multiple
HDD disks. A new disk selection strategy should be deployed and I/O stats
aware. cc @zuston

### Are you willing to submit PR?

- [X] Yes I am willing to submit a PR!

--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: dev-unsubscr...@uniffle.apache.org.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org

[GitHub] [incubator-uniffle] advancedxy opened a new issue, #405: [Improvement] HDD friendly deployment

Reply via email to