EnricoMi opened a new issue, #1621: URL: https://github.com/apache/incubator-uniffle/issues/1621
### Code of Conduct - [X] I agree to follow this project's [Code of Conduct](https://www.apache.org/foundation/policies/conduct) ### Search before asking - [X] I have searched in the [issues](https://github.com/apache/incubator-uniffle/issues?q=is%3Aissue) and found no similar issues. ### What would you like to be improved? The block id encodes the task attempt id, the partition id, and the sequence number of a shuffle block. Block data are stored along with this block id, while some of these information are explicitly stored as well (task attempt id). For reading, interfaces use block id, partition id encoded in block id, and explicit task attempt ids to identify relevant blocks. Reading blocks explicitly by their block ids implicitly reads particular partition id, task attempt id and sequence number tuples. Further, the block id is 63 bits long, while task attempt id and partition id are 31 bits long. This poses a conflict as bits have to be balanced between task attempt ids, partition ids and sequence number. There are potential situations where one of these bits are insufficient, which kills an application using the shuffle service. The block id should be refactored to simplify interfaces, to make reading blocks explicit, and to avoid potential situations where bits are insufficient. ### How should we improve? The task attempt id should be moved out of the block id, as blocks are already stored with this information (#420). The same should happen for the partition id. This effectively turns the block id into the block sequence number. It could be renamed or reduced in bit size. A task attempt can register blocks using a `Map<int, int>` or `List<Tuple2<int, int>>`, which maps a partition id to the number of blocks. This should also reduce the memory footprint storing registered block ids on the shuffle server (or the Spark driver #1538). Reading blocks should be as easy as reading blocks that match `Map<int, Map<int, int>>`, which maps a partition id to the task attempt ids and their number of blocks. While reading those blocks, the reader has to maintain a bitset of block numbers from `0` to `number of blocks - 1` (#1399) for each partition id and task attempt id to identify duplicates. This can be done via `Map<int, Roaring64NavigableMap>`, where the `Roaring64NavigableMap` stores read (task attempt id, sequence number) tuples. ### Are you willing to submit PR? - [X] Yes I am willing to submit a PR! -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: [email protected] For queries about this service, please contact Infrastructure at: [email protected] --------------------------------------------------------------------- To unsubscribe, e-mail: [email protected] For additional commands, e-mail: [email protected]
