EnricoMi opened a new issue, #1621:
URL: https://github.com/apache/incubator-uniffle/issues/1621

   ### Code of Conduct
   
   - [X] I agree to follow this project's [Code of 
Conduct](https://www.apache.org/foundation/policies/conduct)
   
   
   ### Search before asking
   
   - [X] I have searched in the 
[issues](https://github.com/apache/incubator-uniffle/issues?q=is%3Aissue) and 
found no similar issues.
   
   
   ### What would you like to be improved?
   
   The block id encodes the task attempt id, the partition id, and the sequence 
number of a shuffle block. Block data are stored along with this block id, 
while some of these information are explicitly stored as well (task attempt 
id). For reading, interfaces use block id, partition id encoded in block id, 
and explicit task attempt ids to identify relevant blocks. Reading blocks 
explicitly by their block ids implicitly reads particular partition id, task 
attempt id and sequence number tuples.
   
   Further, the block id is 63 bits long, while task attempt id and partition 
id are 31 bits long. This poses a conflict as bits have to be balanced between 
task attempt ids, partition ids and sequence number. There are potential 
situations where one of these bits are insufficient, which kills an application 
using the shuffle service.
   
   The block id should be refactored to simplify interfaces, to make reading 
blocks explicit, and to avoid potential situations where bits are insufficient.
   
   ### How should we improve?
   
   The task attempt id should be moved out of the block id, as blocks are 
already stored with this information (#420). The same should happen for the 
partition id. This effectively turns the block id into the block sequence 
number. It could be renamed or reduced in bit size.
   
   A task attempt can register blocks using a `Map<int, int>` or 
`List<Tuple2<int, int>>`, which maps a partition id to the number of blocks. 
This should also reduce the memory footprint storing registered block ids on 
the shuffle server (or the Spark driver #1538).
   
   Reading blocks should be as easy as reading blocks that match `Map<int, 
Map<int, int>>`, which maps a partition id to the task attempt ids and their 
number of blocks. While reading those blocks, the reader has to maintain a 
bitset of block numbers from `0` to `number of blocks - 1` (#1399) for each 
partition id and task attempt id to identify duplicates. This can be done via 
`Map<int, Roaring64NavigableMap>`, where the `Roaring64NavigableMap` stores 
read (task attempt id, sequence number) tuples.
   
   ### Are you willing to submit PR?
   
   - [X] Yes I am willing to submit a PR!


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]


---------------------------------------------------------------------
To unsubscribe, e-mail: [email protected]
For additional commands, e-mail: [email protected]

Reply via email to