zuston opened a new pull request, #1652:
URL: https://github.com/apache/incubator-uniffle/pull/1652

   ### What changes were proposed in this pull request?
   
   1. Make the write client always use the latest assignment for subsequent 
writes after a block reassignment happens.
   2. Support reassigning multiple servers at once for huge partitions, so 
that the load is balanced and writing is faster.
   3. Support multiple retries for partition reassignment.
   
   #### Always using the latest assignment
   
   To achieve this, I introduce the `ShuffleHandleInfoWrapper`, which returns 
the latest assignment for the current task. The creation of `AddBlockEvent` 
also applies the latest assignment through `ShuffleHandleInfoWrapper`.
   
   It is updated by the `triggerReassignShuffleServer` rpc. 
   That means the original reassign rpc response is refactored and 
replaced by the whole latest `shuffleHandleInfo`.
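
The idea of always reading the latest assignment can be sketched roughly as below. This is only an illustration, not the code in this PR: the class `LatestAssignmentHolder`, its methods, and the use of plain server-id strings are all hypothetical. It assumes the reassign rpc returns the complete new assignment, which then replaces the old one atomically.

```java
import java.util.List;
import java.util.Map;
import java.util.concurrent.atomic.AtomicReference;

// Hypothetical sketch: the names here are illustrative, not the Uniffle API.
final class LatestAssignmentHolder {
    // partitionId -> servers currently assigned to that partition
    private final AtomicReference<Map<Integer, List<String>>> current;

    LatestAssignmentHolder(Map<Integer, List<String>> initial) {
        this.current = new AtomicReference<>(Map.copyOf(initial));
    }

    // Called with the whole assignment carried by a reassign rpc response.
    void replaceWith(Map<Integer, List<String>> latest) {
        current.set(Map.copyOf(latest));
    }

    // Read at block-creation time, so every new block sees the newest servers.
    List<String> serversFor(int partitionId) {
        return current.get().getOrDefault(partitionId, List.of());
    }
}
```

Swapping the whole map in one step means a reader never observes a half-updated assignment, which is one reason to return the full handle info instead of a partial diff.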
   
   #### Load balance for huge partition
   
   A huge partition is recognized by the `NO_BUFFER_FOR_HUGE_PARTITION` status 
code, which triggers reassignment to multiple servers.
   
   Different tasks write the same huge partition to different servers: each 
task picks one server from the candidate servers based on the hash value of 
its taskAttemptID. This spreads the load of a huge partition across servers.
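
A minimal sketch of hash-based server selection is shown below. It is not the PR's actual code: `HugePartitionServerPicker` and `pick` are hypothetical names, and it assumes the task attempt id is a `long` and that servers are identified by strings.

```java
import java.util.List;

// Hypothetical sketch: the names here are illustrative, not the Uniffle API.
final class HugePartitionServerPicker {
    // Maps a task attempt to one of the candidate servers of a huge partition.
    static String pick(long taskAttemptId, List<String> candidates) {
        if (candidates.isEmpty()) {
            throw new IllegalArgumentException("no candidate servers");
        }
        // floorMod keeps the index non-negative even when the hash is negative.
        int index = Math.floorMod(Long.hashCode(taskAttemptId), candidates.size());
        return candidates.get(index);
    }
}
```

The same task attempt always maps to the same server, while different attempts are spread over the candidates, so no single server receives all writes for the huge partition.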
   
   ### Why are the changes needed?
   
   This PR is the subtask for #1608.
   
   Leveraging #1615 / #1610 / #1609, we can reassign servers when the write 
client encounters a failed or unhealthy server. But this is not good enough, 
because the faulty server state is not shared with unstarted tasks or with 
`AddBlockEvent`.
   
   Besides, without this PR, the writing speed of a huge partition is limited 
to avoid affecting other normal partitions.
   With this PR, we can recognize this case and reassign more servers for 
huge partitions.
   
   ### Does this PR introduce _any_ user-facing change?
   
   Yes. 
   
   ### How was this patch tested?
   
   Unit and integration tests.
   
   Integration tests as follows:
   1. `PartitionBlockDataReassignBasicTest` validates the reassign mechanism.
   2. `PartitionBlockDataReassignLoadBalanceTest` tests the load-balancing 
reassign mechanism for huge partitions.
   3. `PartitionBlockDataReassignMultiTimesTest` tests the reassign mechanism 
with multiple retries.


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]

