Re: [PR] [#1512] feature(Spark): Replace taskAttemptId with mapIndex in blockId [incubator-uniffle]

via GitHub Tue, 13 Feb 2024 18:28:12 -0800


jerqi commented on PR #1514:
URL: https://github.com/apache/incubator-uniffle/pull/1514#issuecomment-1942991425


   > @jerqi you are right, if both tasks report their shuffle results (block 
ids), we would not know how many blocks to read for our partition id and map 
index. For that to work we need to guarantee that only one task attempt 
succeeds in registering shuffle results (blockids). We then take that 
taskAttemptId and those blockIds for the partition id and map index.
   > 
   > This would require a two-phase commit to guard / coordinate the report 
shuffle result step, which would be a much bigger change: 
https://github.com/G-Research/incubator-uniffle/pull/4/files
   > 
   > There are only two situations, where this can happen, though: speculative 
task execution (which is not recommended for production) and (without 
speculative task execution) a task failure after shuffle results have been 
reported, which triggers another task attempt.
   
   Spark already has a commit mechanism. If we set up a new two-phase 
commit, it may introduce some inconsistency. 
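
   To make the guarantee discussed in the quote concrete (only one task attempt may register block ids for a given partition id and map index), here is a minimal sketch of a first-writer-wins registry. It is only an illustration: `ShuffleResultRegistry`, `report`, and `blocks_for` are hypothetical names, not Uniffle or Spark APIs, and a real implementation would also have to coordinate with Spark's own commit mechanism.

```python
import threading
from dataclasses import dataclass, field
from typing import Dict, List, Optional, Tuple

# Hypothetical key: (partition_id, map_index). Not a Uniffle type.
Key = Tuple[int, int]


@dataclass
class ShuffleResultRegistry:
    """Illustrative first-writer-wins registry (an assumption, not Uniffle's API)."""

    _committed: Dict[Key, Tuple[int, List[int]]] = field(default_factory=dict)
    _lock: threading.Lock = field(default_factory=threading.Lock)

    def report(self, partition_id: int, map_index: int,
               task_attempt_id: int, block_ids: List[int]) -> bool:
        """Return True if this attempt owns the committed result for the key."""
        key = (partition_id, map_index)
        with self._lock:
            winner = self._committed.get(key)
            if winner is not None:
                # A result is already committed: accept only a retry of the
                # same attempt, reject any other (e.g. speculative) attempt.
                return winner[0] == task_attempt_id
            self._committed[key] = (task_attempt_id, list(block_ids))
            return True

    def blocks_for(self, partition_id: int,
                   map_index: int) -> Optional[Tuple[int, List[int]]]:
        """Return (winning task_attempt_id, block_ids), or None if uncommitted."""
        with self._lock:
            return self._committed.get((partition_id, map_index))


registry = ShuffleResultRegistry()
first = registry.report(0, 7, task_attempt_id=100, block_ids=[1, 2, 3])
second = registry.report(0, 7, task_attempt_id=101, block_ids=[4, 5, 6])
winner = registry.blocks_for(0, 7)
```

   In this sketch the second attempt is rejected, so a reader knows exactly which taskAttemptId and blockIds belong to partition 0 and map index 7. It does not address how the losing attempt's blocks are cleaned up.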


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]


---------------------------------------------------------------------
To unsubscribe, e-mail: [email protected]
For additional commands, e-mail: [email protected]
