sahibamatta commented on PR #1866:
URL: 
https://github.com/apache/incubator-uniffle/pull/1866#issuecomment-2240435274

   > I think you haven't fully understood this issue. A flush event corresponds 
to a specific `appId/shuffleId/partitionId`. If the event for this 
`appId/shuffleId/partitionId` fails, then when committing the shuffle for this 
specific shuffleId, it should fail too. Instead of using a global boolean value 
`writeError` in `ShuffleFlushManager` to determine this, which would cause the 
`commitShuffle` that does not belong to this `appId/shuffleId/partitionId` to 
fail as well. We don't want this to happen.
   > 
   > For this issue, you'd better conduct some tests.
   
   Apologies, for my little context and understanding of the issue, being new 
to the repo, it's taking sometime to get a hang of things. As mentioned by you, 
that a flushEvent and commitShuffle is shuffle specific, so instead of a single 
global variable, created a global list: `shuffleIdsWithWriteError`, which will 
contain the shuffleIds which had erroneous write. During the `commitShuffle`, 
when `getCommittedBlockIds` gets called, this list gets checked for the 
provided shuffleId, and if found it throws: `EventDiscardException`. 
   
   Added a UT too for it. Let me know if it makes sense to you. 


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]


---------------------------------------------------------------------
To unsubscribe, e-mail: [email protected]
For additional commands, e-mail: [email protected]

Reply via email to