jtao15 opened a new pull request, #8838:
URL: https://github.com/apache/pinot/pull/8838

   This PR addresses the following issues:
   1. The 2x storage limit can be violated because the segment cleanup in 
startReplaceSegment is asynchronous. For example, a refresh table already has 
2 snapshots. The 3rd push gets a success response from the startReplaceSegment 
API and starts uploading, but the cleanup of the 1st snapshot fails, so three 
snapshots end up in storage.
   2. When a failed push is rerun, there is a race condition between the 
retention manager/revertReplace and startReplace. 
   ```
   For example, the retention manager can delete the newly uploaded s3 and s4 
because its delete is asynchronous. 
   Initial state:
   lineageEntry1: { segmentsFrom: [s1, s2], segmentsTo: [s3, s4], status: REVERTED/IN_PROGRESS > 24h}
   
   Running startReplace:
   lineageEntry1: { segmentsFrom: [s1, s2], segmentsTo: [], status: REVERTED}
   lineageEntry2: { segmentsFrom: [s1, s2], segmentsTo: [s3, s4], status: IN_PROGRESS}
   ```
   This PR resolves the race condition by making startReplace wait for the 
cleanup of s3 and s4 to finish before it completes, so the client has not yet 
uploaded s3 and s4 when the cleanup runs.
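
The ordering described above can be sketched as a small in-memory model. This is an illustration only, not Pinot's implementation: `Table`, `LineageEntry`, `delete_segments`, `delete_fails`, and this `start_replace` are hypothetical names, and for brevity only the REVERTED case is handled (not IN_PROGRESS entries older than 24h). The point is that the stale segments are removed, and any cleanup failure is surfaced, before the call returns and the client begins uploading.

```python
from dataclasses import dataclass, field


class CleanupError(Exception):
    """Raised when stale segments could not be removed."""


@dataclass
class LineageEntry:
    segments_from: list
    segments_to: list
    status: str  # "IN_PROGRESS", "COMPLETED" or "REVERTED"


@dataclass
class Table:
    segments: set = field(default_factory=set)   # segments currently in storage
    lineage: list = field(default_factory=list)  # LineageEntry objects
    delete_fails: bool = False                   # hook to simulate a failed delete

    def delete_segments(self, names):
        # Hypothetical blocking delete: returns only after the segments are gone.
        if self.delete_fails:
            raise CleanupError(f"could not delete {sorted(names)}")
        self.segments -= set(names)


def start_replace(table, segments_from, segments_to):
    """Clean up stale segments first, then register the new lineage entry."""
    for entry in table.lineage:
        stale = set(entry.segments_to) & set(segments_to)
        if entry.status == "REVERTED" and stale:
            # Block here: an exception aborts the call before any upload starts.
            table.delete_segments(stale)
            entry.segments_to = [s for s in entry.segments_to if s not in stale]
    table.lineage.append(
        LineageEntry(list(segments_from), list(segments_to), "IN_PROGRESS")
    )
    return len(table.lineage) - 1  # list index as a stand-in lineage entry id


# Rerun of a failed push: s3 and s4 are leftovers from the reverted attempt.
table = Table(
    segments={"s1", "s2", "s3", "s4"},
    lineage=[LineageEntry(["s1", "s2"], ["s3", "s4"], "REVERTED")],
)
entry_id = start_replace(table, ["s1", "s2"], ["s3", "s4"])
table.segments |= {"s3", "s4"}  # the client uploads only after start_replace returns
```

Because the delete happens inside the call, a cleanup failure rejects the push up front instead of letting an extra snapshot accumulate, and no later asynchronous delete can remove the freshly uploaded s3 and s4.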


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]

