jtao15 opened a new pull request, #8838:
URL: https://github.com/apache/pinot/pull/8838
This PR addresses the following issues:
1. It's possible to violate the 2X storage usage limit because the segment
cleanup in startReplaceSegment is async. E.g. a refresh table already has 2
snapshots. The 3rd push gets a success response from the startReplaceSegment
API and starts to upload, but the cleanup for the 1st snapshot fails.
2. When rerunning a failed push, there's a race condition between the
retention manager/revertReplace and startReplace.
For example, the retention manager can delete the newly uploaded s3 and s4,
because deletion in the retention manager is async.
```
Initial state:
lineageEntry1: { segmentsFrom: [s1, s2], segmentsTo: [s3, s4], status: REVERTED/IN_PROGRESS > 24h }
Running startReplace:
lineageEntry1: { segmentsFrom: [s1, s2], segmentsTo: [], status: REVERTED }
lineageEntry2: { segmentsFrom: [s1, s2], segmentsTo: [s3, s4], status: IN_PROGRESS }
```
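The state transition in the example can be modeled with a small, self-contained Python sketch. This is not Pinot code: `LineageEntry`, `start_replace`, `age_hours`, and the 24-hour threshold parameter are hypothetical names chosen for illustration, and the overlap check is an assumption about when an old entry counts as conflicting.

```python
from dataclasses import dataclass


@dataclass
class LineageEntry:
    segments_from: list
    segments_to: list
    status: str            # "IN_PROGRESS", "COMPLETED", or "REVERTED"
    age_hours: float = 0.0  # hypothetical field: time since the entry was created


def start_replace(lineage, segments_from, segments_to, stale_after_hours=24.0):
    """Toy model of a startReplace rerun.

    Returns (new_entry_id, segments_to_clean_up).
    """
    to_clean = []
    for entry in lineage.values():
        stale = entry.status == "IN_PROGRESS" and entry.age_hours > stale_after_hours
        overlaps = set(entry.segments_to) & set(segments_to)
        if (entry.status == "REVERTED" or stale) and overlaps:
            # The old attempt's output segments must be deleted before the
            # rerun uploads segments with the same names.
            to_clean.extend(entry.segments_to)
            entry.segments_to = []
            entry.status = "REVERTED"
    new_id = f"lineageEntry{len(lineage) + 1}"
    lineage[new_id] = LineageEntry(list(segments_from), list(segments_to), "IN_PROGRESS")
    return new_id, to_clean
```

If the segments returned in `to_clean` are deleted asynchronously after this call has already returned, the delete can land after the client has re-uploaded s3 and s4, which is the race described above.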
This PR resolves the race condition by waiting for the cleanup of s3 and s4
to finish before startReplace completes, so the client has not yet uploaded
the new s3 and s4 when the old ones are deleted.
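
The "wait for cleanup before completing" idea can be sketched as follows. This is a minimal illustration under stated assumptions, not the implementation in this PR: the segment store is modeled as a Python set, and `wait_for_cleanup`, `start_replace_blocking`, `delayed_delete`, and the timeout and polling values are all hypothetical.

```python
import threading
import time


def wait_for_cleanup(segment_exists, segments, timeout_s=5.0, poll_interval_s=0.01):
    """Block until none of `segments` exist, or raise TimeoutError."""
    deadline = time.monotonic() + timeout_s
    remaining = set(segments)
    while True:
        remaining = {s for s in remaining if segment_exists(s)}
        if not remaining:
            return
        if time.monotonic() >= deadline:
            raise TimeoutError(f"segments still present: {sorted(remaining)}")
        time.sleep(poll_interval_s)


def start_replace_blocking(store, stale_segments, delete_async, timeout_s=5.0):
    """Trigger deletion of stale segments and return only once they are gone."""
    for segment in stale_segments:
        delete_async(store, segment)  # deletion itself may still be asynchronous
    wait_for_cleanup(lambda s: s in store, stale_segments, timeout_s=timeout_s)
    # Only now is it safe for the client to upload segments with the same names.
    return "IN_PROGRESS"


def delayed_delete(store, segment, delay_s=0.05):
    """Stand-in for an async delete: removes the segment after a short delay."""
    threading.Timer(delay_s, store.discard, args=(segment,)).start()
```

With this shape, a failed or slow cleanup surfaces as an error from the start call instead of a success response followed by a late delete, which also covers the storage concern in item 1.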