ahmedabu98 opened a new pull request, #22138: URL: https://github.com/apache/beam/pull/22138
There is an edge case when Beam issues multiple rewrites to a GCS bucket. A bundle that rewrites files can succeed without Beam recording that success (e.g. when workers are autoscaled or restarted). When the bundle is retried, the rewrite is issued again with the same destination. If the destination bucket has a retention policy, the retried rewrite fails with the retentionPolicyNotMet error.

This PR adds support for this case by handling the retentionPolicyNotMet error as follows:
- If the source and destination files have the same checksum, assume they are identical and skip the rewrite.
- Otherwise, throw an error early.

More information is in this design document: https://docs.google.com/document/d/11kXzI90KmAyknszSFmtfPcL_GWaVzpt8MojQfifZOoM/edit#heading=h.f8k9hty17wf5
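
The handling described above can be sketched roughly as follows. This is a minimal illustration, not the code in the PR: `RetentionPolicyNotMet`, `rewrite_with_retention_check`, and the injected `rewrite` and `get_checksum` callables are hypothetical names standing in for whatever the GCS client and Beam actually expose.

```python
from typing import Callable, Optional


class RetentionPolicyNotMet(Exception):
    """Hypothetical stand-in for the GCS retentionPolicyNotMet error."""


def rewrite_with_retention_check(
    src: str,
    dst: str,
    rewrite: Callable[[str, str], None],
    get_checksum: Callable[[str], Optional[str]],
) -> str:
    """Rewrite src to dst, tolerating a retry against a retention-locked bucket.

    Returns "rewritten" if the rewrite ran, or "skipped" if the destination
    already holds a file with the same checksum as the source.
    """
    try:
        rewrite(src, dst)
        return "rewritten"
    except RetentionPolicyNotMet as err:
        # The destination exists and cannot be overwritten. Compare checksums
        # to decide whether an earlier attempt already produced the same file.
        src_checksum = get_checksum(src)
        dst_checksum = get_checksum(dst)
        if src_checksum is not None and src_checksum == dst_checksum:
            # Assumed identical: an earlier attempt already succeeded.
            return "skipped"
        # Different (or unknown) content: fail early instead of retrying.
        raise RuntimeError(
            f"Cannot overwrite {dst}: retention policy not met and "
            f"checksums differ from {src}"
        ) from err
```

Passing the rewrite and checksum lookups in as callables keeps the sketch independent of any particular GCS client; in practice they would wrap the real rewrite request and an object metadata lookup.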
