ahmedabu98 opened a new pull request, #22138:
URL: https://github.com/apache/beam/pull/22138

   There is an edge case when Beam issues multiple rewrites to a GCS bucket. A 
bundle that rewrites files can succeed without Beam recording that success 
(e.g. when workers are autoscaled or restarted). When the bundle is retried, 
the rewrite is issued again with the same destination. If the destination 
bucket has a retention policy, the retried rewrite tries to overwrite an 
object that is still under retention and fails with the 
retentionPolicyNotMet error.
   
   This PR adds support for this case by handling the retentionPolicyNotMet 
error as follows:
   - If the source and destination files have the same checksum, assume they 
are identical and skip the rewrite
   - Otherwise, fail early with an error
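
   The handling described above could be sketched roughly as follows. This is 
a minimal illustration, not the code in this PR: `RetentionPolicyNotMet`, 
`rewrite_with_retention_check`, and the injected `rewrite` and `checksum` 
callables are hypothetical names standing in for the real GCS client calls.

```python
from typing import Callable


class RetentionPolicyNotMet(Exception):
    """Hypothetical stand-in for the GCS retentionPolicyNotMet error."""


def rewrite_with_retention_check(
    src: str,
    dst: str,
    rewrite: Callable[[str, str], None],
    checksum: Callable[[str], str],
) -> bool:
    """Rewrite src to dst, tolerating a retry of an already-completed rewrite.

    Returns True if the rewrite was performed, and False if it was skipped
    because dst already holds content with the same checksum as src.
    Both callables are assumptions made for this sketch.
    """
    try:
        rewrite(src, dst)
        return True
    except RetentionPolicyNotMet:
        # The destination exists and is protected by a retention policy.
        # If its checksum matches the source, assume an earlier attempt
        # already completed this rewrite and skip it.
        if checksum(src) == checksum(dst):
            return False
        # Different content: the rewrite can never succeed, so fail early.
        raise IOError(
            f"Cannot rewrite {src} to {dst}: destination is under a "
            "retention policy and its checksum differs from the source"
        ) from None
```

   Passing the rewrite and checksum operations in as callables keeps the 
sketch independent of any particular GCS client library.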
   
   More information in this design document: 
https://docs.google.com/document/d/11kXzI90KmAyknszSFmtfPcL_GWaVzpt8MojQfifZOoM/edit#heading=h.f8k9hty17wf5


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
