David created BEAM-13010:
----------------------------
Summary: Delete orphaned files
Key: BEAM-13010
URL: https://issues.apache.org/jira/browse/BEAM-13010
Project: Beam
Issue Type: Bug
Components: io-py-files
Affects Versions: 2.34.0
Reporter: David
Fix For: 2.35.0
Up to version 2.33.0 of Apache Beam (tested with a Python streaming pipeline
consuming events from Pub/Sub and writing them to GCS), some files were
deleted from the temporary folder before being moved to the destination. This
was the original issue:
https://issues.apache.org/jira/browse/BEAM-12950
In version 2.34.0 we applied a temporary workaround to make sure that no data
is dropped. Instead of deleting the orphaned files, we only log them:
https://github.com/apache/beam/pull/15576
The most likely root cause of the missing events is that files were removed at
the wrong time. Orphaned files need to be deleted in a subsequent step (once
we are sure that there will be no more retries).
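
The idea of deferring deletion can be sketched roughly as follows. This is not
the Beam implementation: the function names, the `expected_files` argument and
the `retries_possible` flag are all hypothetical, and the sketch uses only the
local filesystem through the standard library rather than Beam's filesystem
APIs. It only illustrates the two phases: log orphaned files while a retry
could still need them, and delete them in a later step.

```python
import logging
import os

_LOGGER = logging.getLogger(__name__)


def find_orphaned_files(temp_dir, expected_files):
    """Return files in temp_dir that no write result refers to.

    expected_files is a hypothetical list of temp file paths that the
    finalize step still intends to move to the destination.
    """
    expected = set(expected_files)
    candidates = (os.path.join(temp_dir, name) for name in os.listdir(temp_dir))
    return sorted(path for path in candidates if path not in expected)


def handle_orphaned_files(orphaned, retries_possible):
    """Log orphaned files while retries are possible, delete them afterwards.

    Returns the list of paths that were actually deleted.
    """
    deleted = []
    for path in orphaned:
        if retries_possible:
            # A retried bundle might still need this file, so only report it.
            _LOGGER.info('Keeping orphaned file for now: %s', path)
        else:
            # Assumed to run in a later step, once no retry can happen.
            os.remove(path)
            deleted.append(path)
    return deleted
```

Calling `handle_orphaned_files(orphaned, retries_possible=True)` corresponds to
the 2.34.0 workaround (log only); calling it with `retries_possible=False` in a
later step corresponds to the cleanup requested here.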