Abacn commented on issue #29515: URL: https://github.com/apache/beam/issues/29515#issuecomment-1835019527
An orphaned file does not mean the file was left behind and never reached the final destination. FileIO works in two stages to ensure data integrity. It writes to a temp location; once all records are written (or the window fires), the temp file is moved to the final destination. On the GCS file system, a move is done as a copy followed by a delete of the original, because GCS blobs are immutable. An orphaned file means the delete did not complete, usually due to a permission error (for example, the IAM role of the service account used by the worker can create blobs but cannot delete them). It's harmless. If any file failed to be copied to the final destination, the pipeline should fail.

The problem seen locally and on Dataflow may be different. But it is still strange that a local run can leave orphaned files. When running locally, do you mean the direct runner or Flink/another runner?
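
To make the copy-then-delete step concrete, here is a minimal sketch using the local filesystem as a stand-in for GCS. It is not Beam's actual implementation: `finalize`, `deny_delete`, and `demo` are hypothetical names, and the `PermissionError` only simulates a service account that can create blobs but not delete them.

```python
import os
import shutil
import tempfile


def finalize(temp_path, final_path, delete=os.remove):
    """Move temp_path to final_path as a copy followed by a delete.

    Returns True if the temp file was removed, False if it was left orphaned.
    A failed copy raises, so the caller fails instead of silently losing data.
    """
    shutil.copyfile(temp_path, final_path)  # step 1: the copy must succeed
    try:
        delete(temp_path)  # step 2: cleanup of the original
    except PermissionError:
        return False  # temp file is orphaned; the final output is still complete
    return True


def deny_delete(path):
    # Hypothetical stand-in for an IAM role that lacks delete permission.
    raise PermissionError(f"delete denied: {path}")


def demo():
    root = tempfile.mkdtemp()
    temp_path = os.path.join(root, "temp-shard")
    final_path = os.path.join(root, "output-shard")
    with open(temp_path, "w") as f:
        f.write("record-1\nrecord-2\n")
    cleaned = finalize(temp_path, final_path, delete=deny_delete)
    with open(final_path) as f:
        final_data = f.read()
    return cleaned, os.path.exists(temp_path), final_data
```

In this sketch, `demo()` leaves the temp file in place while the final file holds every record, which is the sense in which an orphaned file is harmless.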
