Abacn commented on issue #29515:
URL: https://github.com/apache/beam/issues/29515#issuecomment-1835019527

   An orphaned file does not mean that the file was left behind and never
reached the final destination.
   
   FileIO writes in two stages to ensure data integrity. It first writes to a
temporary location, and only after all records are written (or the window
fires) are the temporary files moved to the final destination.
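
   The two-stage pattern can be sketched on a local filesystem, assuming plain
files stand in for FileIO's temporary and final locations. `write_two_stage`,
the `.temp-beam-` prefix and the shard file name are hypothetical; this is an
illustration, not Beam's implementation. The temporary directory is created
inside the destination so the final `os.replace` is a same-filesystem rename.

```python
import os
import tempfile


def write_two_stage(records, final_dir):
    """Hypothetical sketch: write to a temp location, then move into place."""
    os.makedirs(final_dir, exist_ok=True)
    # Stage 1: write every record to a temporary directory under the destination.
    temp_dir = tempfile.mkdtemp(prefix=".temp-beam-", dir=final_dir)
    temp_path = os.path.join(temp_dir, "output-00000-of-00001.txt")
    with open(temp_path, "w") as f:
        for record in records:
            f.write(f"{record}\n")
    # Stage 2: only after all records are written, move the file to its final name.
    final_path = os.path.join(final_dir, os.path.basename(temp_path))
    os.replace(temp_path, final_path)
    # Clean up the now-empty temporary directory.
    os.rmdir(temp_dir)
    return final_path
```

A reader of the final directory never sees a partially written file: it
appears there only once stage 1 has finished.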
   
   In the GCS file system, a move is implemented as a copy followed by a delete
of the original, because GCS blobs are immutable. An orphaned file means the
delete did not complete, usually because of a permission error (for example,
the IAM role of the service account used by the workers can create blobs but
cannot delete them). It is harmless. If any file failed to be copied to the
final destination, the pipeline should fail.
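
   The copy-then-delete behavior, and why a failed delete leaves only a harmless
orphan, can be sketched with an in-memory store. `FakeBlobStore`,
`PermissionDenied` and `move` are hypothetical names; this does not call the
real GCS client or Beam's file system classes. It only mirrors the sequence
described above: a failed copy propagates (so the write fails), while a failed
delete leaves the source behind after the data has already reached its
destination.

```python
class PermissionDenied(Exception):
    """Hypothetical stand-in for a missing delete permission."""


class FakeBlobStore:
    """Hypothetical in-memory stand-in for an object store with immutable blobs."""

    def __init__(self, can_delete=True):
        self.blobs = {}
        self.can_delete = can_delete

    def create(self, name, data):
        self.blobs[name] = data

    def copy(self, src, dst):
        # Raises KeyError if src does not exist, before dst is written.
        self.blobs[dst] = self.blobs[src]

    def delete(self, name):
        if not self.can_delete:
            raise PermissionDenied(f"delete not allowed: {name}")
        del self.blobs[name]


def move(store, src, dst):
    """Emulate a move on immutable blobs: copy first, then delete the original.

    Returns the name of the orphaned source blob if the delete failed, else None.
    """
    store.copy(src, dst)  # a failed copy propagates, so the write fails
    try:
        store.delete(src)
    except PermissionDenied:
        # The data is already at dst; the leftover src is an orphaned temp file.
        return src
    return None
```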
   
   The problem seen locally and on Dataflow may be different, but it is still
strange that orphaned files appear when running locally. When you say locally,
do you mean the Direct Runner, or Flink or another runner?


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
