efung opened a new issue, #27926: URL: https://github.com/apache/beam/issues/27926
### What happened?

I am trying to write a status value to a file via `beam.io.WriteToText`. If the input PCollection is empty, I don't want the file to be overwritten. I've set the argument `skip_if_empty=True` but the file gets deleted.

I initially encountered this bug when writing to a file in GCS, but have also reproduced it using a file on my local computer.

I'm using macOS 13.5, Python 3.9.10, Apache Beam 2.46.0

# Steps to reproduce

1. Run the attached Python script, [skip_if_empty.txt](https://github.com/apache/beam/files/12302815/skip_if_empty.txt), like this: `python skip_if_empty.txt --output-file test.txt --project <some_gcp_project>`
2. Note that a timestamp value is written into `test.txt`
3. Now, edit the script and comment out the string in the list passed to `beam.Create`, so that the collection is empty.
4. Run the script again as above.
5. Observe these warnings printed to the console:
   ```
   WARNING:apache_beam.io.filebasedsink:Deleting 1 existing files in target path matching:
   WARNING:apache_beam.io.filebasedsink:No shards found to finalize. num_shards: 0, skipped: 0
   ```
6. Observe that `test.txt` has now been deleted.
7. Repeat the above using `gs://some_gcp_project/path/to/test.txt` as the output file (if you have access to a GCP project and GCS)

### Issue Priority

Priority: 2 (default / most bugs should be filed as P2)

### Issue Components

- [X] Component: Python SDK
- [ ] Component: Java SDK
- [ ] Component: Go SDK
- [ ] Component: Typescript SDK
- [X] Component: IO connector
- [ ] Component: Beam examples
- [ ] Component: Beam playground
- [ ] Component: Beam katas
- [ ] Component: Website
- [ ] Component: Spark Runner
- [ ] Component: Flink Runner
- [ ] Component: Samza Runner
- [ ] Component: Twister2 Runner
- [ ] Component: Hazelcast Jet Runner
- [ ] Component: Google Cloud Dataflow Runner
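
The attached script is not reproduced in this message. For readers without access to it, the following is a minimal sketch of what such a reproduction might look like, not the reporter's actual code. It assumes the script writes to the exact output path by passing `shard_name_template=""`; the helper `status_elements` and the `--empty` flag are hypothetical stand-ins for manually commenting out the string passed to `beam.Create`.

```python
import argparse
from datetime import datetime, timezone


def status_elements(empty: bool) -> list:
    """Return the input for beam.Create: one timestamp string, or nothing.

    Hypothetical helper; the original script edits the list by hand instead.
    """
    return [] if empty else [datetime.now(timezone.utc).isoformat()]


def run(argv=None):
    # Imported lazily so the helper above can be used without Beam installed.
    import apache_beam as beam
    from apache_beam.options.pipeline_options import PipelineOptions

    parser = argparse.ArgumentParser()
    parser.add_argument("--output-file", required=True)
    # Hypothetical flag: simulates the empty PCollection from step 3.
    parser.add_argument("--empty", action="store_true")
    # Unrecognized flags (for example --project) are passed through to Beam.
    args, pipeline_args = parser.parse_known_args(argv)

    with beam.Pipeline(options=PipelineOptions(pipeline_args)) as p:
        (
            p
            | "Create" >> beam.Create(status_elements(args.empty))
            | "Write" >> beam.io.WriteToText(
                args.output_file,
                # Assumption: no shard suffix, so the file is exactly
                # --output-file rather than a sharded name.
                shard_name_template="",
                skip_if_empty=True,
            )
        )
```

Under these assumptions, calling `run(["--output-file", "test.txt"])` corresponds to steps 1 and 2, and calling `run(["--output-file", "test.txt", "--empty"])` corresponds to steps 3 and 4. According to the report above, on Beam 2.46.0 the second call removes `test.txt` instead of leaving it untouched.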
