orf opened a new issue, #40502:
URL: https://github.com/apache/arrow/issues/40502

   ### Describe the enhancement requested
   
   The following code creates an object in S3, even though an exception is raised:
   
   ```python
   from pyarrow.fs import S3FileSystem
   
   fs = S3FileSystem()
   with fs.open_output_stream("bucket/key") as fd:
       fd.write(b"hi")
       raise RuntimeError()
   ```
   
   This is usually what you want (I guess?), but there is no way to change this behaviour. If you have a long-running task that writes to a `NativeFile`, you may not want partial data to end up in S3 when an error occurs. The workaround is annoying - you have to write to a temporary key, then move the file into place (or delete it on failure):
   
   ```python
   from pyarrow.fs import S3FileSystem

   fs = S3FileSystem()
   try:
       with fs.open_output_stream("bucket/another_key") as fd:
           fd.write(b"hi")
   except Exception:
       fs.delete_file("bucket/another_key")
   else:
       fs.move("bucket/another_key", "bucket/actual_key")
   ```
   
   This works, but it's slower than it needs to be: the data is effectively written to S3 twice. S3 already provides a native way to avoid the problem: just abort the MultipartUpload (or never complete it).
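
   For context, this is roughly what the abort path looks like at the raw S3 API level. This is only a minimal sketch using boto3 directly, with placeholder bucket/key names; it is not how pyarrow implements its output streams:

   ```python
   import boto3

   # Sketch of the S3 multipart-upload abort flow, using boto3 directly.
   # "bucket" and "key" are placeholders.
   s3 = boto3.client("s3")

   upload = s3.create_multipart_upload(Bucket="bucket", Key="key")
   try:
       s3.upload_part(
           Bucket="bucket",
           Key="key",
           PartNumber=1,
           UploadId=upload["UploadId"],
           Body=b"hi",
       )
       raise RuntimeError()
   except Exception:
       # Aborting discards any uploaded parts; no object is ever created
       # at "bucket/key".
       s3.abort_multipart_upload(
           Bucket="bucket", Key="key", UploadId=upload["UploadId"]
       )
       raise
   ```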
   
   I'd love a way to do something like this:
   
   ```python
   from pyarrow.fs import S3FileSystem

   fs = S3FileSystem()
   with fs.open_output_stream("bucket/key") as fd:
       fd.write(b"hi")
       fd.abort()
   # The object should now not exist.
   ```
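
   For the long-running-task case described above, the same hypothetical `abort()` could also be called from an exception handler. Again, this is only a sketch of the proposed behaviour, not an existing API:

   ```python
   from pyarrow.fs import S3FileSystem

   fs = S3FileSystem()
   fd = fs.open_output_stream("bucket/key")
   try:
       fd.write(b"hi")  # potentially long-running writes
       fd.close()       # completing normally creates the object
   except Exception:
       fd.abort()       # hypothetical: discard the upload, no object is created
       raise
   ```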
   
   I've tried using `close()` to no avail, and the `background_writes` parameter on `S3FileSystem()` seems to have no effect.
   
   ### Component(s)
   
   Python

