orf opened a new issue, #40502:
URL: https://github.com/apache/arrow/issues/40502
### Describe the enhancement requested
The following code creates an object in S3, even if the exception is raised:
```python
from pyarrow.fs import S3FileSystem
fs = S3FileSystem()
with fs.open_output_stream("bucket/key") as fd:
    fd.write(b"hi")
    raise RuntimeError()
```
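One way to confirm this (a minimal check using pyarrow's `get_file_info`; the bucket/key are the same placeholders as above):
```python
from pyarrow.fs import FileType, S3FileSystem

fs = S3FileSystem()

# Despite the RuntimeError above, the object was created when the stream
# was closed, so the key now resolves to a regular file.
info = fs.get_file_info("bucket/key")
assert info.type == FileType.File
```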
This is usually what you want (I guess?), but there is no way to change this
behaviour. If you have a long-running task that writes to a `NativeFile`, you
may not want partially written data uploaded to S3 when an error occurs. The
workaround is annoying: write to a separate key, then move it into place on
success (or delete it on failure):
```python
fs = S3FileSystem()
try:
    with fs.open_output_stream("bucket/another_key") as fd:
        fd.write(b"hi")
except Exception:
    # delete_file is the pyarrow FileSystem method for removing a single object
    fs.delete_file("bucket/another_key")
else:
    fs.move("bucket/another_key", "bucket/actual_key")
```
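For what it's worth, the workaround can be wrapped in a small helper so callers don't repeat the try/except dance. This is only a sketch of the pattern above; the `atomic_output_stream` name and the `.tmp-<uuid>` staging key are my own, not part of pyarrow:
```python
import contextlib
import uuid

from pyarrow.fs import S3FileSystem


@contextlib.contextmanager
def atomic_output_stream(fs, path):
    """Write to a temporary staging key, then move it into place on success.

    On failure the staging object is deleted, so `path` never sees
    partially written data.
    """
    tmp_path = f"{path}.tmp-{uuid.uuid4().hex}"  # hypothetical staging key
    try:
        with fs.open_output_stream(tmp_path) as fd:
            yield fd
    except Exception:
        fs.delete_file(tmp_path)
        raise
    else:
        fs.move(tmp_path, path)


# Usage:
fs = S3FileSystem()
with atomic_output_stream(fs, "bucket/actual_key") as fd:
    fd.write(b"hi")
```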
This works, but it's slower than it needs to be: since S3 has no rename, the
move is a server-side copy plus delete, so the data is essentially written to
S3 twice. S3 natively provides a way to avoid this entirely: abort the
MultipartUpload (or simply never complete it).
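For context, this is the S3-level behaviour being referred to, shown with boto3 rather than pyarrow (bucket and key names are placeholders): a multipart upload that is aborted, or never completed, never produces a visible object.
```python
import boto3

s3 = boto3.client("s3")
bucket, key = "my-bucket", "my-key"  # placeholders

# Start a multipart upload and upload a single part.
mpu = s3.create_multipart_upload(Bucket=bucket, Key=key)
s3.upload_part(
    Bucket=bucket,
    Key=key,
    UploadId=mpu["UploadId"],
    PartNumber=1,
    Body=b"hi",
)

# Aborting discards the uploaded parts; no object ever appears at `key`.
s3.abort_multipart_upload(Bucket=bucket, Key=key, UploadId=mpu["UploadId"])
```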
I'd love a way to do something like this:
```python
fs = S3FileSystem()
with fs.open_output_stream("bucket/key") as fd:
    fd.write(b"hi")
    fd.abort()

# The object should now not exist.
```
I've tried using `close()` to no avail, and the `background_writes`
parameter on `S3FileSystem()` seems to have no effect.
### Component(s)
Python