devinjdangelo commented on code in PR #9648:
URL: https://github.com/apache/arrow-datafusion/pull/9648#discussion_r1531114965
##########
datafusion/core/src/datasource/file_format/write/mod.rs:
##########
@@ -69,79 +66,6 @@ impl Write for SharedBuffer {
}
}
-/// Stores data needed during abortion of MultiPart writers
Review Comment:
> Is it correct to say that the implications of removing AbortableWrite is
that if certain (larger) writes to object store fail / abort for some reason,
"garbage" (unreferenced partial uploads) may be left around indefinitely on the
provider?
Yes. I have mixed feelings about removing any attempt to clean up on failure.
> I think I could be convinced with an argument like "the software can't
ensure clean up anyways (for example, if it is SIGKILLed) for some reason, and
thus we don't even try to clean up in paths we could", but if we go that route
I think we should explicitly document the behavior and rationale in comments
somewhere
This argument is valid. A hardware/network fault will prevent any cleanup
code we write from working, so to truly protect against partial writes would
require logic outside of DataFusion's process (e.g. on the cloud service
provider side).
On the other hand, this change may be annoying for simple failures when
writing to a local file system. Encountering any execution error will leave
dangling files when before they often could be cleaned up.
I think this is a case where one will draw different conclusions depending
on if they are imagining an individual user of something like `datafusion-cli`
vs. a production database system implemented on top DataFusion. The latter user
will have little use for our attempts at clean up (they will need much better
anyway), but the former may appreciate it.
--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
To unsubscribe, e-mail: [email protected]
For queries about this service, please contact Infrastructure at:
[email protected]