devinjdangelo commented on code in PR #9648:
URL: https://github.com/apache/arrow-datafusion/pull/9648#discussion_r1531114965


##########
datafusion/core/src/datasource/file_format/write/mod.rs:
##########
@@ -69,79 +66,6 @@ impl Write for SharedBuffer {
     }
 }
 
-/// Stores data needed during abortion of MultiPart writers

Review Comment:
   I have mixed feelings about removing any attempt to clean up on failure.
   
   > I think I could be convinced with an argument like "the software can't 
ensure clean up anyways (for example, if it is SIGKILLed) for some reason, and 
thus we don't even try to clean up in paths we could", but if we go that route 
I think we should explicitly document the behavior and rationale in comments 
somewhere
   
   This argument is valid. A hardware/network fault will prevent any cleanup 
code we write from working, so to truly protect against partial writes would 
require logic outside of DataFusion's process (e.g. on the cloud service 
provider side).
   
   On the other hand, this change may be annoying for simple failures when 
writing to a local file system. Encountering any execution error will leave 
dangling files when before they often could be cleaned up. 
   
   I think this may be case where one will draw different conclusions depending 
on if they are imagining an individual user of something like `datafusion-cli` 
vs. a production database system implemented on top DataFusion. The latter user 
will have little use for our attempts at clean up (they will need much better 
anyway), but the former may appreciate it.



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]

Reply via email to