[GitHub] [arrow] nealrichardson commented on issue #11781: Is adding Parquet partitions/part files using R's arrow::write_dataset() transactional?

GitBox Mon, 29 Nov 2021 04:52:14 -0800


nealrichardson commented on issue #11781:
URL: https://github.com/apache/arrow/issues/11781#issuecomment-981605639



   The arrow library is not a database, so it has no transactions. If a 
function is interrupted while writing to disk, whatever it has already written 
stays on disk, partial or otherwise. To make the write effectively atomic, you 
could `write_dataset()` to a temporary directory (for example, a `tempfile()` 
path) and then move that directory to your desired location once it has 
finished writing everything. 
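
   The stage-then-move approach could look roughly like the sketch below. 
`write_dataset_staged()` is a hypothetical helper, not part of arrow. It 
assumes the destination does not exist yet and that its parent directory does. 
The staging directory is created next to the destination rather than in the 
session `tempdir()`, because a rename is generally only a single step when 
both paths are on the same filesystem.

```r
library(arrow)

# Hypothetical helper (not an arrow function): write into a staging
# directory, then rename it to the final destination in one step.
write_dataset_staged <- function(data, dest, ...) {
  if (file.exists(dest)) stop("destination already exists: ", dest)

  # Stage next to `dest` so the final rename stays on one filesystem.
  staging <- tempfile(pattern = ".staging-", tmpdir = dirname(dest))

  # If the write fails or is interrupted, remove the partial output.
  # After a successful rename this is a no-op.
  on.exit(unlink(staging, recursive = TRUE), add = TRUE)

  write_dataset(data, staging, ...)

  if (!file.rename(staging, dest)) {
    stop("could not move staged dataset to ", dest)
  }
  invisible(dest)
}

dest <- file.path(tempdir(), "mtcars_ds")
write_dataset_staged(mtcars, dest, format = "parquet", partitioning = "cyl")
list.files(dest, recursive = TRUE)
```

   Readers of `dest` then see either no dataset or the complete one. This 
only covers creating a new dataset directory; adding files to an existing 
directory would need a different scheme.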
   
   If you want multiple processes to write to the same directory 
concurrently, give each `write_dataset()` process a unique 
`basename_template` so that their output files won't collide. 
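
   As a minimal sketch of the `basename_template` suggestion: the two calls 
below run one after the other in a single session and only stand in for 
separate processes. `write_part()` and `writer_id` are hypothetical names; in 
a real setup the id might come from something like `Sys.getpid()`. The sketch 
also assumes an arrow version whose default behavior leaves existing files 
with other names in place.

```r
library(arrow)

# Hypothetical helper: every writer uses its own file-name prefix, so two
# writers never produce the same file name inside a partition directory.
write_part <- function(data, dest, writer_id) {
  write_dataset(
    data, dest,
    format = "parquet",
    partitioning = "cyl",
    # "{i}" is the placeholder that arrow replaces with a file counter.
    basename_template = paste0("part-", writer_id, "-{i}.parquet")
  )
}

dest <- file.path(tempdir(), "shared_ds")
write_part(mtcars[1:16, ], dest, writer_id = "a")   # stands in for process A
write_part(mtcars[17:32, ], dest, writer_id = "b")  # stands in for process B

# Both writers' files should sit side by side in the partition directories.
list.files(dest, recursive = TRUE)
open_dataset(dest)$num_rows
```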

