Rap70r opened a new issue #4876: URL: https://github.com/apache/hudi/issues/4876
Hello,

We have a use case where we need to periodically refresh a table stored in parquet format on S3. The data is too large to write as a single parquet file each time, so we use DataFrame repartitioning to generate the output. We need to overwrite the entire dataset atomically. Using Spark alone, the old files are removed and then the new partitions are written, but this is not atomic: readers can observe a partially written dataset while the files are being created. Is it possible to use Hudi to overwrite the entire dataset in parquet format atomically?

Thank you

-- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: [email protected] For queries about this service, please contact Infrastructure at: [email protected]
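For context, Hudi does expose an `insert_overwrite_table` write operation that replaces the whole table's contents in a single commit on its timeline, so readers see either the old snapshot or the new one. A minimal PySpark configuration sketch follows; the table name `my_table`, the paths, and the field names (`id`, `ts`, `dt`) are hypothetical placeholders, and the exact option set should be checked against the Hudi version in use:

```python
# Hypothetical sketch: atomically replace a Hudi table's contents.
# Assumes `spark` is a SparkSession with the Hudi bundle on the classpath,
# and `df` is the freshly repartitioned DataFrame to publish.
hudi_options = {
    "hoodie.table.name": "my_table",                        # placeholder table name
    "hoodie.datasource.write.recordkey.field": "id",        # placeholder key column
    "hoodie.datasource.write.precombine.field": "ts",       # placeholder ordering column
    "hoodie.datasource.write.partitionpath.field": "dt",    # placeholder partition column
    # insert_overwrite_table logically replaces the entire table in one commit:
    "hoodie.datasource.write.operation": "insert_overwrite_table",
}

(df.write
   .format("hudi")
   .options(**hudi_options)
   .mode("append")          # "append" here means "add a commit", not "add rows"
   .save("s3://my-bucket/my_table"))  # placeholder base path
```

Because the overwrite is recorded as a commit on the Hudi timeline rather than as file deletions followed by file creations, queries through Hudi read a consistent snapshot throughout.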
