Rap70r opened a new issue #4876: URL: https://github.com/apache/hudi/issues/4876
Hello,

We have a use case where we need to periodically refresh a table stored in parquet format on S3. The data is too large to write as a single parquet file each time, so we use DataFrame repartitioning to generate the output. We need to overwrite the entire dataset atomically. Using Spark alone, the old files are removed and then the new partitions are written, but this is not atomic: readers can observe a partially written dataset while the files are being created. Is it possible to use Hudi to overwrite the entire dataset in parquet format atomically?

Thank you

-- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: [email protected] For queries about this service, please contact Infrastructure at: [email protected]
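For context, Hudi does expose an `insert_overwrite_table` write operation that replaces the whole table's contents in a single commit on its timeline, so readers see either the old snapshot or the new one. A minimal PySpark configuration sketch follows; the table name `my_table`, the paths, and the field names (`id`, `ts`, `dt`) are hypothetical placeholders, and the exact option set should be checked against the Hudi version in use:

```python
# Hypothetical sketch: atomically replace a Hudi table's contents.
# Assumes `spark` is a SparkSession with the Hudi bundle on the classpath,
# and `df` is the freshly repartitioned DataFrame to publish.
hudi_options = {
    "hoodie.table.name": "my_table",                        # placeholder table name
    "hoodie.datasource.write.recordkey.field": "id",        # placeholder key column
    "hoodie.datasource.write.precombine.field": "ts",       # placeholder ordering column
    "hoodie.datasource.write.partitionpath.field": "dt",    # placeholder partition column
    # insert_overwrite_table logically replaces the entire table in one commit:
    "hoodie.datasource.write.operation": "insert_overwrite_table",
}

(df.write
   .format("hudi")
   .options(**hudi_options)
   .mode("append")          # "append" here means "add a commit", not "add rows"
   .save("s3://my-bucket/my_table"))  # placeholder base path
```

Because the overwrite is recorded as a commit on the Hudi timeline rather than as file deletions followed by file creations, queries through Hudi read a consistent snapshot throughout.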
