jvaesteves opened a new issue #1585: URL: https://github.com/apache/incubator-hudi/issues/1585
Hello everyone, I am currently testing Hudi as a deduplication mechanism for a streaming project, and it is working pretty well. But since I never update any row, keeping previous versions of the same row just wastes S3 space. I want to know whether it is possible to keep only the most recent version of my table, or whether it is possible to schedule a deletion of this history (and how I would do that).

**Environment Description**

- Hudi version: 0.5.2
- Spark version: 2.4.4
- Hive version: 2.3.6
- Hadoop version: 2.8.5
- Storage (HDFS/S3/GCS..): S3
- Running on Docker? (yes/no): No
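One possible direction, sketched under assumptions rather than confirmed for this setup: Hudi's cleaner can be told to retain only the newest file version of each file group, so older versions become eligible for deletion after each commit. The snippet below is a minimal PySpark sketch assuming the cleaner options `hoodie.clean.automatic`, `hoodie.cleaner.policy` and `hoodie.cleaner.fileversions.retained` behave as described in Hudi's configuration docs (worth verifying against the 0.5.2 docs). The table name, field names and the helper `write_deduplicated` are hypothetical placeholders.

```python
# Hypothetical table/field names; replace them with the real schema.
hudi_options = {
    "hoodie.table.name": "events",
    "hoodie.datasource.write.recordkey.field": "event_id",
    "hoodie.datasource.write.precombine.field": "event_ts",
    "hoodie.datasource.write.partitionpath.field": "event_date",
    "hoodie.datasource.write.operation": "upsert",
    # Run the cleaner automatically after each commit.
    "hoodie.clean.automatic": "true",
    # Clean based on how many versions of each file are kept, not on commit count.
    "hoodie.cleaner.policy": "KEEP_LATEST_FILE_VERSIONS",
    # Keep only the newest version of each file group; older ones can be cleaned.
    "hoodie.cleaner.fileversions.retained": "1",
}


def write_deduplicated(df, base_path):
    """Hypothetical helper: upsert a Spark DataFrame into the Hudi table at base_path."""
    (df.write
        .format("org.apache.hudi")
        .options(**hudi_options)
        .mode("append")
        .save(base_path))
```

It would be called once per micro-batch, e.g. `write_deduplicated(batch_df, "s3://my-bucket/hudi/events")` with a hypothetical bucket path. Retaining a single version means a long-running query still reading an older file version could fail once the cleaner removes it; `KEEP_LATEST_COMMITS` with a small `hoodie.cleaner.commits.retained` is the commit-based alternative that keeps a short window of history.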
