jvaesteves opened a new issue #1585:
URL: https://github.com/apache/incubator-hudi/issues/1585


   Hello everyone, I am currently testing Hudi as a deduplication mechanism for 
a streaming project, and it is working pretty well. However, since I never 
update any row, keeping previous versions of the same row just wastes S3 
space. 
   
   I want to know if it is possible to keep only the most recent version of my 
table, or to schedule a deletion of this history (and, if so, how I would do 
that).
   
   **Environment Description**
   
   - Hudi version: 0.5.2
   - Spark version : 2.4.4
   - Hive version : 2.3.6
   - Hadoop version : 2.8.5
   - Storage (HDFS/S3/GCS..) : S3
   - Running on Docker? (yes/no) : No


----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
[email protected]

