[ 
https://issues.apache.org/jira/browse/HUDI-4216?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

sivabalan narayanan updated HUDI-4216:
--------------------------------------
    Priority: Blocker  (was: Major)

> Add support for infinite retention of data files with archival enabled 
> -----------------------------------------------------------------------
>
>                 Key: HUDI-4216
>                 URL: https://issues.apache.org/jira/browse/HUDI-4216
>             Project: Apache Hudi
>          Issue Type: Improvement
>          Components: archiving
>            Reporter: sivabalan narayanan
>            Priority: Blocker
>
> Hudi could support infinite retention with archival enabled. This would be 
> a good fit for users who want to query a Hudi table as of any time in the 
> past. 
>  
> How to achieve: 
> - Disable the cleaner completely. 
> - Enable archival as usual. 
> - Enable the metadata table so that file listing scales well. 
> Users can then query Hudi with "as.of.timestamp" set to any timestamp in 
> the past. 
>  
> With this, users can retain all data for 1 year or even more and still 
> query any snapshot in the past. This comes with additional storage cost, 
> but if users are willing to bear that cost, we should be able to support 
> them. 
>  
> Disabling the cleaner: 
>   option("hoodie.clean.automatic","false").
>   option("hoodie.clean.async","true").
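
The steps listed in the issue can be collected into one set of options. The sketch below is only an illustration: it builds plain Python dictionaries and does not start Spark. The two cleaner settings are copied from the issue. The keys `hoodie.archive.automatic` and `hoodie.metadata.enable`, and the read option `as.of.instant` (the issue text calls it "as.of.timestamp"), are assumptions about the Hudi release in use and should be checked against its configuration reference. The function names are hypothetical.

```python
def infinite_retention_write_options() -> dict:
    """Writer options for the three steps listed in the issue (a sketch)."""
    return {
        # Step 1: disable the cleaner, using the two settings quoted in the issue.
        "hoodie.clean.automatic": "false",
        "hoodie.clean.async": "true",
        # Step 2: keep archival on (assumed key; archival is normally on by default).
        "hoodie.archive.automatic": "true",
        # Step 3: enable the metadata table so file listing scales (assumed key).
        "hoodie.metadata.enable": "true",
    }


def time_travel_read_options(instant: str) -> dict:
    """Reader option for querying the table as of a past instant (assumed key)."""
    return {"as.of.instant": instant}


write_opts = infinite_retention_write_options()
read_opts = time_travel_read_options("2022-06-01 00:00:00")

# With a SparkSession and a DataFrame at hand, these would be passed as, e.g.:
#   df.write.format("hudi").options(**write_opts).mode("append").save(base_path)
#   spark.read.format("hudi").options(**read_opts).load(base_path)
```

The table name, record key, and other mandatory Hudi write options are left out on purpose, since the issue does not discuss them.
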
>  
> Things to fix:
> Replaced file groups could become active file groups again once the 
> archiver removes the corresponding replace commit from the active timeline. 
> For example, if clustering replaced FG_1 and FG_2, HoodieTableFileSystemView 
> will load all file groups and then filter out the replaced ones. FG_1 and 
> FG_2 will be deduced as replaced if it finds a replace commit pertaining to 
> FG_1 and FG_2 in the active timeline. 
> In the regular flow, the cleaner deletes those file groups, so the timeline 
> files may not matter after that. But here, since the cleaner is completely 
> disabled, we need to fix this. 
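
The problem described above can be reproduced with a toy model. This is not Hudi code: `ReplaceCommit`, `visible_file_groups`, the instant value, and the extra file group FG_3 (standing in for the clustering output) are all hypothetical names used only to illustrate the filtering logic the issue attributes to HoodieTableFileSystemView.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ReplaceCommit:
    """Toy stand-in for a replace commit on the timeline."""
    instant: str
    replaced_file_groups: frozenset


def visible_file_groups(all_file_groups, active_timeline):
    """Hide a file group only if a replace commit in the ACTIVE timeline names it."""
    replaced = set()
    for commit in active_timeline:
        replaced |= commit.replaced_file_groups
    return sorted(fg for fg in all_file_groups if fg not in replaced)


# Clustering rewrote FG_1 and FG_2 into FG_3. With the cleaner disabled,
# the files of FG_1 and FG_2 are never deleted from storage.
on_storage = ["FG_1", "FG_2", "FG_3"]
clustering = ReplaceCommit("20220601000000", frozenset({"FG_1", "FG_2"}))

# While the replace commit is still in the active timeline, filtering works.
before_archival = visible_file_groups(on_storage, [clustering])

# After the archiver moves the replace commit out of the active timeline,
# nothing marks FG_1 and FG_2 as replaced, so they show up as active again.
after_archival = visible_file_groups(on_storage, [])

print(before_archival)
print(after_archival)
```

In this model the stale file groups reappear next to the clustered one, which would presumably surface as duplicate records in queries. Any fix has to keep the "replaced" information available after archival.
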
>  



--
This message was sent by Atlassian Jira
(v8.20.7#820007)
