[ https://issues.apache.org/jira/browse/HUDI-4216?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
sivabalan narayanan updated HUDI-4216:
--------------------------------------
Priority: Blocker (was: Major)
> Add support for infinite retention of data files with archival enabled
> -----------------------------------------------------------------------
>
> Key: HUDI-4216
> URL: https://issues.apache.org/jira/browse/HUDI-4216
> Project: Apache Hudi
> Issue Type: Improvement
> Components: archiving
> Reporter: sivabalan narayanan
> Priority: Blocker
>
> We can support infinite retention with Hudi (with archival enabled). This would
> be a valuable use case for users who want to query a Hudi table as of any
> time in the past.
>
> How to achieve it:
> - Disable the cleaner completely.
> - Enable archival as usual.
> - Enable the metadata table so that file listing scales well.
> - Let users query Hudi with the "as.of.instant" time-travel option, using any
>   timestamp in the past.
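A minimal sketch of why an as-of query keeps working when nothing is cleaned, assuming a simplified in-memory model (the `FileSlices` type and both functions are hypothetical, not Hudi APIs): for each file group, the query reads the latest file slice whose instant time is at or before the requested one. As long as the cleaner never deletes old slices, such a slice exists for every past timestamp after the file group was created.

```python
from typing import Dict, List, Optional

# Hypothetical model: file group id -> instant times of its file slices.
# With the cleaner disabled, no slice is ever deleted from this map.
FileSlices = Dict[str, List[str]]


def slice_as_of(instants: List[str], as_of: str) -> Optional[str]:
    """Return the latest slice instant <= as_of, or None if there is none."""
    # Assumes all instants share the same fixed width (e.g. yyyyMMddHHmmss),
    # so lexicographic order matches chronological order.
    candidates = [t for t in instants if t <= as_of]
    return max(candidates) if candidates else None


def snapshot_as_of(file_groups: FileSlices, as_of: str) -> Dict[str, str]:
    """Pick one slice per file group; skip groups created after as_of."""
    view = {}
    for fg_id, instants in file_groups.items():
        chosen = slice_as_of(instants, as_of)
        if chosen is not None:
            view[fg_id] = chosen
    return view


if __name__ == "__main__":
    groups = {
        "FG_1": ["20220101000000", "20220301000000", "20220601000000"],
        "FG_2": ["20220401000000"],
    }
    print(snapshot_as_of(groups, "20220315000000"))  # prints {'FG_1': '20220301000000'}
```

If a cleaner had removed the `20220301000000` slice of FG_1, the same query would silently fall back to an older slice or miss data, which is why the setup above turns cleaning off entirely.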
>
> With this, users can retain all data for a year or more and still query any
> snapshot in the past. This obviously comes with additional storage cost, but
> if users are willing to bear it, we should be able to support them.
>
> To disable the cleaner:
> option("hoodie.clean.automatic","false").
> option("hoodie.clean.async","true").
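The cleaner settings above, together with the archival and metadata-table switches from the list earlier, could be collected into plain option maps as in the sketch below. The function names are hypothetical; the keys `hoodie.archive.automatic` and `hoodie.metadata.enable` and the time-travel read key (which recent Hudi releases name `as.of.instant`) are assumptions to verify against the Hudi release in use. Per the Hudi configuration docs, `hoodie.clean.async` only applies when `hoodie.clean.automatic` is on, so it is inert here.

```python
from typing import Dict


def infinite_retention_write_options(table_name: str) -> Dict[str, str]:
    """Hypothetical helper: writer options for infinite retention (not a complete config)."""
    return {
        "hoodie.table.name": table_name,
        # Cleaner off: older file slices are never deleted.
        "hoodie.clean.automatic": "false",
        # Kept as in the ticket; only takes effect when automatic cleaning is on.
        "hoodie.clean.async": "true",
        # Archival stays on so the active timeline remains bounded (assumed key).
        "hoodie.archive.automatic": "true",
        # Metadata table keeps file listing scalable as slices accumulate (assumed key).
        "hoodie.metadata.enable": "true",
    }


def time_travel_read_options(instant: str) -> Dict[str, str]:
    """Hypothetical helper: read options for an as-of query at `instant`."""
    return {"as.of.instant": instant}
```

With PySpark, these maps could be passed as `df.write.format("hudi").options(**write_opts).mode("append").save(base_path)` and `spark.read.format("hudi").options(**read_opts).load(base_path)`.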
>
> Things to fix:
> Once the archiver removes their replace commits, replaced file groups could
> become active file groups again. For example, suppose clustering replaced FG_1
> and FG_2. HoodieTableFileSystemView loads all file groups and then filters out
> the replaced ones: FG_1 and FG_2 are deemed replaced only if it finds a replace
> commit for them in the active timeline. In the regular flow, the cleaner
> deletes those file groups, so the archived timeline entries no longer matter.
> Here, since the cleaner is completely disabled, we need to fix this.
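A small simulation of the problem described above, assuming a simplified model in which a file-system view hides any file group listed in a replace commit it can see. `ReplaceCommit`, `Timeline`, `visible_file_groups`, and the clustering output group FG_3 are all hypothetical; the last step shows one possible direction (also honoring archived replace commits), not the fix Hudi actually adopted.

```python
from dataclasses import dataclass, field
from typing import List, Set


@dataclass
class ReplaceCommit:
    instant: str
    replaced_file_groups: List[str]


@dataclass
class Timeline:
    active: List[ReplaceCommit] = field(default_factory=list)
    archived: List[ReplaceCommit] = field(default_factory=list)

    def archive_before(self, instant: str) -> None:
        """Move replace commits older than `instant` to the archived timeline."""
        self.archived += [c for c in self.active if c.instant < instant]
        self.active = [c for c in self.active if c.instant >= instant]


def visible_file_groups(all_fgs: Set[str], commits: List[ReplaceCommit]) -> Set[str]:
    """Hide every file group that one of the given replace commits replaced."""
    replaced = {fg for c in commits for fg in c.replaced_file_groups}
    return all_fgs - replaced


if __name__ == "__main__":
    # Hypothetical: clustering wrote FG_3 and replaced FG_1 and FG_2.
    on_storage = {"FG_1", "FG_2", "FG_3"}  # never cleaned, so all stay on storage
    tl = Timeline(active=[ReplaceCommit("20220201000000", ["FG_1", "FG_2"])])
    print(sorted(visible_file_groups(on_storage, tl.active)))  # ['FG_3']

    tl.archive_before("20220301000000")
    # Problem: only the active timeline is consulted, so FG_1 and FG_2 reappear.
    print(sorted(visible_file_groups(on_storage, tl.active)))  # ['FG_1', 'FG_2', 'FG_3']

    # Possible direction: also honor replace commits that were archived.
    print(sorted(visible_file_groups(on_storage, tl.active + tl.archived)))  # ['FG_3']
```

This model ignores time travel: for an as-of query before the replace instant, FG_1 and FG_2 should still be visible, so a real fix would also need to compare the replace instant against the requested timestamp.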
--
This message was sent by Atlassian Jira
(v8.20.7#820007)