SteNicholas commented on code in PR #8062: URL: https://github.com/apache/hudi/pull/8062#discussion_r1211477923
##########
rfc/rfc-65/rfc-65.md:
##########
@@ -0,0 +1,163 @@
+## Proposers
+
+- @stream2000
+- @hujincalrin
+- @huberylee
+- @YuweiXiao
+
+## Approvers
+
+## Status
+
+JIRA: [HUDI-5823](https://issues.apache.org/jira/browse/HUDI-5823)
+
+## Abstract
+
+In some classic hudi use cases, users partition hudi data by time and are only interested in data from a recent period
+of time. The outdated data is useless and costly, we need a TTL(Time-To-Live) management mechanism to prevent the
+dataset from growing infinitely.
+This proposal introduces Partition TTL Management policies to hudi, people can config the policies by table config
+directly or by call commands. With proper configs set, Hudi can find out which partitions are outdated and delete them.
+
+## Background
+
+TTL management mechanism is an important feature for databases. Hudi already provides a delete_partition interface to
+delete outdated partitions. However, users still need to detect which partitions are outdated and
+call `delete_partition` manually, which means that users need to define and implement some kind of TTL policies and
+maintain proper statistics to find expired partitions by themselves. As the scale of installations grew, it's more
+important to implement a user-friendly TTL management mechanism for hudi.
+
+## Implementation

Review Comment:
Could we introduce the public interfaces in a section before `Implementation`? For example, the TTL management service interfaces, the TTL execution interface, etc.
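To make the request concrete, here is a minimal sketch of the kind of interfaces such a section could list. Every name below (`PartitionTtlStrategy`, `PartitionTtlExecutor`, `LastModifiedTtlStrategy`) is hypothetical and not taken from the PR or from Hudi; the last-modified-time rule is only an assumed example policy, and the RFC may well define expiry differently.

```java
import java.time.Duration;
import java.time.Instant;
import java.util.ArrayList;
import java.util.List;
import java.util.Map;

/** Hypothetical interface: decides which partitions have outlived their TTL. */
interface PartitionTtlStrategy {
    List<String> findExpiredPartitions(Map<String, Instant> lastModifiedByPartition, Instant now);
}

/** Hypothetical interface: removes expired partitions, e.g. by delegating to delete_partition. */
interface PartitionTtlExecutor {
    void deletePartitions(List<String> partitionPaths);
}

/** Assumed example policy: a partition expires once it has gone unmodified for longer than the TTL. */
final class LastModifiedTtlStrategy implements PartitionTtlStrategy {
    private final Duration ttl;

    LastModifiedTtlStrategy(Duration ttl) {
        this.ttl = ttl;
    }

    @Override
    public List<String> findExpiredPartitions(Map<String, Instant> lastModifiedByPartition, Instant now) {
        // Anything last modified strictly before the cutoff counts as expired.
        Instant cutoff = now.minus(ttl);
        List<String> expired = new ArrayList<>();
        for (Map.Entry<String, Instant> entry : lastModifiedByPartition.entrySet()) {
            if (entry.getValue().isBefore(cutoff)) {
                expired.add(entry.getKey());
            }
        }
        return expired;
    }
}
```

Separating "which partitions are expired" from "how they get deleted" would let readers of the RFC see the extension points before the implementation details.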
