danny0405 commented on code in PR #8062:
URL: https://github.com/apache/hudi/pull/8062#discussion_r1398624021


##########
rfc/rfc-65/rfc-65.md:
##########
@@ -0,0 +1,248 @@
+## Proposers
+
+- @stream2000
+- @hujincalrin
+- @huberylee
+- @YuweiXiao
+
+## Approvers
+
+## Status
+
+JIRA: [HUDI-5823](https://issues.apache.org/jira/browse/HUDI-5823)
+
+## Abstract
+
+In some classic Hudi use cases, users partition Hudi data by time and are only interested in data from a recent period
+of time. The outdated data is useless and costly, so we need a lifecycle management mechanism to prevent the
+dataset from growing infinitely.
+This proposal introduces partition lifecycle management strategies to Hudi; users can configure the strategies through write
+configs. With proper configs set, Hudi can find out which partitions are expired and delete them.
+
+This proposal introduces a partition lifecycle management service to Hudi. Lifecycle management is like other table
+services such as Clean/Compaction/Clustering.
+Users can configure their partition lifecycle management strategies through write configs, and Hudi will help users find
+expired partitions and delete them automatically.
+
+## Background
+
+A lifecycle management mechanism is an important feature for databases. Hudi already provides a `delete_partition`
+interface to
+delete outdated partitions. However, users still need to detect which partitions are outdated and
+call `delete_partition` manually, which means that users need to define and implement some kind of partition lifecycle
+management strategy, find expired partitions and call `delete_partition` by themselves. As the scale of installations
+grows, it is becoming increasingly important to implement a user-friendly lifecycle management mechanism for Hudi.
+
+## Implementation
+
+Our main goals are as follows:
+
+* Provide an extensible framework for partition lifecycle management.
+* Implement a simple KEEP_BY_TIME strategy, which can be executed through an independent Spark job or as a synchronous or
+  asynchronous table service.
+
+### Strategy Definition
+
+Lifecycle strategies are similar to existing table service strategies. We can define lifecycle strategies in the same way as
+defining a clustering/clean/compaction strategy:
+
+```properties
+hoodie.partition.lifecycle.management.strategy=KEEP_BY_TIME
+hoodie.partition.lifecycle.management.strategy.class=org.apache.hudi.table.action.lifecycle.strategy.KeepByTimePartitionLifecycleManagementStrategy

Review Comment:
   `hoodie.partition.lifecycle.management.strategy` -> `hoodie.partition.ttl.strategy` ?
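
   For illustration only, the rename suggested above might look like the sketch below. Only the first key name comes from this comment; the `.class` key is an assumption that simply follows the same prefix, and both values are copied unchanged from the diff.

```properties
# Key name as suggested in this review comment
hoodie.partition.ttl.strategy=KEEP_BY_TIME
# Assumed name (not stated in the comment): the same prefix applied to the class key
hoodie.partition.ttl.strategy.class=org.apache.hudi.table.action.lifecycle.strategy.KeepByTimePartitionLifecycleManagementStrategy
```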


