[ https://issues.apache.org/jira/browse/KAFKA-3224?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15548776#comment-15548776 ]
ASF GitHub Bot commented on KAFKA-3224:
---------------------------------------

GitHub user bill-warshaw opened a pull request:

    https://github.com/apache/kafka/pull/1972

    KAFKA-3224: New log deletion policy based on timestamp

    * adds a new topic-level broker configuration, `log.retention.min.timestamp`
    * if unset, this setting is ignored
    * setting this value to a Unix timestamp will allow the log cleaner to delete any segments for a given topic whose last timestamp is earlier than the set timestamp

    --

    ### [KIP-47](https://cwiki.apache.org/confluence/display/KAFKA/KIP-47+-+Add+timestamp-based+log+deletion+policy)
    ### [JIRA](https://issues.apache.org/jira/browse/KAFKA-3224)

You can merge this pull request into a Git repository by running:

    $ git pull https://github.com/bill-warshaw/kafka KAFKA-3224

Alternatively you can review and apply these changes as the patch at:

    https://github.com/apache/kafka/pull/1972.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

    This closes #1972

----
commit 008dc813d82e5938201f069712ba1bd44277e755
Author: Bill Warshaw <bill.wars...@appian.com>
Date:   2016-02-02T16:45:47Z

    KAFKA-3224: New log deletion policy based on timestamp

    * setting log.retention.min.timestamp will set a timestamp for a log, and any message before that timestamp is eligible for deletion

----

> Add timestamp-based log deletion policy
> ---------------------------------------
>
>                 Key: KAFKA-3224
>                 URL: https://issues.apache.org/jira/browse/KAFKA-3224
>             Project: Kafka
>          Issue Type: Improvement
>            Reporter: Bill Warshaw
>              Labels: kafka
>
> One of Kafka's officially-described use cases is a distributed commit log
> (http://kafka.apache.org/documentation.html#uses_commitlog). In this case,
> for a distributed service that needed a commit log, there would be a topic
> with a single partition to guarantee log order. This service would use the
> commit log to re-sync failed nodes.
> Kafka is generally an excellent fit for
> such a system, but it does not expose an adequate mechanism for log cleanup
> in such a case. With a distributed commit log, data can only be deleted when
> the client application determines that it is no longer needed; this creates
> completely arbitrary ranges of time and size for messages, which the existing
> cleanup mechanisms can't handle smoothly.
> A new deletion policy based on the absolute timestamp of a message would work
> perfectly for this case. The client application will periodically update the
> minimum timestamp of messages to retain, and Kafka will delete all messages
> earlier than that timestamp using the existing log cleaner thread mechanism.
> This is based off of the work being done in KIP-32 - Add timestamps to Kafka
> message.
> h3. Initial Approach
> https://github.com/apache/kafka/compare/trunk...bill-warshaw:KAFKA-3224

--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
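
For readers who want the proposed rule in concrete form, here is a minimal Java sketch of the check described in the pull request: a segment becomes eligible for deletion when its last (largest) message timestamp is earlier than the configured minimum. This is not the code from the pull request. The `Segment` class, the `deletableSegments` method, and the use of an empty `OptionalLong` to represent an unset `log.retention.min.timestamp` are all hypothetical names and assumptions made for illustration.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.OptionalLong;

class TimestampRetentionSketch {

    // Hypothetical stand-in for a log segment: its base offset and the
    // largest message timestamp it contains, in Unix milliseconds.
    public static final class Segment {
        public final long baseOffset;
        public final long largestTimestampMs;

        public Segment(long baseOffset, long largestTimestampMs) {
            this.baseOffset = baseOffset;
            this.largestTimestampMs = largestTimestampMs;
        }
    }

    // Returns the segments whose last timestamp is strictly earlier than the
    // configured minimum. An empty OptionalLong models the "unset" case
    // (an assumption of this sketch), in which nothing is selected.
    public static List<Segment> deletableSegments(List<Segment> segments,
                                                  OptionalLong minTimestampMs) {
        List<Segment> deletable = new ArrayList<>();
        if (!minTimestampMs.isPresent()) {
            return deletable; // setting is ignored when unset
        }
        long cutoff = minTimestampMs.getAsLong();
        for (Segment segment : segments) {
            if (segment.largestTimestampMs < cutoff) {
                deletable.add(segment);
            }
        }
        return deletable;
    }

    public static void main(String[] args) {
        List<Segment> segments = new ArrayList<>();
        segments.add(new Segment(0L, 1000L));
        segments.add(new Segment(100L, 2000L));
        segments.add(new Segment(200L, 3000L));

        // With a cutoff of 2500 ms, the first two segments are selected.
        for (Segment s : deletableSegments(segments, OptionalLong.of(2500L))) {
            System.out.println("eligible for deletion: base offset " + s.baseOffset);
        }
    }
}
```

The sketch works at segment granularity, matching the pull request text ("delete any segments ... whose last timestamp is earlier than the set timestamp"), so a segment that still holds any message at or after the cutoff is kept whole. Broker concerns such as how the log cleaner schedules the check are left out.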