Jay Kreps created KAFKA-631:
-------------------------------

             Summary: Implement log compaction
                 Key: KAFKA-631
                 URL: https://issues.apache.org/jira/browse/KAFKA-631
             Project: Kafka
          Issue Type: New Feature
          Components: core
    Affects Versions: 0.8.1
            Reporter: Jay Kreps


Currently Kafka has only one way to bound the size of the log, namely by 
deleting old segments. The policy that controls which segments are deleted can 
be configured based either on the number of bytes to retain or on the age of 
the messages. This makes sense for event or log data, which has no notion of a 
primary key. However, a lot of data does have a primary key and consists of 
updates keyed by it. For such data it would be useful to guarantee that the log 
always contains at least the latest version of every key.
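To make the existing size- and age-based policy concrete, here is a minimal 
Java sketch. The `Segment` and `SegmentRetention` classes, the `enforce` method, 
its parameters, and the `DISABLED` marker are hypothetical names for 
illustration, not Kafka's actual classes or configuration. It assumes segments 
are ordered oldest-first, that a segment's last-modified time stands in for the 
age of its messages, and that the newest (active) segment is never deleted.

```java
import java.util.ArrayDeque;
import java.util.Deque;

// Hypothetical model of a log segment (not Kafka's actual class).
final class Segment {
    final long baseOffset;     // offset of the first message in the segment
    final long sizeBytes;      // size of the segment on disk
    final long lastModifiedMs; // assumed proxy for the age of its newest message

    Segment(long baseOffset, long sizeBytes, long lastModifiedMs) {
        this.baseOffset = baseOffset;
        this.sizeBytes = sizeBytes;
        this.lastModifiedMs = lastModifiedMs;
    }
}

final class SegmentRetention {
    static final long DISABLED = -1L; // hypothetical "no limit" marker

    // Deletes segments from the old end of the log (front of the deque) while
    // the log exceeds retentionBytes or the oldest segment is older than
    // retentionMs. The newest (active) segment is always kept.
    // Returns the number of segments deleted.
    static int enforce(Deque<Segment> segments, long retentionBytes,
                       long retentionMs, long nowMs) {
        long totalBytes = 0;
        for (Segment s : segments) {
            totalBytes += s.sizeBytes;
        }
        int deleted = 0;
        while (segments.size() > 1) {
            Segment oldest = segments.peekFirst();
            boolean overSize = retentionBytes != DISABLED && totalBytes > retentionBytes;
            boolean overAge = retentionMs != DISABLED
                    && nowMs - oldest.lastModifiedMs > retentionMs;
            if (!overSize && !overAge) {
                break; // neither limit exceeded; remaining segments are newer
            }
            segments.pollFirst(); // drop the whole segment, every message in it
            totalBytes -= oldest.sizeBytes;
            deleted++;
        }
        return deleted;
    }

    public static void main(String[] args) {
        Deque<Segment> log = new ArrayDeque<>();
        log.addLast(new Segment(0, 100, 1_000));
        log.addLast(new Segment(100, 100, 2_000));
        log.addLast(new Segment(200, 100, 3_000));
        int n = enforce(log, 150, DISABLED, 3_000);
        System.out.println(n + " deleted; oldest remaining offset "
                + log.peekFirst().baseOffset);
    }
}
```

Deletion here works at segment granularity: every message in a deleted segment 
is dropped whether or not a newer version of its key exists, which is exactly 
the limitation described above for keyed data.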

As an example, say that a Kafka topic contains a sequence of User Account 
messages, each capturing the current state of a given user account. Since the 
set of user accounts is finite, rather than simply discarding old segments it 
might make more sense to delete individual records that have been made obsolete 
by a more recent update for the same key. This would ensure that the topic 
always contains at least the current state of each account.
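The key-based cleanup described in the example can be illustrated with a 
simplified in-memory Java sketch. `KeyedMessage`, `LogCompactor`, and `compact` 
are hypothetical names, not Kafka APIs, and the sketch assumes the log is given 
in increasing offset order and fits in memory. A first pass records the latest 
offset for each key; a second pass keeps only the messages at those offsets, so 
retained messages keep their original offsets and relative order.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Hypothetical keyed message at a given log offset (not a Kafka class).
final class KeyedMessage {
    final long offset;
    final String key;
    final String value;

    KeyedMessage(long offset, String key, String value) {
        this.offset = offset;
        this.key = key;
        this.value = value;
    }

    @Override
    public String toString() {
        return offset + ":" + key + "=" + value;
    }
}

final class LogCompactor {
    // Keeps only the latest message for each key. Assumes `log` is sorted by
    // increasing offset, so the last put() for a key records its newest offset.
    static List<KeyedMessage> compact(List<KeyedMessage> log) {
        // Pass 1: map each key to the offset of its most recent update.
        Map<String, Long> latestOffset = new HashMap<>();
        for (KeyedMessage m : log) {
            latestOffset.put(m.key, m.offset);
        }
        // Pass 2: retain a message only if no later message shares its key.
        List<KeyedMessage> retained = new ArrayList<>();
        for (KeyedMessage m : log) {
            if (latestOffset.get(m.key).longValue() == m.offset) {
                retained.add(m);
            }
        }
        return retained;
    }

    public static void main(String[] args) {
        List<KeyedMessage> log = List.of(
                new KeyedMessage(0, "alice", "v1"),
                new KeyedMessage(1, "bob", "v1"),
                new KeyedMessage(2, "alice", "v2"));
        System.out.println(compact(log));
    }
}
```

For the log `0:alice=v1, 1:bob=v1, 2:alice=v2`, `compact` returns 
`[1:bob=v1, 2:alice=v2]`: the obsolete first update for `alice` is removed while 
`bob`'s only record survives even though it is older. A real implementation 
would have to work on on-disk segments rather than a list held in memory.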
