[ https://issues.apache.org/jira/browse/KAFKA-631?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Jay Kreps updated KAFKA-631:
----------------------------
    Attachment: KAFKA-631-v2.patch

New patch, only minor changes:
1. Rebased against trunk at 9ee795ac563c3ce4c4f03e022c7f951e065ad1ed.
2. Implemented seeding for the offset map hash, so a different hash is used on each iteration and collisions between cleaning iterations should be independent.
3. Implemented batching in the cleaner's writes. This improves the per-thread performance from about 11MB/sec to about 64MB/sec on my laptop.
4. Added a special log4j log for cleaner messages, since they are kind of verbose.

> Implement log compaction
> ------------------------
>
>                 Key: KAFKA-631
>                 URL: https://issues.apache.org/jira/browse/KAFKA-631
>             Project: Kafka
>          Issue Type: New Feature
>          Components: core
>    Affects Versions: 0.8.1
>            Reporter: Jay Kreps
>            Assignee: Jay Kreps
>         Attachments: KAFKA-631-v1.patch, KAFKA-631-v2.patch
>
>
> Currently Kafka has only one way to bound the space of the log, namely by
> deleting old segments. The policy that controls which segments are deleted
> can be configured based either on the number of bytes to retain or the age of
> the messages. This makes sense for event or log data which has no notion of
> primary key. However, lots of data has a primary key and consists of updates
> by primary key. For this data it would be nice to be able to ensure that the
> log contained at least the last version of every key.
> As an example, say that the Kafka topic contains a sequence of User Account
> messages, each capturing the current state of a given user account. Rather
> than simply discarding old segments, since the set of user accounts is
> finite, it might make more sense to delete individual records that have been
> made obsolete by a more recent update for the same key. This would ensure
> that the topic contained at least the current state of each record.
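
For readers skimming the thread, the retention rule in the issue description (keep at least the last version of every key) can be sketched roughly as below. This is only an illustration of the semantics, not the code in the attached patches: the `compact` function, the `(offset, key, value)` record shape, and the plain in-memory dict standing in for the offset map are all assumptions made for the example, and it ignores segments, batching, and hashing entirely.

```python
from typing import Dict, Hashable, List, Tuple

# Hypothetical record shape for illustration: (offset, key, value).
Record = Tuple[int, Hashable, object]


def compact(log: List[Record]) -> List[Record]:
    """Return the log with every record dropped that has been made
    obsolete by a later record with the same key.

    Assumes offsets are unique and strictly increasing in log order.
    """
    # Pass 1: remember the offset of the most recent record for each key.
    latest_offset: Dict[Hashable, int] = {}
    for offset, key, _value in log:
        latest_offset[key] = offset

    # Pass 2: keep a record only if it is the most recent one for its key.
    # Surviving records keep their original offsets and relative order.
    return [rec for rec in log if latest_offset[rec[1]] == rec[0]]
```

With the User Account example from the description, a log holding two updates for one account and one for another would keep a single record per account: the newest update for the first and the only update for the second.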