Hi Philip,

Yes, the offset idea is critical, and offsets remain in 0.8. The offset is still both the primary key and the logical notion of time in the log; it is just that the translation between offset and file position is now more complicated, and the offset numbers are always 0, 1, 2, ....
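To make that offset-to-position translation concrete, here is a minimal sketch, not the actual broker code: the class and method names are illustrative. It models a segmented log where each segment is keyed by its base offset, and a lookup finds the containing segment and then the byte position within it.

    import java.util.Map;
    import java.util.TreeMap;

    // Hypothetical sketch: resolving a logical offset (0, 1, 2, ...) to a
    // byte position. Segments are keyed by their base offset; each segment
    // is modeled here as an array of file positions, purely for illustration.
    public class LogicalOffsetLookup {
        private final TreeMap<Long, long[]> segments =
            new TreeMap<Long, long[]>();

        public void addSegment(long baseOffset, long[] positions) {
            segments.put(baseOffset, positions);
        }

        public long positionOf(long offset) {
            // floorEntry finds the segment whose base offset is <= offset
            Map.Entry<Long, long[]> e = segments.floorEntry(offset);
            if (e == null)
                throw new IllegalArgumentException("offset below start of log");
            int relative = (int) (offset - e.getKey());
            long[] positions = e.getValue();
            if (relative >= positions.length)
                throw new IllegalArgumentException("offset past end of log");
            return positions[relative];
        }
    }

Note what falls out of this for consumers: positionOf(endOffset - 100) reads 100 messages back from the end, arithmetic that was not possible with raw byte offsets.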
The design document is here:
https://cwiki.apache.org/confluence/display/KAFKA/Keyed+Messages+Proposal

The JIRA has a number of technical details:
https://issues.apache.org/jira/browse/KAFKA-506

There was some discussion of this on the mailing list a while back. Here was the explanation of benefits I gave then:

*Here is what this adds right off the bat with the patch I checked in:*

*a. It is aesthetically nice. The first message will have offset 0, the second message offset 1, the 100th message offset 99, etc.*

*b. You can read the messages in reverse if you like. If the end of the log is 9876, then 100 messages before that is 9776.*

*c. It is less error-prone: there are no invalid offsets and no byte arithmetic.*

*d. It fixes the commit() functionality with respect to compressed messages. Previously there was effectively no offset for messages inside a compressed set, so one could only commit one's position at compressed message set boundaries. This made the semantics of compressed messages very problematic.*

*One of the primary motivators is not the above items, but rather the ability to allow more sophisticated retention policies. Some systems at LinkedIn use Kafka as a kind of commit log. That is, they take a stream of changes from Kafka, process them, and apply some munged version of this to a local search index, key-value store, or other data structure for serving. Virtually all of these systems have some notion of a primary key for records. The general problem these systems have to solve is what to do if they need to recreate their local state (say, if a node fails, or if they need to reprocess data in a different way). Since Kafka will only contain a fixed range of data, they can't really re-process from Kafka unless the data they serve is also time-based, as we will have cleaned out old messages. But you could imagine a slightly different retention strategy in Kafka that allowed you to retain messages by some primary key. Rather than throwing away old segments, you would have the option to "clean" old segments and retain just the latest record for each primary key. That would allow using the Kafka log for all restore functionality and still guarantee that you restored the latest value for each key. This retention strategy would only make sense for topics that contain data with a primary key, so it would be optional. I think this is actually very powerful when combined with replication because it is a way to get a highly available "commit" or "restore" log.*
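For concreteness, here is a rough sketch of what such a per-key cleaning pass over an old segment might look like; the Record and SegmentCleaner names are illustrative, not proposed classes. The idea is just: scan the segment in offset order, keep only the latest record for each key, and the survivors form the cleaned segment.

    import java.util.LinkedHashMap;
    import java.util.List;
    import java.util.Map;

    // Hypothetical sketch of the "retain latest record per key" cleaning
    // pass described above. Record is an illustrative (key, value, offset)
    // triple, not a real Kafka class.
    public class SegmentCleaner {
        public static class Record {
            public final String key;
            public final byte[] value;
            public final long offset;
            public Record(String key, byte[] value, long offset) {
                this.key = key;
                this.value = value;
                this.offset = offset;
            }
        }

        // Scan records in offset order; a later record for the same key
        // replaces the earlier one, so only the latest value per key survives.
        public static Map<String, Record> clean(List<Record> segment) {
            Map<String, Record> latest = new LinkedHashMap<String, Record>();
            for (Record r : segment) {
                latest.remove(r.key); // re-insert so iteration follows offset order
                latest.put(r.key, r);
            }
            return latest;
        }
    }

A consumer that replays the cleaned log from the beginning still ends up with the current value for every key, which is exactly the "restore" guarantee described above. Note also that this only works with logical offsets: cleaning just leaves gaps in the 0, 1, 2, ... sequence, whereas removing records under byte-position addressing would invalidate every downstream offset.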
On Thu, Nov 22, 2012 at 11:36 AM, Philip O'Toole <phi...@loggly.com> wrote:

> On Thu, Nov 22, 2012 at 07:33:31AM -0800, Neha Narkhede wrote:
> > Yes, in Kafka 0.7, the offset is the byte position of the message in the
> > log for the topic partition. In Kafka 0.8, each message is assigned a
> > monotonically increasing, contiguous sequence number per partition,
> > starting with 1. So each message is addressable using this sequence number
> > instead of the byte position.
>
> This is interesting. We at Loggly liked the offset, and thought it was an
> elegant idea (as explained on the Kafka design page). Are you *replacing*
> the offset, or will the sequence number be another way to reference a
> message?
>
> And why the change? Perhaps there is a JIRA ticket explaining it in more
> detail.
>
> Thanks,
>
> Philip
>
> > Also, the offset keeps increasing over the lifetime of a cluster, even if
> > Kafka deletes older log segments.
> >
> > Thanks,
> > Neha
> >
> > On Thursday, November 22, 2012, Paul Garner wrote:
> >
> > > from what I read, the message offset is the byte position of the message
> > > in the log file that Kafka writes to
> > >
> > > the logs are rotated and eventually deleted by Kafka
> > >
> > > ...does this mean the message offset periodically goes back to start at
> > > zero again? or the offset keeps increasing for the life of the cluster
> > > as if it was a single big file back to the beginning of time?
> > >
>
> --
> Philip O'Toole
> Senior Developer
> Loggly, Inc.
> San Francisco, CA
>