Hi Philip,

Yes, the offset idea is critical--offsets remain in 0.8. The offset is still
both the primary key and the logical notion of time in the log; it is just
that the translation between offset and file position is now more
complicated, and the offset numbers are always 0, 1, 2, ....
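Concretely, a fetch at a logical offset now has to be translated into a file
position before the broker can serve it; roughly, each segment keeps an index
from offsets to byte positions. Here is a minimal sketch of that lookup (the
class and data structures are illustrative, not the actual broker code):

  import java.util.TreeMap;

  // Illustrative only: find the segment owning a logical offset (segments
  // are keyed by their base offset), then look up the byte position within
  // that segment. The real broker keeps a sparse per-segment index on disk
  // rather than a dense in-memory map.
  class OffsetLookupSketch {
    private final TreeMap<Long, TreeMap<Long, Long>> segments =
        new TreeMap<Long, TreeMap<Long, Long>>();

    long filePosition(long logicalOffset) {
      Long segmentBase = segments.floorKey(logicalOffset);
      if (segmentBase == null)
        throw new IllegalArgumentException("offset not in the log");
      return segments.get(segmentBase).get(logicalOffset);
    }
  }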

The design document is here:
  https://cwiki.apache.org/confluence/display/KAFKA/Keyed+Messages+Proposal

The JIRA has a number of technical details:
  https://issues.apache.org/jira/browse/KAFKA-506

There was some discussion on this on the mailing list a while back. Here
was the explanation of benefits I gave then:

*Here is what this adds right off the bat with the patch I checked in:*
*a. It is esthetically nice. The first message will have offset 0, the
second message 1, the 100th message offset 99, etc.*
*b. You can read the messages in reverse if you like. If the end of the log
is 9876 then 100 messages before that is 9776.*
*c. It is less error prone: There are no invalid offsets and no byte
arithmetic.*
*d. It fixes the commit() functionality with respect to compressed
messages. Previously there was effectively no offset for messages inside of
a compressed set, so one could only commit one's position at compressed
message set boundaries. This made the semantics of compressed messages very
problematic.*

*One of the primary motivators is not the above items, but rather the
ability to allow more sophisticated retention policies. Some systems at
LinkedIn use Kafka as a kind of commit log. That is, they take a stream of
changes from Kafka, process them, and apply some munged version of this to
a local search index, key-value store, or other data structure for serving.
Virtually all of these systems have some notion of a primary key for
records. The general problem these systems have to solve is what to do if
they need to recreate their local state (say if a node fails, or they need
to reprocess the data in a different way). Since Kafka will only contain a
fixed range of data, they can't really reprocess from Kafka unless the data
they serve is also time-based, as we will have cleaned out old messages.
But you could imagine a slightly different retention strategy in Kafka that
allowed you to retain messages by some primary key. So rather than throwing
away old segments you would have the option to "clean" old segments and
just retain the latest record for each primary key. That would allow using
the Kafka log for all restore functionality and still guarantee that you
restored the latest value for each key. This retention strategy would only
make sense for topics that contain data with a primary key, so it would be
optional. I think this is actually very powerful when combined with
replication because it is a way to get a highly available "commit" or
"restore" log.*

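To make the cleaning idea concrete, here is a toy sketch of compacting a
segment down to the latest record per key (illustrative Java only; the
Record type and method names are made up, not the actual implementation):

  import java.util.Collection;
  import java.util.LinkedHashMap;
  import java.util.List;
  import java.util.Map;

  // Toy example, not broker code: keep only the most recent record per key.
  class CompactionSketch {
    static class Record {
      final String key;
      final byte[] value;
      final long offset;
      Record(String key, byte[] value, long offset) {
        this.key = key;
        this.value = value;
        this.offset = offset;
      }
    }

    // Records appear in offset order, so a later put for the same key
    // replaces the earlier one, leaving only the latest value per key.
    static Collection<Record> clean(List<Record> segment) {
      Map<String, Record> latest = new LinkedHashMap<String, Record>();
      for (Record r : segment)
        latest.put(r.key, r);
      return latest.values();
    }
  }

A process rebuilding its state would then replay the cleaned log from the
beginning and be guaranteed to end up with the latest value for each key.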


On Thu, Nov 22, 2012 at 11:36 AM, Philip O'Toole <phi...@loggly.com> wrote:

> On Thu, Nov 22, 2012 at 07:33:31AM -0800, Neha Narkhede wrote:
> > Yes, in Kafka 0.7, the offset is the byte position of the message in the
> > log for the topic partition. In Kafka 0.8, each message is assigned a
> > monotonically increasing, contiguous sequence number per partition,
> > starting with 1. So each message is addressable using this sequence
> number
> > instead of the byte position.
>
> This is interesting. We at Loggly liked the offset, and thought it was an
> elegant idea (as explained on the Kafka design page). Are you *replacing*
> the offset, or will the sequence number be another way to reference a
> message?
>
> And why the change? Perhaps there is a JIRA ticket explaining it in more
> detail.
>
> Thanks,
>
> Philip
> >
> > Also, the offset keeps increasing over the lifetime of a cluster, even if
> > Kafka deletes older log segments.
> >
> > Thanks,
> > Neha
> >
> > On Thursday, November 22, 2012, Paul Garner wrote:
> >
> > > from what I read, the message offset is the byte position of the
> message in
> > > the log file that Kafka writes to
> > >
> > > the logs are rotated and eventually deleted by Kafka
> > >
> > > ...does this mean the message offset periodically goes back to start at
> > > zero again? or the offset keeps increasing for the life of the cluster
> as
> > > if it was a single big file back to the beginning of time?
> > >
>
> --
> Philip O'Toole
> Senior Developer
> Loggly, Inc.
> San Francisco, CA
>
