Hey Tom,

Yes, this is very similar to what I had in mind.
The primary difference is that I want to implement the control on the
server side. That is, rather than having the consumer be smart and use
the control topic directly, it would be preferable to have the server
handle this. That way it would be easy to carry the logic across
consumers in a variety of languages. The implementation would be to add
a new parameter to the fetch request, read_committed={true, false}. If
this parameter is set to true then we would not hand out messages until
we had the commit message for the requested offset. The other advantage
of doing this on the server side is that I think we could then have
only a single control/commit topic rather than one per data topic.

I think there might also be an alternative to requiring exclusivity on
the producer side--indeed, requiring this makes the feature a lot less
useful. The alternative requires waiting until all offsets in a given
range are committed before they can be handed out, though this is more
complex. The details of my proposal involved a unique producer id per
producer and a generation id that increased on every "rollback". A
commit with a higher generation id for an existing producer id would
implicitly roll back everything that producer had sent since its last
commit.

-Jay

On Wed, Nov 14, 2012 at 12:12 PM, Tom Brown <tombrow...@gmail.com> wrote:
> Just thought of a way to do transactions in Kafka. I think this
> solution would cover the most common types of transactions. However,
> it's often useful to run an idea by a second set of eyes. I am
> interested in knowing where the holes are in this design that I
> haven't been able to see. If you're interested in transactional Kafka,
> please review this and let me know any feedback you have.
>
> A transactional topic can be approximated by using a second topic as a
> control stream. Each message in the control topic would contain the
> offset and length (and an optional transaction ID). There is no change
> to the messages written to the data topic. The performance impact
> would generally be low--the larger the transaction size, the smaller
> the impact would be.
>
> To write a transaction to the data partition, note the end offset of
> the partition in memory. Write all your messages to the partition.
> Note the new offset at the end of the partition (to calculate the
> length). Write the transaction offset+length into the control
> partition.
>
> To read a set of committed data from the data stream: read the
> transaction from the control stream, then start reading at the offset
> stored in the transaction until you've read the specified length of
> data.
>
> If the producer crashes at any point, the written data will remain in
> the data partitions, but the transaction will not be written to the
> control topic, which will prevent those messages from being read by
> any transactional reader.
>
> The assumptions and side-effects of this design are as follows:
> 1. The control topic mirrors the data topic in terms of brokers and
> partitions.
> 2. Each partition can only be fed by a single producer at any given time.
> 3. The offset at the end of the partition is available to a consumer.
> 4. Each transaction involves an extra message, so performance for very
> small transactions will not be ideal.
> 5. Rolled-back data remains in each individual partition.
> 6. A single partition can have more than one consumer (with all
> consumers coordinated by a single control partition reader).
>
> Thanks in advance,
> Tom Brown
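The write-side protocol in Tom's proposal is mechanical enough to
sketch in Java. The DataLog and ControlLog interfaces below are
hypothetical placeholders for whatever client API would actually be
used, not real Kafka classes; the point is the ordering: note the end
offset first, write the data, and write the commit record last.

    // Hypothetical stand-ins for the data and control partitions.
    interface DataLog {
        long endOffset();            // current end offset of the data partition
        void append(byte[] message); // append one message to the data partition
    }

    interface ControlLog {
        // one commit record: where the transaction starts, how long it
        // is, and an optional transaction id
        void appendCommit(long offset, long length, String txnId);
    }

    class TransactionalWriter {
        private final DataLog data;
        private final ControlLog control;

        TransactionalWriter(DataLog data, ControlLog control) {
            this.data = data;
            this.control = control;
        }

        void write(Iterable<byte[]> messages, String txnId) {
            long start = data.endOffset();           // 1. note the end offset
            for (byte[] m : messages)
                data.append(m);                      // 2. write all the messages
            long length = data.endOffset() - start;  // 3. compute the length
            // 4. the commit point: a crash before this line leaves the data
            // in the partition but invisible to any transactional reader
            control.appendCommit(start, length, txnId);
        }
    }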
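The read side of the same sketch: commits are consumed from the control
stream, and only the ranges they name are ever read from the data
partition, so data whose transaction never committed is skipped without
any explicit rollback step. Offsets are treated as byte positions here,
which is one way to make the offset+length arithmetic work; the
interfaces are again hypothetical.

    class TransactionalReader {
        static class Commit {
            final long offset, length;
            Commit(long offset, long length) { this.offset = offset; this.length = length; }
        }
        static class Record {
            final byte[] payload;
            final int sizeInLog; // bytes this message occupies in the partition
            Record(byte[] payload, int sizeInLog) {
                this.payload = payload;
                this.sizeInLog = sizeInLog;
            }
        }
        interface ControlStream { Commit next(); }           // next commit, blocking
        interface DataStream { Record readAt(long offset); } // one message at an offset
        interface Handler { void handle(byte[] payload); }

        // Deliver exactly the committed ranges, in commit order.
        void run(ControlStream control, DataStream data, Handler handler) {
            while (true) {
                Commit c = control.next();
                long pos = c.offset;
                while (pos < c.offset + c.length) {
                    Record r = data.readAt(pos);
                    handler.handle(r.payload);
                    pos += r.sizeInLog;
                }
            }
        }
    }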
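To make the server-side idea in the reply concrete, here is a minimal
sketch of the bookkeeping the broker might do. All of the names
(CommitTracker, onCommit, maxReadableOffset) are made up for
illustration, not Kafka code: the broker advances a per-partition
watermark as commit messages arrive on the control topic, and a fetch
carrying read_committed=true is answered only up to that watermark. It
is deliberately simplified: with several interleaved producers the
broker would also need to remember rolled-back ranges and filter them
out, which is the extra complexity the reply alludes to.

    import java.util.HashMap;
    import java.util.Map;

    class CommitTracker {
        // highest committed offset per partition; fetches with
        // read_committed=true are answered only up to this watermark
        private final Map<Integer, Long> committedUpTo = new HashMap<Integer, Long>();
        // latest generation id seen per producer id
        private final Map<Long, Integer> generations = new HashMap<Long, Integer>();

        // Called when the broker reads a commit message off the control
        // topic. A commit with a higher generation id for a known producer
        // implicitly rolls back whatever that producer wrote since its last
        // commit: that data is simply never covered by a watermark advance.
        synchronized void onCommit(int partition, long producerId,
                                   int generationId, long committedEndOffset) {
            Integer last = generations.get(producerId);
            if (last != null && generationId < last)
                return; // stale commit from an already rolled-back generation
            generations.put(producerId, generationId);
            Long current = committedUpTo.get(partition);
            if (current == null || committedEndOffset > current)
                committedUpTo.put(partition, committedEndOffset);
        }

        // Upper bound for a fetch with read_committed=true: messages at or
        // beyond this offset are held back until their commit arrives.
        synchronized long maxReadableOffset(int partition) {
            Long o = committedUpTo.get(partition);
            return o == null ? 0L : o;
        }
    }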