I don't think all messages need to be sequential. You just need to omit messages from failed transactions when serving fetch requests, and this requires storage proportional to the number of failed transactions. The assumption is that failed transactions are very rare (e.g. due to machine failures), so this state should be small.
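To make that concrete, here is a minimal sketch (plain Python with hypothetical names, not actual Kafka code) of a broker filtering failed-transaction messages out of a fetch. The only extra state is a small set of aborted offset ranges, proportional to the number of failed transactions:

```python
# Hypothetical sketch: serve a fetch while skipping messages that belong
# to failed (aborted) transactions. The broker keeps one (start, end)
# offset range per aborted transaction -- small if failures are rare.

def serve_fetch(log, aborted_ranges, fetch_offset, max_messages):
    """Return committed (offset, message) pairs starting at fetch_offset,
    omitting any offset inside an aborted transaction's range."""
    result = []
    offset = fetch_offset
    while offset < len(log) and len(result) < max_messages:
        if any(start <= offset < end for start, end in aborted_ranges):
            offset += 1  # message came from a failed transaction: skip it
            continue
        result.append((offset, log[offset]))
        offset += 1
    return result

log = ["a", "b", "x1", "x2", "c"]        # x1, x2 came from a failed txn
aborted = [(2, 4)]                        # offsets 2..3 were rolled back
print(serve_fetch(log, aborted, 0, 10))   # [(0, 'a'), (1, 'b'), (4, 'c')]
```

Note the consumer never sees the aborted data, and the broker never has to rewrite the log; the rolled-back messages simply stay in place and are filtered at read time.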
WRT client versus server, the assumption is that all control messages are useful to some consumer, so reading all of them on the server side should not be a limitation.

There are a number of things not worked out here, so I wouldn't take it too seriously; I just wanted to throw out the thought experiment. To really be useful, I do think it is necessary to allow multiple producers and move any complex logic to the server side.

-Jay

On Fri, Nov 16, 2012 at 8:46 AM, Tom Brown <tombrow...@gmail.com> wrote:
> Jay,
>
> I'm not sure how you're going to get around the issue of a single
> producer per partition. For efficient reads, all of the messages from
> a single transaction have to be sequential, and that only happens if
> either a) the messages are all written atomically (perhaps from
> memory, or temporary storage, etc.), or b) all messages come from a
> single producer.
>
> If you use a single (internal) control partition for all topics, the
> server would need to read and ignore irrelevant transaction records
> from topics the consumer isn't interested in. Also, you would not be
> able to effectively delete a single partition (though that may only be
> valuable for developers). That said, the simplicity of a single
> control partition may outweigh those problems.
>
> --Tom
>
> On Thu, Nov 15, 2012 at 6:24 PM, Jay Kreps <jay.kr...@gmail.com> wrote:
> > Hey Tom,
> >
> > Yes, this is very similar to what I had in mind.
> >
> > The primary difference is that I want to implement the control on the
> > server side. That is, rather than having the consumer be smart and use
> > the control topic directly, it would be preferable to have the server
> > handle this. This way it would be easy to carry this logic across
> > consumers in a variety of languages. The implementation would be that
> > we add a new parameter to the fetch request, read_committed={true, false}.
> > If this parameter is set to true, then we would not hand out messages
> > until we had the commit message for the requested offset. The other
> > advantage of doing this on the server side is that I think we could
> > then have only a single control/commit topic rather than one per data
> > topic.
> >
> > I think there might also be an alternative to requiring exclusivity on
> > the producer side--indeed, requiring this makes the feature a lot less
> > useful. This requires waiting until all offsets in a given range are
> > committed before any of them can be handed out, though this is more
> > complex. The details of my proposal involved a unique producer id per
> > producer and a generation id that increased on every "rollback". A
> > commit with a higher generation id for an existing producer id would
> > implicitly roll back everything that producer sent since the last
> > commit.
> >
> > -Jay
> >
> >
> > On Wed, Nov 14, 2012 at 12:12 PM, Tom Brown <tombrow...@gmail.com> wrote:
> >
> >> Just thought of a way to do transactions in Kafka. I think this
> >> solution would cover the most common types of transactions. However,
> >> it's often useful to run an idea by a second set of eyes. I am
> >> interested in knowing where the holes are in this design that I
> >> haven't been able to see. If you're interested in transactional Kafka,
> >> please review this and let me know any feedback you have.
> >>
> >> A transactional topic can be approximated by using a second topic as a
> >> control stream. Each message in the control topic would contain the
> >> offset and length (and an optional transaction ID). There is no change
> >> to the messages written to the data topic. The performance impact
> >> would generally be low--the larger the transaction size, the smaller
> >> the performance impact would be.
> >>
> >> To write a transaction to the data partition: note the end offset of
> >> the partition in memory. Write all your messages to the partition.
> >> Note the new offset at the end of the partition (to calculate the
> >> length). Write the transaction offset+length into the control
> >> partition.
> >>
> >> To read a set of committed data from the data stream: read the
> >> transaction from the control stream. Start reading at the offset
> >> stored in the transaction, until you've read the specified length of
> >> data.
> >>
> >> If the producer crashes at any point, the written data will remain in
> >> the data partitions, but the transaction will not be written to the
> >> control topic, which will prevent those messages from being read by
> >> any transactional reader.
> >>
> >> The assumptions and side effects of this design are as follows:
> >> 1. The control topic mirrors the data topic in terms of brokers and
> >> partitions.
> >> 2. Each partition can only be fed by a single producer at any given
> >> time.
> >> 3. The offset at the end of the partition is available to a consumer.
> >> 4. Each transaction involves an extra message, so performance for very
> >> small transactions will not be ideal.
> >> 5. Rolled-back data remains in each individual partition.
> >> 6. A single partition can have more than one consumer (with all
> >> consumers coordinated by a single control partition reader).
> >>
> >> Thanks in advance,
> >> Tom Brown
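[For reference, the control-topic protocol Tom describes above can be sketched in a few lines. This is an illustrative Python model with hypothetical names, not Kafka code: two in-memory logs stand in for the data and control partitions, and a commit record (offset, length) is appended to the control log only after the whole transaction has been written to the data log.]

```python
# Illustrative model of the control-topic transaction protocol:
# data messages are appended unchanged; readers only trust ranges
# that have a matching (offset, length) record in the control log.

class TransactionalTopic:
    def __init__(self):
        self.data = []      # data partition: append-only message log
        self.control = []   # control partition: (offset, length) commits

    def write_transaction(self, messages):
        start = len(self.data)           # note the end offset of the partition
        self.data.extend(messages)       # write all messages to the partition
        length = len(self.data) - start  # new end offset gives the length
        self.control.append((start, length))  # commit via the control stream

    def write_uncommitted(self, messages):
        # Simulates a producer crash: data lands in the partition,
        # but no commit record is ever written to the control topic.
        self.data.extend(messages)

    def read_committed(self):
        """Yield only messages covered by a commit in the control stream."""
        for start, length in self.control:
            yield from self.data[start:start + length]

t = TransactionalTopic()
t.write_transaction(["a", "b"])
t.write_uncommitted(["lost1", "lost2"])  # producer died before committing
t.write_transaction(["c"])
print(list(t.read_committed()))          # ['a', 'b', 'c']
```

As in the design above, the rolled-back messages ("lost1", "lost2") remain in the data partition but are invisible to a transactional reader, and each transaction costs exactly one extra control message regardless of its size.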