A third possibility is to use a different storage backend, like Cassandra, which can easily support idempotent writes. You would hash the unique message ID and timestamp into row and column keys.
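A minimal sketch of that idea, using a plain dict as a stand-in for a Cassandra table (all names here are illustrative, not a real Cassandra client API):

```python
import hashlib

# Stand-in for a Cassandra table: repeated writes to the same
# (row_key, column_key) cell simply overwrite each other, so a
# re-sent message lands in the same cell and duplicates disappear.
store = {}

def write_idempotent(message_id, timestamp, payload):
    # Hash the unique message ID into the row key and use the
    # timestamp as the column key, as described above.
    row_key = hashlib.sha1(message_id.encode()).hexdigest()
    store[(row_key, timestamp)] = payload

# Writing the same message twice leaves exactly one copy.
write_idempotent("msg-42", 1351216800, b"hello")
write_idempotent("msg-42", 1351216800, b"hello")
```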
Note that this scheme would possibly allow use as a priority queue.

-Evan

On Oct 25, 2012, at 7:04 PM, Tom Brown <tombrow...@gmail.com> wrote:

> I have come up with two different possibilities, each with different
> trade-offs.
>
> The first would be to support "true" transactions by writing
> transactional data into a temporary file and then copying it directly
> to the end of the partition when the commit command is received. The
> upside to this approach is that individual transactions can be larger
> than a single batch, and more than one producer could conduct
> transactions at once. The downside is the extra I/O involved in
> writing the data and reading it from disk an extra time.
>
> The second would be to allow any number of messages to be appended to
> a topic, but not move the "end of topic" offset until the commit was
> received. If a rollback was received, or the producer timed out, the
> partition could be truncated at the most recently recognized "end of
> topic" offset. The upside is that there is very little extra I/O (only
> to store the official "end of topic" metadata), and it seems like it
> should be easy to implement. The downside is that this "transaction"
> feature is incompatible with anything but a single producer per
> partition.
>
> I am interested in your thoughts on these.
>
> --Tom
>
> On Thu, Oct 25, 2012 at 9:31 PM, Philip O'Toole <phi...@loggly.com> wrote:
>> On Thu, Oct 25, 2012 at 06:19:04PM -0700, Neha Narkhede wrote:
>>> The closest concept of a transaction on the publisher side that I
>>> can think of is using a batch of messages in a single call to the
>>> synchronous producer.
>>>
>>> Precisely, you can configure a Kafka producer to use the "sync" mode
>>> and batch messages that require transactional guarantees in a
>>> single send() call.
>>> That will ensure that either all the messages
>>> in the batch are sent or none.
>>
>> This is an interesting feature -- something I wasn't aware of. Still,
>> it doesn't solve the problem *completely*. As many people realise,
>> it's still possible for the batch of messages to get into Kafka fine,
>> but for the ack from Kafka to be lost on its way back to the
>> producer. In that case the producer erroneously believes the messages
>> didn't get in, and might re-send them.
>>
>> You guys *haven't* solved that issue, right? I believe you write
>> about it on the Kafka site.
>>
>>> Thanks,
>>> Neha
>>>
>>> On Thu, Oct 25, 2012 at 2:44 PM, Tom Brown <tombrow...@gmail.com> wrote:
>>>> Is there an accepted or recommended way to make writes to a Kafka
>>>> queue idempotent, or within a transaction?
>>>>
>>>> I can configure my system such that each queue has exactly one
>>>> producer.
>>>>
>>>> (If there are no accepted/recommended ways, I have a few ideas I
>>>> would like to propose. I would also be willing to implement them if
>>>> needed.)
>>>>
>>>> Thanks in advance!
>>>>
>>>> --Tom
>>
>> --
>> Philip O'Toole
>> Loggly, Inc.
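As an aside, the "end of topic" offset scheme Tom describes above could be sketched roughly like this -- a toy in-memory model, not Kafka code, with all names made up for illustration:

```python
class TransactionalLog:
    """Toy model of a partition where the visible end of the topic
    only advances on commit, and rollback truncates past it."""

    def __init__(self):
        self.messages = []        # the partition's append-only log
        self.last_committed = 0   # the official "end of topic" offset

    def append(self, msg):
        # Uncommitted writes land past the visible end of the topic.
        self.messages.append(msg)

    def commit(self):
        # Advance the "end of topic" offset, exposing the new messages.
        self.last_committed = len(self.messages)

    def rollback(self):
        # Truncate at the most recently recognized "end of topic"
        # offset (also what the broker would do on producer timeout).
        del self.messages[self.last_committed:]

    def readable(self):
        # Consumers never see past the committed offset.
        return self.messages[:self.last_committed]

log = TransactionalLog()
log.append("a")
log.append("b")
log.commit()     # "a" and "b" become visible
log.append("c")
log.rollback()   # "c" is truncated, never seen by consumers
```

The single-producer-per-partition restriction falls out naturally here: with two producers interleaving appends, one producer's rollback would truncate the other's uncommitted messages.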