For this particular use case, you can potentially include the message
offset from broker A in the message sent to broker B. If the transaction
fails, you read the last message from broker B and use the included offset
to resume the consumer from broker A. This assumes that there is only a
single client writing to broker B (for a particular partition).
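
As a concrete illustration, here is a minimal sketch of that recovery
pattern. All of the types and calls below (TaggedMessage, consumerA,
producerB, readLastMessage) are hypothetical placeholders, not actual
Kafka client APIs:

    // On restart, recover the broker-A position from the tail of broker
    // B. This works only because we are the single writer to this
    // partition of B, as noted above.
    TaggedMessage last = readLastMessage(consumerB);
    long resume = (last == null) ? 0L : last.sourceOffset + 1;
    consumerA.seek(resume);

    while (true) {
        MessageAndOffset in = consumerA.next();   // consume from broker A
        byte[] out = process(in.payload());
        // Embed the broker-A offset in the message sent to broker B so a
        // future restart can find its position again.
        producerB.send(new TaggedMessage(in.offset(), out));
    }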

Thanks,

Jun

On Fri, Oct 26, 2012 at 11:31 AM, Guozhang Wang <g...@cs.cornell.edu> wrote:

> I am also quite interested in this thread, and I have another question
> about committing consumed messages. For example, suppose I need a
> program which acts both as a consumer and a producer, with the actions
> wrapped in a "transaction":
>
> Transaction start:
>
>    Get next message from broker A;
>
>    Do something;
>
>    Send a message to broker B;
>
> Commit.
>
>
> If the transaction aborts after reading the message from broker A, is it
> possible to logically "put the message back" on the broker? I remember
> that Amazon's Simple Queue Service (SQS) uses a lease mechanism (a
> visibility timeout), which might work for this case, but I am afraid
> that would hurt throughput a lot.
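>
> A lease of the sort SQS uses might look roughly like this (a
> hypothetical sketch only; nothing like this exists in Kafka):
>
>     // Hypothetical broker-side lease: a delivered message becomes
>     // invisible for timeoutMs; if no commit arrives before the lease
>     // expires, the message is "put back" and redelivered.
>     final class Lease {
>         final long offset;          // message being leased
>         final long expiresAtMs;     // redelivery deadline
>         Lease(long offset, long timeoutMs) {
>             this.offset = offset;
>             this.expiresAtMs = System.currentTimeMillis() + timeoutMs;
>         }
>         boolean expired() {
>             return System.currentTimeMillis() >= expiresAtMs;
>         }
>     }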
>
> --Guozhang
>
>
> -----Original Message-----
> From: Jay Kreps [mailto:jay.kr...@gmail.com]
> Sent: Friday, October 26, 2012 2:08 PM
> To: kafka-users@incubator.apache.org
> Subject: Re: Transactional writing
>
> This is an important feature and I am interested in helping out in the
> design and implementation, though I am working on 0.8 features for the next
> month so I may not be of too much use. I have thought a little bit about
> this, but I am not yet sure of the best approach.
>
> Here is a specific use case I think is important to address: consider a
> case where you are processing one or more input streams and producing an
> output stream. This processing may involve some kind of local state (say
> counters or other intermediate aggregation state). This is a common
> scenario. The problem is to give reasonable semantics to this computation
> in the presence of failures. The processor effectively has a
> position/offset in each of its input streams as well as its local state.
> If the process fails, it needs to restore to a state that matches the
> last messages it produced. There are several solutions to this problem.
> One is to make the output somehow idempotent; this solves some cases, but
> it is not a general solution, as many things cannot easily be made
> idempotent.
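>
> (As a small example of what "idempotent output" means here: writing
> results as keyed upserts makes replay harmless. A sketch with a plain
> map standing in for an external store:)
>
>     // Replaying the same input after a crash rewrites the same key
>     // with the same value, so duplicates have no effect. A blind
>     // increment, by contrast, would double-count on replay.
>     void emit(Map<String, Long> store, String key, long count) {
>         store.put(key, count);   // overwrite; never increment blindly
>     }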
>
> I think the two proposals you give outline a couple of basic approaches:
> 1. Store the messages on the server somewhere but don't add them to the log
> until the commit call
> 2. Store the messages in the log but don't make them available to the
> consumer until the commit call
>
> There are several subtleties to these approaches.
>
> One advantage of the second approach is that the messages are already in
> the log, and the broker can choose whether or not to expose them. This
> makes it possible to support a kind of "dirty read": the consumer can
> choose between seeing all messages immediately (low latency, but
> potentially including uncommitted messages) and seeing only committed
> messages.
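>
> In other words, the read policy could become a per-fetch option. This is
> a sketch of the idea only; no such flag exists in the consumer today:
>
>     enum ReadMode { READ_UNCOMMITTED, READ_COMMITTED }
>
>     // The broker bounds each fetch by one of two watermarks: the
>     // physical log end (dirty reads, lowest latency) or the last
>     // committed offset (no uncommitted data, higher latency).
>     long fetchUpperBound(ReadMode mode, long committedEnd, long logEnd) {
>         return (mode == ReadMode.READ_COMMITTED) ? committedEnd : logEnd;
>     }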
>
> The problem with the second approach, at least as you describe it, is
> that you have to lock the log until the commit occurs; otherwise someone
> else may append their own messages in the meantime, and you can no longer
> roll back by truncating the log. This would have all the problems of
> remote locks. I think this might be a deal-breaker.
>
> Another variation on the second approach would be the following: have each
> producer maintain an id and generation number. Keep a schedule of valid
> offset/id/generation numbers on the broker and only hand these out. This
> solution would support non-blocking multi-writer appends but requires more
> participation from the producer (i.e. getting a generation number and id).
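>
> Concretely, the broker-side check might look something like this (a
> sketch of the idea only; none of these names exist in Kafka):
>
>     // Accept an append only if it carries the current generation number
>     // for that producer id; a stale writer (e.g. a zombie from before a
>     // failover) is rejected instead of corrupting the log.
>     boolean validateAppend(Map<Long, Integer> currentGeneration,
>                            long producerId, int generation) {
>         Integer current = currentGeneration.get(producerId);
>         return current != null && current == generation;
>     }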
>
> Cheers,
>
> -Jay
>
> On Thu, Oct 25, 2012 at 7:04 PM, Tom Brown <tombrow...@gmail.com> wrote:
>
> > I have come up with two possibilities, each with different
> > trade-offs.
> >
> > The first would be to support "true" transactions by writing
> > transactional data into a temporary file and then copying it directly
> > to the end of the partition when the commit command is received. The
> > upside to this approach is that individual transactions can be larger
> > than a single batch, and more than one producer could conduct
> > transactions at once. The downside is the extra IO involved in writing
> > the data to disk and reading it back an extra time.
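> >
> > A sketch of that staging step (hypothetical names; the extra IO is the
> > one additional write and read of the temporary file):
> >
> >     // On commit, append the staged transaction file to the end of the
> >     // partition's log, then discard the staging file.
> >     void commit(Path txnFile, FileChannel partition) throws IOException {
> >         try (FileChannel staged = FileChannel.open(txnFile,
> >                 StandardOpenOption.READ)) {
> >             long pos = 0, size = staged.size();
> >             while (pos < size) {
> >                 pos += staged.transferTo(pos, size - pos, partition);
> >             }
> >         }
> >         Files.delete(txnFile);
> >     }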
> >
> > The second would be to allow any number of messages to be appended to
> > a topic, but not move the "end of topic" offset until the commit was
> > received. If a rollback was received, or the producer timed out, the
> > partition could be truncated at the most recently recognized "end of
> > topic" offset. The upside is that there is very little extra IO (only
> > to store the official "end of topic" metadata), and it seems like it
> > should be easy to implement. The downside is that this "transaction"
> > feature is incompatible with anything but a single producer per
> > partition.
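> >
> > The bookkeeping for that second idea is tiny (sketch only, with
> > hypothetical field names):
> >
> >     long logEnd;        // physical end of the partition on disk
> >     long committedEnd;  // advertised "end of topic" offset
> >
> >     // Messages between committedEnd and logEnd exist on disk but are
> >     // invisible to consumers until the commit arrives.
> >     void commit()   { committedEnd = logEnd; }
> >     void rollback() { logEnd = committedEnd; /* truncate file here */ }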
> >
> > I am interested in your thoughts on these.
> >
> > --Tom
> >
> > On Thu, Oct 25, 2012 at 9:31 PM, Philip O'Toole <phi...@loggly.com> wrote:
> > > On Thu, Oct 25, 2012 at 06:19:04PM -0700, Neha Narkhede wrote:
> > >> The closest concept to a transaction on the publisher side that I
> > >> can think of is sending a batch of messages in a single call to the
> > >> synchronous producer.
> > >>
> > >> Specifically, you can configure a Kafka producer to use the "sync"
> > >> mode and batch the messages that require transactional guarantees
> > >> into a single send() call. That ensures that either all the messages
> > >> in the batch are sent or none are.
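> > >>
> > >> For example, with the 0.7-era sync producer it might look like this
> > >> (exact class names from memory, so treat them as approximate):
> > >>
> > >>     Properties props = new Properties();
> > >>     props.put("zk.connect", "localhost:2181");
> > >>     props.put("serializer.class", "kafka.serializer.StringEncoder");
> > >>     props.put("producer.type", "sync");        // synchronous send
> > >>     Producer<String, String> producer =
> > >>         new Producer<String, String>(new ProducerConfig(props));
> > >>
> > >>     // One sync send() carrying the whole batch: either all three
> > >>     // messages are appended to the log, or none of them are.
> > >>     producer.send(new ProducerData<String, String>(
> > >>         "events", Arrays.asList("m1", "m2", "m3")));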
> > >
> > > This is an interesting feature -- something I wasn't aware of. Still,
> > > it doesn't solve the problem *completely*. As many people realise,
> > > it's still possible for the batch of messages to get into Kafka fine,
> > > but for the ack from Kafka to be lost on its way back to the Producer.
> > > In that case the Producer erroneously believes the messages didn't get
> > > in, and might re-send them.
> > >
> > > You guys *haven't* solved that issue, right? I believe you write about
> > > it on the Kafka site.
> > >
> > >>
> > >> Thanks,
> > >> Neha
> > >>
> > >> On Thu, Oct 25, 2012 at 2:44 PM, Tom Brown <tombrow...@gmail.com> wrote:
> > >> > Is there an accepted or recommended way to make writes to a Kafka
> > >> > queue idempotent, or within a transaction?
> > >> >
> > >> > I can configure my system such that each queue has exactly one
> > >> > producer.
> > >> >
> > >> > (If there are no accepted/recommended ways, I have a few ideas I
> > >> > would like to propose. I would also be willing to implement them
> > >> > if needed.)
> > >> >
> > >> > Thanks in advance!
> > >> >
> > >> > --Tom
> > >
> > > --
> > > Philip O'Toole
> > >
> > > Senior Developer
> > > Loggly, Inc.
> > > San Francisco, Calif.
> > > www.loggly.com
> > >
> > > Come join us!
> > > http://loggly.com/company/careers/
> >
>
