A few questions:
1. What is the relationship between transaction.app.id and the existing
config application.id in streams?
2. The initTransactions() call is a little annoying. Can we get rid of
that and, if you set a transaction.app.id, call it automatically on the
first message send, as we do with metadata? Arguably we should have
included a general connect() or init() call in the producer, but given that
we didn't, it seems weird that the cluster metadata initializes
automatically on demand and the transaction metadata doesn't.
3. What we call "fetch.mode" is the equivalent of what databases call
"isolation level", which takes values like "serializable", "read
committed", "read uncommitted". Since we went with "transaction" as the
name for the thing in between the begin/commit, it might make sense to use
this terminology for the concept and its levels? I think the behavior we
are planning is "read committed" and the alternative re-ordering behavior
is equivalent to "serializable"?
4. Can the PID be made 4 bytes if we handle roll-over gracefully? 2
billion concurrent producers should be enough for anyone, right?
5. One implication of factoring out the message set seems to be that you
can't ever "repack" messages to improve compression beyond what is done by
the producer. We'd talked about doing this either by buffering when writing
or during log cleaning. This isn't a show stopper, but I think we won't be
able to do it. Furthermore, with log cleaning you'd assume that over time
ALL messages would collapse down to a single wrapper as compaction removes
the others.
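
To make point 2 concrete, here is a minimal sketch of what "initialize on
demand" could look like. This is not the real KafkaProducer API:
LazyTxnProducer, its fields, and the String-based send() are hypothetical
stand-ins, and the private initTransactions() only counts calls instead of
talking to a broker. The one idea it illustrates is that the first send
triggers transaction initialization when a transaction.app.id is set.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical stand-in for a producer; NOT the real KafkaProducer API.
class LazyTxnProducer {
    // null means no transaction.app.id was configured
    private final String transactionAppId;
    private boolean txnInitialized = false;
    private int initCount = 0;
    private final List<String> sent = new ArrayList<>();

    LazyTxnProducer(String transactionAppId) {
        this.transactionAppId = transactionAppId;
    }

    // Stands in for the explicit initTransactions() call; here it only
    // records that initialization happened.
    private void initTransactions() {
        initCount++;
        txnInitialized = true;
    }

    // On the first send, initialize transaction state if (and only if)
    // a transaction.app.id is set, then record the message.
    synchronized void send(String message) {
        if (transactionAppId != null && !txnInitialized) {
            initTransactions();
        }
        sent.add(message);
    }

    synchronized int initCount() { return initCount; }

    synchronized int sentCount() { return sent.size(); }
}
```

With this shape the caller just constructs the producer and sends. A
producer created without a transaction.app.id never runs the
initialization path at all.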
On Wed, Nov 30, 2016 at 2:19 PM, Guozhang Wang <wangg...@gmail.com> wrote:
> Hi all,
> I have just created KIP-98 to enhance Kafka with exactly once delivery.
> This KIP adds a transactional messaging mechanism along with an idempotent
> producer implementation to make sure that 1) duplicated messages sent from
> the same identified producer can be detected on the broker side, and 2) a
> group of messages sent within a transaction will atomically be either
> reflected and fetchable to consumers or not as a whole.
> The above wiki page provides a high-level view of the proposed changes as
> well as summarized guarantees. Initial draft of the detailed implementation
> design is described in this Google doc:
> We would love to hear your comments and suggestions.
> -- Guozhang