When the leader decides to commit a TX (of X msgs, a count known at this
point), it writes an "intent to append X msgs" msg (a control msg?) followed
by the X msgs. At this point it is the leader and therefore the point of
sync, so this can be done with no "foreign" msgs in between.
If there's a crash or a change of leadership, the new leader can just roll
back (remove whatever partial contents it has) if it sees the "intent" msg
but doesn't see X msgs belonging to the TX after it. The watermark does not
advance into the middle of a TX, so nothing is visible to any consumer until
the whole thing is committed and replicated (or it crashes and is rolled
back). Which means I don't think TX storage needs replication, and atomicity
to consumers is retained.
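
To make the idea concrete, here is a rough single-node sketch in Python. It
is only an illustration of what I am describing, not Kafka code: the names
(Record, PartitionLog, append_tx, recover) and the crash_after knob are all
made up, and replication is not modeled at all.

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class Record:
    kind: str                     # "intent" (hypothetical control msg) or "data"
    tx_id: Optional[int] = None
    count: int = 0                # for "intent": number of msgs in the TX
    payload: str = ""


@dataclass
class PartitionLog:
    records: List[Record] = field(default_factory=list)
    watermark: int = 0            # consumers only see offsets below this

    def append_tx(self, tx_id, payloads, crash_after=None):
        """Leader writes the intent marker, then the X msgs back to back."""
        self.records.append(Record("intent", tx_id, count=len(payloads)))
        for i, payload in enumerate(payloads):
            if crash_after is not None and i >= crash_after:
                return            # simulated crash mid-TX, watermark untouched
            self.records.append(Record("data", tx_id, payload=payload))
        # The watermark only ever jumps past a whole TX.
        self.watermark = len(self.records)

    def recover(self):
        """New leader: drop the trailing TX if its msgs are incomplete."""
        for pos in range(len(self.records) - 1, -1, -1):
            rec = self.records[pos]
            if rec.kind == "intent":
                present = len(self.records) - pos - 1
                if present < rec.count:
                    del self.records[pos:]   # roll back marker + partial msgs
                break
        self.watermark = min(self.watermark, len(self.records))

    def visible(self):
        """What a consumer can read: data msgs below the watermark."""
        return [r.payload for r in self.records[:self.watermark]
                if r.kind == "data"]


log = PartitionLog()
log.append_tx(1, ["a", "b"])
log.append_tx(2, ["c", "d", "e"], crash_after=1)   # only "c" makes it
log.recover()                                      # TX 2 is rolled back
```

In this toy version the consumer never sees "c", because the watermark was
never moved into TX 2, and recovery only has to look at the last intent
marker in the log.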

I can't argue with the latency argument, but:

1. If TXs can be done in-mem, maybe a TX per msg isn't that expensive?
2. I think a logical clock approach (with broker-side dedup based on the
clock) could provide the same exactly-once semantics without requiring
transactions at all?
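
For point 2, a minimal sketch of what I mean by broker-side dedup, again in
Python and again hypothetical: I am assuming the "logical clock" is a
per-producer counter that only increases, and DedupBroker is a made-up name.
It only covers duplicate suppression on retry, not atomic writes across
partitions.

```python
from typing import Dict, List, Tuple


class DedupBroker:
    """Toy broker that drops msgs whose clock it has already seen."""

    def __init__(self) -> None:
        self.log: List[Tuple[str, int, str]] = []
        self.last_clock: Dict[str, int] = {}   # producer id -> highest clock

    def append(self, producer_id: str, clock: int, payload: str) -> bool:
        # A retry carries the same clock as the original send, so anything
        # at or below the highest clock seen for this producer is a dup.
        if clock <= self.last_clock.get(producer_id, -1):
            return False
        self.log.append((producer_id, clock, payload))
        self.last_clock[producer_id] = clock
        return True


broker = DedupBroker()
broker.append("p1", 0, "a")    # accepted
broker.append("p1", 0, "a")    # retry of the same msg, dropped
broker.append("p1", 1, "b")    # accepted
```

The producer can retry blindly after a timeout and the log still ends up
with each msg once, which is the property I am after.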

However, I concede that as you describe it (long-running TXs where commits
are actually "checkpoints" spaced to optimize overhead vs RPO/RTO) you
would require read uncommitted to minimize latency.

On Tue, Dec 20, 2016 at 1:24 PM, Apurva Mehta <apu...@confluent.io> wrote:

> durably at the moment we enter the pre-commit phase. If we
> don't have durable persistence of these messages, we can't have idempotent
> and atomic copying into the main  log, and your proposal to date does not
> show otherwise.
>
