Hi Richard,

This is something that has been discussed a few times, but the
outcomes of the discussions never written down. Thanks for starting
the conversation.

Responses to your PIP inline.

> Motivation: Pulsar currently could improve upon their system of sending
> packets of data by implementing transactional messaging. This system
> enforces eventual consistency within the system, and allows operations to
> be performed atomically.

Transactional acknowledgement also needs to be taken into account, so
that clients can say something like when I commit m2 and m3 to topics
X & Y, m1 should be marked as acknowledged on topic Z. Acknowledgement
makes the transactional bit more complicated. With transactions on
publish we don't have to deal with conflict resolution, but with
acknowledgement, we can't acknowledge a message which has already been
acknowledged.

> As described in the issue, we would implement the following policy in
> Producer and Pulsar Broker:
> 1. The producer produces the pre-processing transaction message. At this
> point, the broker will set the status of this message to unknown.

By "set the message to unknown", do you mean the broker will cache the
message, not writing it to any log? This could become a problem on the
commit as they commit may go to another broker, and the broker storing
the message could crash and then you'd end up with a committed message
where the content is missing.

> 2. After the local transaction is successfully executed, the commit message
> is sent, otherwise the rollback message is sent.

What does the commit message contain? And where is it sent? Keep in
mind that for transactional publish to be useful, we need be able to
write to many topics, and topics may live on different brokers. But
the commit message has to be atomic by definition, so it can only go
to one broker. In designs we've discussed previously, this was handled
by a component called the transaction coordinator, which is a logical
component which each broker knows how to talk to. For a transaction
the commit message is sent to the coordinator, which writes it to its
own log, and then goes through each topic in the commit and marks the
transaction as completed.

> - A configuration called ```maxMessageUnknownTime```. Consider this
> scenario: the pre-processing transaction message is sent, but the commit or
> rollback message is never received, which could mean that the status of a
> message would be permanently unknown. To avoid this from happening, we
> would need a config which limits the amount of time the status of a message
> could be unknown (i.e. ```maxMessageUnknownTime```) After that, the message
> would be discarded.

In transactional systems this is normally known as the low watermark,
but it can't be based on wallclock time. Brokers may have wildly
different times, and if one broker could abort a transaction while
that transactions is still live on another broker. In previous
discussions, each transaction was given a transaction ID, which is a
monotonically increasing number (allocated by the transaction
coordinator). As the commit is going through the coordinator as well,
the coordinator can decide whether the transaction ID is below the low
water mark which it also controls.

Cheers,
Ivan

Reply via email to