Re: [DISCUSS] KIP-890 Server Side Defense

Matthias J. Sax Mon, 21 Nov 2022 19:10:54 -0800

Thanks for the KIP.

Couple of clarification questions (I am not a broker expert, so maybe some questions are obvious for others, but not for me with my lack of broker knowledge).




(10)

The delayed message case can also violate EOS if the delayed message comes in 
after the next addPartitionsToTxn request comes in. Effectively we may see a 
message from a previous (aborted) transaction become part of the next 
transaction.

What happens if the message comes in before the next addPartitionsToTxn request? It seems the broker hosting the data partitions won't know anything about it and append it to the partition, too? What is the difference between both cases?

Also, it seems a TX would only hang if there is no following TX that is either committed or aborted? Thus, for the case above, the TX might actually not hang (of course, we might get an EOS violation if the first TX was aborted and the second committed, or the other way around).
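
To make the question concrete, here is a toy model of how I picture it. It is only a sketch under my own assumptions, not broker code: the names (Partition, run, the "before_add"/"after_add" labels) are hypothetical, and it ignores producer IDs, epochs, and fencing entirely. It only assumes that the partition appends whatever arrives and that a marker closes every record written since the previous marker.

```python
from dataclasses import dataclass, field


@dataclass
class Partition:
    """Toy partition log: appends records without consulting any coordinator."""
    log: list = field(default_factory=list)

    def append(self, value):
        self.log.append(("data", value))

    def write_marker(self, kind):
        self.log.append(("marker", kind))

    def outcomes(self):
        """Map each data record to the next marker after it (None = still open)."""
        result = {}
        pending = []
        for entry_type, payload in self.log:
            if entry_type == "data":
                pending.append(payload)
            else:
                for value in pending:
                    result[value] = payload
                pending = []
        for value in pending:
            result[value] = None  # no marker followed: the record stays open
        return result


def run(arrival, tx2_uses_partition=True):
    """Replay TX1 (aborted) and TX2 (committed) with one delayed TX1 record.

    arrival is "before_add" or "after_add", relative to the point where TX2
    would register this partition (hypothetical labels for this sketch).
    """
    p = Partition()
    p.append("tx1-a")
    p.write_marker("ABORT")  # TX1 is aborted
    if arrival == "before_add":
        p.append("tx1-delayed")
    if tx2_uses_partition:
        # TX2 registers the partition at this point (addPartitionsToTxn).
        if arrival == "after_add":
            p.append("tx1-delayed")
        p.append("tx2-a")
        p.write_marker("COMMIT")  # TX2 is committed
    return p.outcomes()
```

In this model, run("before_add") and run("after_add") give the same result: the delayed record ends up under TX2's COMMIT marker. Only run("before_add", tx2_uses_partition=False) leaves the record open, which is the hanging case as I understand it.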



(20)

Of course, 1 and 2 require client-side changes, so for older clients, those 
approaches won’t apply.

For (1) I understand why a client change is necessary, but I am not sure why we need a client change for (2). Can you elaborate? -- Later you explain that we should send a DescribeTransactionRequest, but I am not sure why? Can't we just do an implicit AddPartitionToTx, too? If the old producer correctly registered the partition already, the TX-coordinator can just ignore it as it's an idempotent operation?
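
What I mean by "idempotent" is roughly the following. This is a minimal sketch with hypothetical names (ToyCoordinator, add_partition), assuming the coordinator keeps a set of registered partitions per ongoing transaction; I am not claiming this is how the TX-coordinator is actually implemented.

```python
class ToyCoordinator:
    """Hypothetical stand-in for the per-transaction state on the TX-coordinator."""

    def __init__(self):
        self.partitions = set()

    def add_partition(self, topic_partition):
        """Register a partition; return True only if it was not known before."""
        is_new = topic_partition not in self.partitions
        self.partitions.add(topic_partition)  # adding twice leaves the set unchanged
        return is_new


coordinator = ToyCoordinator()
first = coordinator.add_partition(("orders", 0))   # explicit add from the old producer
second = coordinator.add_partition(("orders", 0))  # implicit add triggered by the broker
```

Here the second call changes nothing in the coordinator's state, which is why I would expect an implicit add to be harmless for a producer that already registered the partition.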



(30)

To cover older clients, we will ensure a transaction is ongoing before we write 
to a transaction


Not sure what you mean by this? Can you elaborate?


(40)

[the TX-coordinator] will write the prepare commit message with a bumped epoch 
and send WriteTxnMarkerRequests with the bumped epoch.

Why do we use the bumped epoch for both? It seems more intuitive to use the current epoch, and only return the bumped epoch to the producer?



(50) "Implicit AddPartitionToTransaction"

Why does the implicitly sent request need to be synchronous? The KIP also says

in case we need to abort and need to know which partitions


What do you mean by this?

we don’t want to write to it before we store in the transaction manager


Do you mean TX-coordinator instead of "manager"?


(60)

For older clients and ensuring that the TX is ongoing, you describe a race condition. I am not sure if I can follow here. Can you elaborate?




-Matthias



On 11/18/22 1:21 PM, Justine Olshan wrote:

Hey all!

I'd like to start a discussion on my proposal to add some server-side
checks on transactions to avoid hanging transactions. I know this has been
an issue for some time, so I really hope this KIP will be helpful for many
users of EOS.

The KIP includes changes that will be compatible with old clients and
changes to improve performance and correctness on new clients.

Please take a look and leave any comments you may have!

KIP:
https://cwiki.apache.org/confluence/display/KAFKA/KIP-890%3A+Transactions+Server-Side+Defense
JIRA: https://issues.apache.org/jira/browse/KAFKA-14402

Thanks!
Justine
