Justine Olshan created KAFKA-18654:
--------------------------------------
Summary: Transaction Version 2 performance regression due to early return
Key: KAFKA-18654
URL: https://issues.apache.org/jira/browse/KAFKA-18654
Project: Kafka
Issue Type: Bug
Affects Versions: 4.0.0
Reporter: Justine Olshan
Assignee: Justine Olshan
https://issues.apache.org/jira/browse/KAFKA-18575 solved a critical race
condition by returning with CONCURRENT_TRANSACTIONS early when the transaction
was still completing.
In testing, it was discovered that this early return could cause performance
regressions.
Prior to KIP-890, the AddPartitionsToTxn call was a separate request sent by
the producer. An earlier change, https://issues.apache.org/jira/browse/KAFKA-5477,
decreased the retry backoff for that request. With KIP-890, the call is made
through the produce path, so we go back to the default retry backoff, which
takes longer. Prior to KAFKA-18575, there was a slight delay when sending to the
coordinator, so we were less likely to return quickly and get stuck in this
backoff.
There are two ways to address this regression:
1. Solve KAFKA-18575 via the other solution proposed for that ticket: don't
return early, and instead check the epoch to avoid the verification guard race.
2. With the bumped produce version, return CONCURRENT_TRANSACTIONS and change
produce handling to use a shorter backoff for this error.
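
As a rough illustration of option 2, the sketch below picks a retry backoff
based on the error returned for a produce request. It is not the actual
producer code: the class, the enum, and the 20 ms value are hypothetical names
and numbers chosen for the example. The 100 ms value mirrors the producer's
default retry.backoff.ms.

```java
// Hypothetical sketch of per-error backoff selection; not Kafka's actual code.
class ProduceBackoffSketch {

    // Simplified stand-in for the errors a produce response might carry.
    enum ProduceError { NONE, CONCURRENT_TRANSACTIONS, NOT_LEADER_OR_FOLLOWER }

    // Mirrors the default of the producer's retry.backoff.ms setting.
    static final long DEFAULT_RETRY_BACKOFF_MS = 100L;

    // Assumed shorter backoff for CONCURRENT_TRANSACTIONS; the value is illustrative.
    static final long CONCURRENT_TXN_RETRY_BACKOFF_MS = 20L;

    // Returns how long to wait before retrying a batch that failed with the given error.
    static long backoffMsFor(ProduceError error) {
        switch (error) {
            case NONE:
                return 0L; // success, nothing to retry
            case CONCURRENT_TRANSACTIONS:
                // The previous transaction is still completing, so retry sooner.
                return CONCURRENT_TXN_RETRY_BACKOFF_MS;
            default:
                return DEFAULT_RETRY_BACKOFF_MS;
        }
    }

    public static void main(String[] args) {
        for (ProduceError error : ProduceError.values()) {
            System.out.println(error + " -> " + backoffMsFor(error) + " ms");
        }
    }
}
```

The point of special-casing the error is that CONCURRENT_TRANSACTIONS is
expected to clear quickly once the previous transaction finishes, so waiting
the full default backoff mostly adds latency.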