Luke Chen created KAFKA-20716:
---------------------------------
Summary: LSO stuck after unclean leader election
Key: KAFKA-20716
URL: https://issues.apache.org/jira/browse/KAFKA-20716
Project: Kafka
Issue Type: Bug
Reporter: Luke Chen
When a topic partition goes through an unclean leader election, the new leader
might contain transactional data without the corresponding COMMIT/ABORT markers.
However, the data in __transaction_state shows the transaction as already
committed or aborted, so the transaction timeout will never expire for it. As a
result, the LSO (last stable offset) gets stuck and READ_COMMITTED consumers
never make progress.
Steps to reproduce:
1. Create a cluster with 2 brokers
2. Create a topic with unclean leader election enabled
{code:java}
bin/kafka-topics.sh --create --topic t1 --bootstrap-server localhost:9091 \
  --replication-factor 2 --config unclean.leader.election.enable=true
{code}
3. Write a transactional record to topic t1, but wait 10 seconds before
committing it.
4. Before the record is committed in step 3, shut down the follower broker
(suppose it is broker 2).
5. Now partition t1-0 on broker 1 contains [offset 0 (data), offset 1
(commit)], but broker 2 contains only [offset 0 (data)].
6. Shut down broker 1, so that both brokers are down, but broker 2 is neither
the last leader nor in the ELR.
7. Start broker 2. An unclean leader election is triggered.
8. Start broker 1. Its log for t1-0 is truncated, so the log becomes [offset 0
(data)].
9. Append more non-transactional data to t1-0.
10. Consume with READ_COMMITTED. Nothing is returned.
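To make the end state of these steps concrete, here is a toy model of a single partition log. It is only a sketch and not Kafka's actual implementation: the class LsoSketch, its Entry type and both methods are hypothetical names, and it assumes the high watermark equals the log end and that there are no aborted transactions to filter. It computes the LSO as the first offset of the earliest transaction that has no marker yet.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

/** Toy model of one partition log (hypothetical, not Kafka's real code). */
public class LsoSketch {
    enum Kind { TXN_DATA, PLAIN_DATA, COMMIT_MARKER, ABORT_MARKER }

    /** One log entry; producerId is only meaningful for TXN_DATA and markers. */
    static final class Entry {
        final Kind kind;
        final long producerId;
        Entry(Kind kind, long producerId) { this.kind = kind; this.producerId = producerId; }
    }

    /** LSO = first offset of the earliest open transaction, else the log end. */
    static long lastStableOffset(List<Entry> log) {
        // producerId -> first offset of its transaction that has no marker yet
        Map<Long, Long> openTxnStart = new HashMap<>();
        for (int offset = 0; offset < log.size(); offset++) {
            Entry e = log.get(offset);
            if (e.kind == Kind.TXN_DATA) {
                openTxnStart.putIfAbsent(e.producerId, (long) offset);
            } else if (e.kind == Kind.COMMIT_MARKER || e.kind == Kind.ABORT_MARKER) {
                openTxnStart.remove(e.producerId);
            }
        }
        long lso = log.size(); // assumption: high watermark == log end
        for (long start : openTxnStart.values()) lso = Math.min(lso, start);
        return lso;
    }

    /** Data offsets below the LSO, i.e. what a READ_COMMITTED reader may see here. */
    static List<Long> readCommitted(List<Entry> log) {
        long lso = lastStableOffset(log);
        List<Long> visible = new ArrayList<>();
        for (int offset = 0; offset < lso; offset++) {
            Kind k = log.get(offset).kind;
            if (k == Kind.TXN_DATA || k == Kind.PLAIN_DATA) visible.add((long) offset);
        }
        return visible;
    }

    public static void main(String[] args) {
        // Step 5: broker 1 holds the data record and its commit marker.
        List<Entry> broker1 = new ArrayList<>();
        broker1.add(new Entry(Kind.TXN_DATA, 1L));
        broker1.add(new Entry(Kind.COMMIT_MARKER, 1L));
        System.out.println("before: LSO=" + lastStableOffset(broker1)
                + " visible=" + readCommitted(broker1));

        // Steps 7-9: broker 2's log wins (no marker), then plain data is appended.
        List<Entry> afterUnclean = new ArrayList<>();
        afterUnclean.add(new Entry(Kind.TXN_DATA, 1L));
        afterUnclean.add(new Entry(Kind.PLAIN_DATA, -1L));
        afterUnclean.add(new Entry(Kind.PLAIN_DATA, -1L));
        System.out.println("after:  LSO=" + lastStableOffset(afterUnclean)
                + " visible=" + readCommitted(afterUnclean));
    }
}
```

In this model the first log yields LSO 2 with offset 0 visible, while the second yields LSO 0 with nothing visible, no matter how much non-transactional data is appended afterwards. Only a marker for producer 1 would move the LSO forward, and per the description above nothing will write one, because __transaction_state already records the transaction as completed.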
Unclean leader election support in the transaction feature is not documented
anywhere. I think this should be supported, and we need to find a solution
for it.