[ https://issues.apache.org/jira/browse/KAFKA-17219?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17869723#comment-17869723 ]
Dongnuo Lyu edited comment on KAFKA-17219 at 7/30/24 7:53 PM:
--------------------------------------------------------------

??it's really odd to me that we're still seeing these??

-Yeah at least in `consumer_test` the fix is missing. We can add them back when AK is unblocked.-

Oh, we do have the {{wait_until}} fixes; they're just missing in {{test_consumer_bounce}} and {{test_broker_rolling_bounce}}. Adding them there should also fix the partition_owner assertion.


was (Author: JIRAUSER302289):
> it's really odd to me that we're still seeing these

Yeah at least in `consumer_test` the fix is missing. We can add them back when AK is unblocked.

> Adjust system test framework for new protocol consumer
> ------------------------------------------------------
>
>                 Key: KAFKA-17219
>                 URL: https://issues.apache.org/jira/browse/KAFKA-17219
>             Project: Kafka
>          Issue Type: Task
>          Components: clients, consumer, system tests
>            Reporter: Dongnuo Lyu
>            Priority: Major
>              Labels: kip-848-client-support
>
> The current test framework doesn't work well with the existing tests using
> the new consumer protocol. There are two main issues I've seen.
>
> First, we sometimes assume there is no rebalance triggered, for instance in
> {{consumer_test.py::test_consumer_failure}}
> {code:java}
> # verify that there were no rebalances on failover
> assert num_rebalances == consumer.num_rebalances(), "Broker failure should not cause a rebalance"{code}
> The current framework computes {{num_rebalances}} by incrementing it by one
> every time a new assignment is received, so if a reconciliation happens
> during the failover, {{num_rebalances}} is incremented as well. For the new
> protocol we need a new way to update {{num_rebalances}}.
>
> Second, for the new protocol, we need a way to make sure all members have
> joined *and stabilized*.
> Currently we only make sure all members have joined (the event handlers are
> all in the Joined state), at which point some partitions may not have been
> assigned yet and more time is needed for reconciliation. This can cause
> failures such as timeouts while waiting for consumption, and failures of
> assertions like
> {code:java}
> partition_owner = consumer.owner(partition)
> assert partition_owner is not None {code}
>
> As a short-term solution, we can make the tests pass by adding
> {{time.sleep}} or by skipping the {{num_rebalances}} check. To truly fix
> them, we should adjust
> {{tools/src/main/java/org/apache/kafka/tools/VerifiableConsumer.java}} to
> work well with the new protocol.
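
For illustration only, here is a minimal sketch of polling until every partition has an owner, as an alternative to a fixed {{time.sleep}} before the partition_owner assertion. It is not the framework's code: the {{wait_until}} below is a local stand-in written so the sketch runs on its own, and {{all_partitions_owned}}, {{FakeConsumer}}, and the partition names are hypothetical. The only consumer call assumed from the description above is {{consumer.owner(partition)}}.

```python
import time


def wait_until(condition, timeout_sec, backoff_sec=0.1, err_msg=""):
    """Poll condition() until it returns True or timeout_sec elapses.

    Local stand-in for the test framework's helper, so the sketch is
    self-contained.
    """
    deadline = time.monotonic() + timeout_sec
    while True:
        if condition():
            return
        if time.monotonic() >= deadline:
            raise TimeoutError(err_msg)
        time.sleep(backoff_sec)


def all_partitions_owned(consumer, partitions):
    """True once every partition has an owner (hypothetical helper)."""
    return all(consumer.owner(p) is not None for p in partitions)


class FakeConsumer:
    """Hypothetical stand-in for the consumer service used by the tests.

    owner() returns None for the first `polls_until_stable` calls to mimic
    members that have joined but are still reconciling their assignment.
    """

    def __init__(self, partitions, polls_until_stable):
        self._partitions = set(partitions)
        self._remaining = polls_until_stable

    def owner(self, partition):
        if self._remaining > 0:
            self._remaining -= 1
            return None
        return "member-1" if partition in self._partitions else None


partitions = ["test_topic-0", "test_topic-1", "test_topic-2"]
consumer = FakeConsumer(partitions, polls_until_stable=3)

# Wait for the group to stabilize instead of sleeping for a fixed time.
wait_until(lambda: all_partitions_owned(consumer, partitions),
           timeout_sec=5, backoff_sec=0.01,
           err_msg="Timed out waiting for all partitions to be assigned")

for partition in partitions:
    partition_owner = consumer.owner(partition)
    assert partition_owner is not None
```

Polling on the condition keeps the wait as short as the reconciliation actually takes, and a timeout produces a descriptive error rather than a bare assertion failure. It does not address the {{num_rebalances}} counting, which still needs a change in how assignments are reported.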