[ 
https://issues.apache.org/jira/browse/KAFKA-17219?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17869723#comment-17869723
 ] 

Dongnuo Lyu edited comment on KAFKA-17219 at 7/30/24 7:53 PM:
--------------------------------------------------------------

??it's really odd to me that we're still seeing these??
-Yeah at least in `consumer_test` the fix is missing. We can add them back when 
AK is unblocked.-
Oh, we do have the {{wait_until}} fixes; they're just missing in 
{{test_consumer_bounce}} and {{test_broker_rolling_bounce}}. Adding them there 
should also fix the partition_owner assertion.
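
For illustration, a minimal sketch of the polling pattern meant here, assuming a 
helper shaped like {{wait_until}}. The {{wait_until}} below is a local stand-in 
written for this example, and {{FakeConsumer}} and {{wait_for_owner}} are 
hypothetical names, not code from the Kafka system tests.

```python
import time


def wait_until(condition, timeout_sec, backoff_sec=0.1, err_msg=""):
    """Local stand-in for a polling helper: retry until condition() is truthy."""
    deadline = time.monotonic() + timeout_sec
    while True:
        if condition():
            return
        if time.monotonic() >= deadline:
            raise TimeoutError(err_msg)
        time.sleep(backoff_sec)


class FakeConsumer:
    """Hypothetical test double: owner() returns None for the first few calls."""

    def __init__(self, polls_until_assigned):
        self._remaining = polls_until_assigned

    def owner(self, partition):
        if self._remaining > 0:
            self._remaining -= 1
            return None
        return "consumer-1"


def wait_for_owner(consumer, partition, timeout_sec=5):
    # Poll instead of asserting right away, so reconciliation has time to finish.
    wait_until(lambda: consumer.owner(partition) is not None,
               timeout_sec=timeout_sec, backoff_sec=0.01,
               err_msg="Timed out waiting for partition %s to get an owner" % (partition,))
    return consumer.owner(partition)
```

With this shape, the test fails only if the partition is still unowned when the 
timeout expires, rather than on the first check.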


was (Author: JIRAUSER302289):
> it's really odd to me that we're still seeing these
Yeah at least in `consumer_test` the fix is missing. We can add them back when 
AK is unblocked.

> Adjust system test framework for new protocol consumer
> ------------------------------------------------------
>
>                 Key: KAFKA-17219
>                 URL: https://issues.apache.org/jira/browse/KAFKA-17219
>             Project: Kafka
>          Issue Type: Task
>          Components: clients, consumer, system tests
>            Reporter: Dongnuo Lyu
>            Priority: Major
>              Labels: kip-848-client-support
>
> The current test framework doesn't work well with the existing tests using 
> the new consumer protocol. There are two main issues I've seen.
>  
> First, we sometimes assume there is no rebalance triggered, for instance in 
> {{consumer_test.py::test_consumer_failure}}
> {code:java}
> # verify that there were no rebalances on failover
> assert num_rebalances == consumer.num_rebalances(), "Broker failure should not cause a rebalance"
> {code}
> The current framework computes {{num_rebalances}} by incrementing it by one 
> every time a new assignment is received, so if a reconciliation happens 
> during the failover, {{num_rebalances}} is incremented as well. For the new 
> protocol we need a new way to update {{num_rebalances}}.
>  
> Second, for the new protocol, we need a way to make sure all members have 
> joined {*}and stabilized{*}. Currently we only make sure all members have 
> joined (the event handlers are all in the Joined state), at which point some 
> partitions may not have been assigned yet and more time is needed for 
> reconciliation. This can cause failures such as timeouts while waiting for 
> consumption, and failed assertions like
> {code:java}
> partition_owner = consumer.owner(partition)
> assert partition_owner is not None {code}
>  
> As a short-term solution, we can make the tests pass by adding 
> {{time.sleep}} calls or by skipping the {{num_rebalances}} check. To truly 
> fix them, we should adjust 
> {{tools/src/main/java/org/apache/kafka/tools/VerifiableConsumer.java}} to 
> work well with the new protocol.
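
As a toy illustration of the first issue in the description (not the actual 
framework code), the sketch below models a counter that is bumped on every 
assignment event. {{AssignmentCounter}} and {{on_assignment}} are hypothetical 
names chosen for this example.

```python
class AssignmentCounter:
    """Toy model: counts one 'rebalance' per assignment event received."""

    def __init__(self):
        self.num_rebalances = 0
        self.assignment = set()

    def on_assignment(self, partitions):
        # Every assignment event bumps the counter, whatever caused it.
        self.assignment = set(partitions)
        self.num_rebalances += 1


def counter_changed_by_reconciliation():
    consumer = AssignmentCounter()
    consumer.on_assignment([0, 1])       # initial assignment
    before = consumer.num_rebalances
    consumer.on_assignment([0, 1, 2])    # a reconciliation delivers an updated assignment
    # A check like `before == consumer.num_rebalances` now fails.
    return before != consumer.num_rebalances
```

In this model, any assertion that the counter is unchanged across the failover 
fails as soon as one reconciliation delivers an updated assignment.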



--
This message was sent by Atlassian Jira
(v8.20.10#820010)
