[
https://issues.apache.org/jira/browse/KAFKA-13128?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17385841#comment-17385841
]
A. Sophie Blee-Goldman commented on KAFKA-13128:
------------------------------------------------
The failure is from the second line in this series of assertions
{code:java}
assertThat(store1.get(key), is(notNullValue()));
assertThat(store1.get(key2), is(notNullValue()));
assertThat(store1.get(key3), is(notNullValue()));
{code}
which is basically the first time we attempt IQ after starting up. The test
setup includes starting Streams and waiting for it to reach RUNNING, then
adding a new thread, and finally producing a set of 100 records for each of the
three keys. After that it waits for all records to be processed *for _key3_*
and then proceeds to the above assertions.
I suspect the problem is that we only wait for all data to be processed for
_key3_, but not the other two keys. In theory this should work, since the data
for _key3_ is produced last and would have the largest timestamps meaning the
keys should be processed more or less in order. However the input topic
actually has two partitions, so it could be that _key1_ and _key3_ correspond
to task 1 while _key2_ corresponds to task 2. Again, that shouldn't affect the
order in which records are processed – as long as the tasks are on the same
thread.
But we started up a new thread in between waiting for Streams to reach RUNNING
and producing data to the input topics. This new thread has to be assigned one
of the tasks, but due to cooperative rebalancing it will take two full (though
short) rebalances before the new thread can actually start processing any
tasks. Therefore as long as the original thread continues to own the task
corresponding to _key3_ after the new thread is added, it can easily get
through all records for _key3_. Which would mean the test can proceed to the
above assertions while the new thread is still waiting to start processing any
data for _key2_ at all.
There are a few ways we can address this given how many things had to happen
exactly right in order to see this failure, but the simplest fix is to just
wait on all three keys to be fully processed rather than just the one. This
seems to align with the original intention of the test best as well
> Flaky Test
> StoreQueryIntegrationTest.shouldQueryStoresAfterAddingAndRemovingStreamThread
> ----------------------------------------------------------------------------------------
>
> Key: KAFKA-13128
> URL: https://issues.apache.org/jira/browse/KAFKA-13128
> Project: Kafka
> Issue Type: Bug
> Components: streams
> Affects Versions: 3.1.0
> Reporter: A. Sophie Blee-Goldman
> Priority: Major
> Labels: flaky-test
>
> h3. Stacktrace
> java.lang.AssertionError: Expected: is not null but: was null
> at org.hamcrest.MatcherAssert.assertThat(MatcherAssert.java:20)
> at org.hamcrest.MatcherAssert.assertThat(MatcherAssert.java:6)
> at
> org.apache.kafka.streams.integration.StoreQueryIntegrationTest.lambda$shouldQueryStoresAfterAddingAndRemovingStreamThread$19(StoreQueryIntegrationTest.java:461)
> at
> org.apache.kafka.streams.integration.StoreQueryIntegrationTest.until(StoreQueryIntegrationTest.java:506)
> at
> org.apache.kafka.streams.integration.StoreQueryIntegrationTest.shouldQueryStoresAfterAddingAndRemovingStreamThread(StoreQueryIntegrationTest.java:455)
>
> https://ci-builds.apache.org/job/Kafka/job/kafka-pr/job/PR-11085/5/testReport/org.apache.kafka.streams.integration/StoreQueryIntegrationTest/Build___JDK_16_and_Scala_2_13___shouldQueryStoresAfterAddingAndRemovingStreamThread_2/
--
This message was sent by Atlassian Jira
(v8.3.4#803005)