[https://issues.apache.org/jira/browse/KAFKA-15798?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17789898#comment-17789898]
A. Sophie Blee-Goldman commented on KAFKA-15798:
------------------------------------------------
Took a quick look at this while running down some of the worst flaky tests in
Streams. I think it's pretty clear that this is failing because of the state
updater thread (see below), but it's less clear to me whether this points to a
real bug in the state updater thread or whether the state updater only broke
the named topologies feature.
If it's the latter, we should probably block people from using named topologies
when the state updater thread is enabled in 3.7. I'm actually leaning towards
going a step further and taking out named topologies altogether: we can just
remove the "public" API classes for now, since extracting all the internal logic
is a bigger project that we shouldn't rush.
Of course, this all assumes that something about the state updater actually
broke named topologies. Someone more familiar with the state updater should
definitely verify first that this isn't a real bug in normal Streams! cc
[~cadonna] [~lucasb]
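A minimal sketch of what "blocking" the combination could look like, written as a standalone Java check rather than real Streams code. It assumes the internal config key `__state.updater.enabled__` toggles the state updater and that the updater is on when the key is absent (it was turned on by default in trunk); the class and method names (`NamedTopologyGuard`, `validate`) are hypothetical and not part of the Kafka Streams API.

```java
import java.util.Properties;

// Hypothetical guard: reject named topologies while the state updater is enabled.
class NamedTopologyGuard {

    // Assumption: internal key that toggles the state updater thread.
    static final String STATE_UPDATER_ENABLED = "__state.updater.enabled__";

    // Assumption: the state updater defaults to enabled when the key is absent.
    static boolean stateUpdaterEnabled(Properties props) {
        Object value = props.get(STATE_UPDATER_ENABLED);
        if (value == null) {
            return true;
        }
        return Boolean.parseBoolean(value.toString());
    }

    // Fails fast at configuration time instead of letting the app misbehave at runtime.
    static void validate(Properties props, boolean usesNamedTopologies) {
        if (usesNamedTopologies && stateUpdaterEnabled(props)) {
            throw new IllegalStateException(
                    "Named topologies are not supported when the state updater thread is enabled");
        }
    }

    public static void main(String[] args) {
        Properties props = new Properties();
        props.put(STATE_UPDATER_ENABLED, "false");
        validate(props, true); // passes: state updater explicitly disabled
        System.out.println("ok");
    }
}
```

Failing at configuration time keeps the error close to its cause, which is the point of blocking the combination rather than letting tests time out.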
Oh, and this is how I know the state updater thread is responsible: if you look
at [the graph of failure rates for this
test|https://ge.apache.org/scans/tests?search.names=Git%20branch&search.relativeStartTime=P90D&search.rootProjectNames=kafka&search.timeZoneId=America%2FLos_Angeles&search.values=trunk&tests.container=org.apache.kafka.streams.integration.NamedTopologyIntegrationTest&tests.sortField=FLAKY&tests.test=shouldAddAndRemoveNamedTopologiesBeforeStartingAndRouteQueriesToCorrectTopology()],
you'll see it goes from literally zero flakiness to the 2nd most commonly
failing test in all of Streams on Oct 4th. This is the day we turned on the
state updater thread by default.
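To make the before/after comparison concrete, here is a small Java sketch that splits a test's CI runs at a cutoff date and computes the failure rate on each side. The `Run` record and the sample runs are made up for illustration, and the year (2023) on the cutoff is an assumption.

```java
import java.time.LocalDate;
import java.util.List;

// Sketch: compare a test's failure rate before and after a suspected change date.
class FailureRateSplit {

    // Hypothetical record of one CI run of the test.
    record Run(LocalDate date, boolean failed) {}

    // Failure rate (0.0 to 1.0) of the runs before the cutoff, or on/after it.
    static double failureRate(List<Run> runs, LocalDate cutoff, boolean onOrAfter) {
        int total = 0;
        int failures = 0;
        for (Run run : runs) {
            boolean after = !run.date().isBefore(cutoff);
            if (after == onOrAfter) {
                total++;
                if (run.failed()) {
                    failures++;
                }
            }
        }
        return total == 0 ? 0.0 : (double) failures / total;
    }

    public static void main(String[] args) {
        LocalDate cutoff = LocalDate.of(2023, 10, 4); // assumed year
        // Toy data only, not real CI results.
        List<Run> runs = List.of(
                new Run(LocalDate.of(2023, 10, 1), false),
                new Run(LocalDate.of(2023, 10, 2), false),
                new Run(LocalDate.of(2023, 10, 4), true),
                new Run(LocalDate.of(2023, 10, 5), false));
        System.out.println(failureRate(runs, cutoff, false)); // 0.0
        System.out.println(failureRate(runs, cutoff, true));  // 0.5
    }
}
```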
(It's also a bit concerning that we didn't catch this sooner, given how sudden
the uptick in this test's failure rate is. It would be great if we could somehow
set up alerts for this sort of thing.)
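One way such an alert could work, sketched in Java: compare a test's failure rate over its most recent runs with the rate over the runs just before them, and flag the test when the jump reaches a threshold. The `FlakinessAlert` class, the window size, and the threshold are hypothetical choices, not an existing CI feature.

```java
import java.util.List;

// Sketch of a simple "sudden uptick" alert over a test's run history (oldest first).
class FlakinessAlert {

    // Fraction of failed runs in [from, to).
    static double rate(List<Boolean> failed, int from, int to) {
        int failures = 0;
        for (int i = from; i < to; i++) {
            if (failed.get(i)) {
                failures++;
            }
        }
        return (double) failures / (to - from);
    }

    // True if the failure rate of the last `window` runs exceeds the rate of the
    // `window` runs before them by at least `threshold`.
    static boolean shouldAlert(List<Boolean> failed, int window, double threshold) {
        int n = failed.size();
        if (window <= 0 || n < 2 * window) {
            return false; // not enough history to compare
        }
        double baseline = rate(failed, n - 2 * window, n - window);
        double recent = rate(failed, n - window, n);
        return recent - baseline >= threshold;
    }

    public static void main(String[] args) {
        List<Boolean> history = List.of(false, false, false, false, true, false, true, true);
        System.out.println(shouldAlert(history, 4, 0.25)); // true
    }
}
```

A larger window reacts more slowly but raises fewer false alarms on tests that already fail occasionally.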
> Flaky Test
> NamedTopologyIntegrationTest.shouldAddAndRemoveNamedTopologiesBeforeStartingAndRouteQueriesToCorrectTopology()
> -------------------------------------------------------------------------------------------------------------------------
>
> Key: KAFKA-15798
> URL: https://issues.apache.org/jira/browse/KAFKA-15798
> Project: Kafka
> Issue Type: Bug
> Components: streams, unit tests
> Reporter: Justine Olshan
> Priority: Major
> Labels: flaky-test
>
> I saw a few examples recently. Two have the same error, but the third is
> different:
> [https://ci-builds.apache.org/job/Kafka/job/kafka-pr/job/PR-14629/22/testReport/junit/org.apache.kafka.streams.integration/NamedTopologyIntegrationTest/Build___JDK_8_and_Scala_2_12___shouldAddAndRemoveNamedTopologiesBeforeStartingAndRouteQueriesToCorrectTopology___2/]
> [https://ci-builds.apache.org/job/Kafka/job/kafka/job/trunk/2365/testReport/junit/org.apache.kafka.streams.integration/NamedTopologyIntegrationTest/Build___JDK_21_and_Scala_2_13___shouldAddAndRemoveNamedTopologiesBeforeStartingAndRouteQueriesToCorrectTopology__/]
>
> The failure looks like:
> {code:java}
> java.lang.AssertionError: Did not receive all 5 records from topic output-stream-1 within 60000 ms, currently accumulated data is []
> Expected: is a value equal to or greater than <5>
>      but: <0> was less than <5>{code}
> The third failure was
> [https://ci-builds.apache.org/job/Kafka/job/kafka/job/trunk/2365/testReport/junit/org.apache.kafka.streams.integration/NamedTopologyIntegrationTest/Build___JDK_8_and_Scala_2_12___shouldAddAndRemoveNamedTopologiesBeforeStartingAndRouteQueriesToCorrectTopology__/]
> {code:java}
> java.lang.AssertionError: Expected: <[0, 1]> but: was <[0]>{code}
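For context on the first error: the message suggests it comes from a helper that keeps polling the output topic until the expected number of records has accumulated or the timeout expires, so "0 of 5 after 60000 ms" means no records were read from `output-stream-1` at all. Below is a minimal Java sketch of that wait-until-min-records pattern; `WaitForRecords.waitForMinRecords` and its poll interval are hypothetical and only mimic the first part of the message, not Kafka's actual test utility.

```java
import java.util.Collections;
import java.util.List;
import java.util.function.Supplier;

class WaitForRecords {

    // Hypothetical poll interval.
    static final long POLL_INTERVAL_MS = 10;

    // Repeatedly asks `accumulatedSoFar` for every record read so far and returns
    // once at least `expected` records are present, or fails after `timeoutMs`.
    static <T> List<T> waitForMinRecords(Supplier<List<T>> accumulatedSoFar, int expected,
                                         String topic, long timeoutMs) {
        long deadline = System.currentTimeMillis() + timeoutMs;
        List<T> accumulated = accumulatedSoFar.get();
        while (accumulated.size() < expected) {
            if (System.currentTimeMillis() >= deadline) {
                throw new AssertionError("Did not receive all " + expected + " records from topic "
                        + topic + " within " + timeoutMs + " ms, currently accumulated data is "
                        + accumulated);
            }
            try {
                Thread.sleep(POLL_INTERVAL_MS);
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
                throw new AssertionError("Interrupted while waiting for records", e);
            }
            accumulated = accumulatedSoFar.get();
        }
        return accumulated;
    }

    public static void main(String[] args) {
        int[] polls = {0};
        // Simulated consumer that "reads" one more record on each poll.
        List<Integer> result = waitForMinRecords(() -> {
            polls[0]++;
            return Collections.nCopies(polls[0], 0);
        }, 5, "output-stream-1", 60_000);
        System.out.println(result.size()); // 5
    }
}
```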
--
This message was sent by Atlassian Jira
(v8.20.10#820010)