[
https://issues.apache.org/jira/browse/KAFKA-19332?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=18021760#comment-18021760
]
Sanskar Jhajharia commented on KAFKA-19332:
-------------------------------------------
After investigating the flakiness reported in this ticket, I reviewed both the
historical CI data and attempted to reproduce the issue locally. The Develocity
Report [1] shows that although the test has been flaky over the last
ninety days, failures are very rare. On average, this test has
failed {*}only once per month{*}, which is statistically negligible given the
overall CI volume. In practice, this means the flakiness has no real impact on
the contributor workflow or CI reliability.

I also tried to reproduce the failure in {*}local environments{*}. Across
extensive reruns, ranging from {*}500 to 1000 iterations, the test
consistently passed{*}. This strongly suggests that the flakiness
is tied to environmental factors within CI, such as timing or scheduling
artifacts, rather than being caused by a deterministic issue within the test
logic or the underlying Kafka code. In other words, the conditions that trigger
the failure seem unique to CI and cannot be reproduced in a controlled local
setting.
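
As a rough aid for weighing the local-rerun result, the sketch below computes
the chance that N runs all pass for an assumed per-run failure probability.
The class name FlakeOdds, the method probAllPass, and the example rates are
hypothetical and are not taken from the Develocity data. The runs are also
assumed to be independent, which may not hold in practice.

```java
// Illustrative sketch only: names and example rates are assumptions,
// not values from the Develocity report.
class FlakeOdds {

    // Probability that all n independent runs pass when each run
    // fails with probability p (0 <= p <= 1).
    static double probAllPass(double p, int n) {
        if (p < 0.0 || p > 1.0 || n < 0) {
            throw new IllegalArgumentException("need 0 <= p <= 1 and n >= 0");
        }
        return Math.pow(1.0 - p, n);
    }

    public static void main(String[] args) {
        double[] assumedRates = {0.01, 0.001, 0.0001}; // hypothetical per-run failure rates
        int[] runCounts = {500, 1000};                 // iteration counts mentioned above
        for (double p : assumedRates) {
            for (int n : runCounts) {
                System.out.printf("p=%.4f n=%d P(all pass)=%.4f%n", p, n, probAllPass(p, n));
            }
        }
    }
}
```

Running main prints one line per assumed rate and run count. The figures are
only a guide to how much weight a clean local run of a given size can carry
under each assumed rate.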

It is also worth noting that the failure does not occur during the core logic
of the test [2] [3]. The intent of the test is to validate behaviour when
switching read isolation levels, and in particular how the consumer handles
transactional markers. However, in every recorded failure the test does not
reach this portion of the logic. Instead, the flake occurs earlier, when the
consumer attempts to process the abort transaction marker offset for Message 4.
Since this happens prior to the test exercising its main verification path, the
failure points to an environmental timing artifact rather than a flaw in the
functional behaviour being tested.

Given the rarity of the issue, the inability to reproduce locally, and the fact
that the flake occurs before the main test logic is even invoked, there is no
evidence of a defect in Kafka itself or in the intended functionality that the
test covers. The impact is minimal, and the test remains valuable as a
safeguard for transactional isolation-level behaviour. On this basis, there is
no actionable change to be made at this time. In conclusion, this ticket is
being closed. We will continue to monitor CI to ensure that the failure rate
does not increase. Should the frequency of the flakiness grow, or should it
become reproducible in a local environment, the issue can be revisited with a
more targeted investigation and fix. At present, however, the evidence
indicates that the flakiness is rare, environment-dependent, and not disruptive
enough to justify further intervention.

[1] https://develocity.apache.org/scans/tests?search.names=CI%20workflow,Git%20repository&search.relativeStartTime=P90D&search.rootProjectNames=kafka&search.tags=github,trunk&search.tasks=test&search.timeZoneId=Asia%2FCalcutta&search.values=CI,https:%2F%2Fgithub.com%2Fapache%2Fkafka&tests.container=org.apache.kafka.clients.consumer.ShareConsumerTest&tests.sortField=FLAKY
[2] https://github.com/apache/kafka/blob/trunk/clients/clients-integration-tests/src/test/java/org/apache/kafka/clients/consumer/ShareConsumerTest.java#L2681-L2684
[3] https://github.com/apache/kafka/blob/trunk/clients/clients-integration-tests/src/test/java/org/apache/kafka/clients/consumer/ShareConsumerTest.java#L2579-L2587

> Fix flaky test :
> testAlterReadCommittedToReadUncommittedIsolationLevelWithReleaseAck and
> testAlterReadCommittedToReadUncommittedIsolationLevelWithRejectAck
> -----------------------------------------------------------------------------------------------------------------------------------------------------------
>
> Key: KAFKA-19332
> URL: https://issues.apache.org/jira/browse/KAFKA-19332
> Project: Kafka
> Issue Type: Sub-task
> Reporter: Shivsundar R
> Assignee: Sanskar Jhajharia
> Priority: Major
>
> The test has been flaky in AK builds -
> [https://develocity.apache.org/scans/tests?search.names=CI%20workflow%2CGit%20repository&search.relativeStartTime=P28D&search.rootProjectNames=kafka&search.tags=github%2Ctrunk&search.tasks=test&search.timeZoneId=Asia%2FCalcutta&search.values=CI%2Chttps:%2F%2Fgithub.com%2Fapache%2Fkafka&tests.container=org.apache.kafka.clients.consumer.ShareConsumerTest&tests.test=testAlterReadCommittedToReadUncommittedIsolationLevelWithReleaseAck()%5B2%5D]