[
https://issues.apache.org/jira/browse/FLINK-22488?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]
Flink Jira Bot updated FLINK-22488:
-----------------------------------
Labels: auto-deprioritized-minor pull-request-available (was:
pull-request-available stale-minor)
Priority: Not a Priority (was: Minor)
This issue was labeled "stale-minor" 7 days ago and has not received any
updates so it is being deprioritized. If this ticket is actually Minor, please
raise the priority and ask a committer to assign you the issue or revive the
public discussion.
> KafkaSourceLegacyITCase.testOneToOneSources failed due to "OperatorEvent from
> an OperatorCoordinator to a task was lost"
> ------------------------------------------------------------------------------------------------------------------------
>
> Key: FLINK-22488
> URL: https://issues.apache.org/jira/browse/FLINK-22488
> Project: Flink
> Issue Type: Improvement
> Reporter: Dong Lin
> Priority: Not a Priority
> Labels: auto-deprioritized-minor, pull-request-available
>
> According to [1], the test KafkaSourceLegacyITCase.testOneToOneSources failed
> because it runs a streaming job (which uses KafkaSource) with
> restartAttempts=1. In addition to the failover explicitly triggered by the
> FailingIdentityMapper, the job additionally failed due to
> "org.apache.flink.util.FlinkException: An OperatorEvent from an
> OperatorCoordinator to a task was lost. Triggering task failover to ensure
> consistency", which is unexpected by the test.
> Note that SubtaskGatewayImpl was updated by [2] on 4/14 which triggers task
> failover if any OperatorEvent was lost. This could explain why those Kafka
> tests start to fail due to the exception described above.
> In order to make this test stable, let's try to understand why there is such
> a high chance of loosing OperatorEvent in the Azure test pipeline. And if we
> could not avoid loosing OperatorEvent in the test pipeline, we probably need
> to update the test to allow the pipeline being restarted arbitrary times (and
> still be able to stop the test on the happy path).
> [1]
> https://dev.azure.com/apache-flink/apache-flink/_build/results?buildId=17212&view=logs&j=c5f0071e-1851-543e-9a45-9ac140befc32&t=1fb1a56f-e8b5-5a82-00a0-a2db7757b4f5&l=6960
> [2] https://github.com/apache/flink/pull/15605
--
This message was sent by Atlassian Jira
(v8.20.1#820001)