[ https://issues.apache.org/jira/browse/FLINK-22893?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Chesnay Schepler updated FLINK-22893:
-------------------------------------
Description:
The NodeCache used by the LeaderElection-/-RetrievalDrivers ensures that
the parents of the observed node exist by regularly issuing mkdir calls. This
operation can fail if the HA data is being cleaned up concurrently, in which
case Curator throws an unhandled exception that crashes the TaskManager (TM).
https://dev.azure.com/apache-flink/apache-flink/_build/results?buildId=18700&view=logs&j=2c3cbe13-dee0-5837-cf47-3053da9a8a78&t=2c7d57b9-7341-5a87-c9af-2cf7cc1a37dc&l=4382
was: https://dev.azure.com/apache-flink/apache-flink/_build/results?buildId=18700&view=logs&j=2c3cbe13-dee0-5837-cf47-3053da9a8a78&t=2c7d57b9-7341-5a87-c9af-2cf7cc1a37dc&l=4382
> Leader retrieval fails with NoNodeException
> -------------------------------------------
>
> Key: FLINK-22893
> URL: https://issues.apache.org/jira/browse/FLINK-22893
> Project: Flink
> Issue Type: Bug
> Components: Runtime / Coordination
> Affects Versions: 1.11.1, 1.14.0
> Reporter: Dawid Wysakowicz
> Assignee: Chesnay Schepler
> Priority: Critical
> Labels: pull-request-available, test-stability
> Fix For: 1.14.0
>
>
> The NodeCache used by the LeaderElection-/-RetrievalDrivers ensures that
> the parents of the observed node exist by regularly issuing mkdir calls. This
> operation can fail if the HA data is being cleaned up concurrently, in which
> case Curator throws an unhandled exception that crashes the TaskManager (TM).
> https://dev.azure.com/apache-flink/apache-flink/_build/results?buildId=18700&view=logs&j=2c3cbe13-dee0-5837-cf47-3053da9a8a78&t=2c7d57b9-7341-5a87-c9af-2cf7cc1a37dc&l=4382
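
The race described above can be illustrated with a small, self-contained toy model. This is only a sketch under stated assumptions: it does not use Curator or ZooKeeper, and it is not the fix applied in Flink. FakeTree, NoNodeException, ensureParents, ensureParentsTolerant and the path /flink/cluster/leader/latch are all hypothetical names made up for this illustration. It shows why creating parents one level at a time can fail when a cleanup deletes the tree between two steps, and what catching the error instead of letting it propagate might look like.

```java
import java.util.Set;
import java.util.TreeSet;

public class EnsureParentsRaceSketch {

    /** Hypothetical stand-in for ZooKeeper's NoNodeException (not the real class). */
    public static class NoNodeException extends Exception {
        public NoNodeException(String path) {
            super("No node: " + path);
        }
    }

    /** Toy in-memory tree of node paths; the root "/" always exists. */
    public static class FakeTree {
        private final Set<String> nodes = new TreeSet<>();

        public FakeTree() {
            nodes.add("/");
        }

        /** Creates a single node; fails if its direct parent is missing. */
        public void create(String path) throws NoNodeException {
            String parent = parentOf(path);
            if (!nodes.contains(parent)) {
                throw new NoNodeException(parent);
            }
            nodes.add(path);
        }

        /** Mimics the HA data cleanup: removes root and everything below it. */
        public void deleteRecursively(String root) {
            nodes.removeIf(p -> p.equals(root) || p.startsWith(root + "/"));
        }

        public boolean exists(String path) {
            return nodes.contains(path);
        }
    }

    static String parentOf(String path) {
        int idx = path.lastIndexOf('/');
        return idx <= 0 ? "/" : path.substring(0, idx);
    }

    /**
     * Creates every ancestor of path, one level at a time (the "mkdir" step).
     * betweenSteps runs after each level to mimic concurrent activity.
     */
    public static void ensureParents(FakeTree tree, String path, Runnable betweenSteps)
            throws NoNodeException {
        String[] parts = path.substring(1).split("/");
        String current = "";
        for (int i = 0; i < parts.length - 1; i++) {
            current = current + "/" + parts[i];
            if (!tree.exists(current)) {
                tree.create(current);
            }
            betweenSteps.run();
        }
    }

    /** Same as ensureParents, but reports the race as a result instead of throwing. */
    public static boolean ensureParentsTolerant(FakeTree tree, String path, Runnable betweenSteps) {
        try {
            ensureParents(tree, path, betweenSteps);
            return true;
        } catch (NoNodeException e) {
            // In this sketch the caller decides what to do (retry, ignore, shut down).
            return false;
        }
    }

    public static void main(String[] args) {
        String path = "/flink/cluster/leader/latch"; // hypothetical path

        FakeTree quiet = new FakeTree();
        System.out.println("no cleanup:   " + ensureParentsTolerant(quiet, path, () -> {}));

        FakeTree racy = new FakeTree();
        // The "cleanup" deletes /flink right after each level has been created.
        System.out.println("with cleanup: "
                + ensureParentsTolerant(racy, path, () -> racy.deleteRecursively("/flink")));
    }
}
```

In the second run, the strict ensureParents variant would throw, because /flink is gone by the time /flink/cluster is created. If nothing catches that exception, it behaves like the unhandled error described in this issue.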