[
https://issues.apache.org/jira/browse/FLINK-34589?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]
Matthias Pohl updated FLINK-34589:
----------------------------------
Issue Type: Technical Debt (was: Bug)
Priority: Minor (was: Major)
> FineGrainedSlotManager doesn't handle errors in the resource reconciliation
> step
> ---------------------------------------------------------------------------------
>
> Key: FLINK-34589
> URL: https://issues.apache.org/jira/browse/FLINK-34589
> Project: Flink
> Issue Type: Technical Debt
> Components: Runtime / Coordination
> Affects Versions: 1.19.0, 1.18.1, 1.20.0
> Reporter: Matthias Pohl
> Priority: Minor
>
> While working on FLINK-34427, I noticed that the resource reconciliation is
> scheduled periodically when the {{SlotManager}} starts, but errors in this
> step are not handled. I see two options here:
> 1. Fail fatally because such an error might indicate a major issue with the
> RM backend.
> 2. Log the failure and continue the scheduled task even in case of an error.
> My understanding is that a failure here only means we're unable to recreate
> TaskManagers, which should be a transient issue that can be resolved in the
> backend (YARN, k8s). That's why I lean towards option 2.
> [~xtsong] WDYT?
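
For illustration only, option 2 could be sketched roughly as below. This is not the actual FineGrainedSlotManager code: it uses a plain java.util.concurrent.ScheduledExecutorService rather than Flink's executors, and the names ReconcileSketch, logAndContinue and flakyStep are hypothetical. The point it shows is that a task scheduled with scheduleWithFixedDelay is not run again once one execution throws, so the step has to catch and report its own failures if it is meant to keep running.

```java
import java.util.concurrent.Executors;
import java.util.concurrent.ScheduledExecutorService;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.atomic.AtomicInteger;
import java.util.function.Consumer;

public class ReconcileSketch {

    /**
     * Option 2 (hypothetical helper): wrap the periodic step so that an exception
     * is reported to onError instead of escaping into the scheduler.
     * Errors such as OutOfMemoryError are deliberately not caught.
     */
    static Runnable logAndContinue(Runnable step, Consumer<Exception> onError) {
        return () -> {
            try {
                step.run();
            } catch (Exception e) {
                onError.accept(e);
            }
        };
    }

    public static void main(String[] args) throws InterruptedException {
        ScheduledExecutorService executor = Executors.newSingleThreadScheduledExecutor();
        AtomicInteger runs = new AtomicInteger();

        // Stand-in for the reconciliation step: every second invocation fails.
        Runnable flakyStep = () -> {
            if (runs.incrementAndGet() % 2 == 0) {
                throw new IllegalStateException("backend temporarily unavailable");
            }
        };

        // Without the wrapper, the first failure would suppress all later executions.
        executor.scheduleWithFixedDelay(
                logAndContinue(
                        flakyStep,
                        e -> System.err.println("Reconciliation failed, will retry: " + e)),
                0, 10, TimeUnit.MILLISECONDS);

        Thread.sleep(100);
        executor.shutdownNow();
        System.out.println("runs = " + runs.get());
    }
}
```

Option 1 would differ only in the handler: instead of logging, onError would forward the exception to whatever fatal error handling the component already has.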