Matthias Pohl created FLINK-34589:
-------------------------------------

             Summary: FineGrainedSlotManager doesn't handle errors in the 
resource reconciliation step
                 Key: FLINK-34589
                 URL: https://issues.apache.org/jira/browse/FLINK-34589
             Project: Flink
          Issue Type: Bug
          Components: Runtime / Coordination
    Affects Versions: 1.18.1, 1.19.0, 1.20.0
            Reporter: Matthias Pohl


While working on FLINK-34427, I noticed that the resource reconciliation is 
scheduled periodically when the {{SlotManager}} starts, but errors thrown in 
this step are not handled. I see two options here:
1. Fail fatally because such an error might indicate a major issue with the RM 
backend.
2. Log the failure and continue the scheduled task even in case of an error.

My understanding is that in such a case we're just not able to recreate 
TaskManagers, which should be a transient issue that can be resolved in the 
backend (YARN, k8s). That's why I would lean towards option 2.
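
To illustrate option 2, here is a minimal sketch that uses only plain 
{{java.util.concurrent}} rather than Flink's own scheduling classes. It relies 
on the documented behaviour of {{ScheduledExecutorService#scheduleWithFixedDelay}}: 
if one execution throws, all subsequent executions are suppressed. The class 
{{ReconciliationSketch}}, the helper {{logAndContinue}} and the failing step are 
hypothetical names made up for this example. They are not the actual 
{{FineGrainedSlotManager}} code, and whether the SlotManager schedules its 
reconciliation in exactly this way is an assumption.

```java
import java.util.concurrent.CountDownLatch;
import java.util.concurrent.Executors;
import java.util.concurrent.ScheduledExecutorService;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.atomic.AtomicInteger;

final class ReconciliationSketch {

    // Option 2: log the failure and swallow it so the next scheduled run still happens.
    // (Hypothetical helper; a real implementation would use the project's logger.)
    static Runnable logAndContinue(Runnable step) {
        return () -> {
            try {
                step.run();
            } catch (Exception e) {
                System.err.println("Resource reconciliation failed, retrying on next run: " + e);
            }
        };
    }

    // Schedules a stand-in reconcile step whose first call fails, and reports
    // whether the step was invoked at least `expectedRuns` times within 500 ms.
    static boolean reachesRuns(boolean guarded, int expectedRuns) {
        ScheduledExecutorService scheduler = Executors.newSingleThreadScheduledExecutor();
        CountDownLatch runs = new CountDownLatch(expectedRuns);
        AtomicInteger attempts = new AtomicInteger();
        Runnable step = () -> {
            runs.countDown();
            if (attempts.incrementAndGet() == 1) {
                // Simulates a transient problem in the RM backend.
                throw new IllegalStateException("backend temporarily unavailable");
            }
        };
        try {
            scheduler.scheduleWithFixedDelay(
                    guarded ? logAndContinue(step) : step, 0, 10, TimeUnit.MILLISECONDS);
            return runs.await(500, TimeUnit.MILLISECONDS);
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            return false;
        } finally {
            scheduler.shutdownNow();
        }
    }

    public static void main(String[] args) {
        System.out.println("guarded keeps running:   " + reachesRuns(true, 3));
        System.out.println("unguarded keeps running: " + reachesRuns(false, 3));
    }
}
```

In this sketch the unguarded task stops being scheduled after its first 
failure without any log output, while the guarded one logs the error and keeps 
running. Option 1 would correspond to forwarding the exception to a fatal error 
handler in the catch block instead of only logging it.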

[~xtsong] WDYT?



--
This message was sent by Atlassian Jira
(v8.20.10#820010)