Matthias Pohl created FLINK-31709:
-------------------------------------

             Summary: JobResultStore and ExecutionGraphInfoStore could be merged
                 Key: FLINK-31709
                 URL: https://issues.apache.org/jira/browse/FLINK-31709
             Project: Flink
          Issue Type: New Feature
          Components: Runtime / Coordination
            Reporter: Matthias Pohl


This is an initial proposal for an improvement in the coordination layer:

The {{JobResultStore}} (JRS) was introduced as part of 
[FLIP-194|https://cwiki.apache.org/confluence/display/FLINK/FLIP-194%3A+introduce+the+jobresultstore].
 For now, it only stores the JobResult. Through the JRS, jobs can be marked as 
finished even when the JobManager fails and the information from the 
{{ExecutionGraphInfoStore}} is lost (see FLINK-11813).

While implementing {{FLIP-194}}, it became apparent that we have some 
redundancy between the JRS and the {{ExecutionGraphInfoStore}}. Both components 
store some meta information of a finished job. The {{ExecutionGraphInfoStore}} 
is used to make information about the finished job available in user-facing 
APIs (REST, web-UI). The JRS is used to expose the job's state to the cleanup 
logic and stores limited data.

This proposal is about merging the two and making the 
{{ArchivedExecutionGraph}} information available even after a JobManager is 
restarted. That way, completed jobs can still be listed in the job overview 
after a Flink cluster restart. Additionally, we could provide the last 
checkpoint information. The JRS would be a way to access this information even 
after the Flink cluster is shut down. The latter feature would also be a way to 
improve the Flink Kubernetes Operator's latest-state handling.
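
To make the idea more concrete, here is a minimal sketch of what a merged store could look like. It is not taken from the Flink code base: the class {{MergedJobStoreSketch}}, its nested types, and all method names are hypothetical and only illustrate one entry serving both the cleanup logic and the user-facing APIs. The store is kept in memory for brevity; surviving a JobManager or cluster restart would require durable storage, which this sketch does not attempt to model.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.Optional;

/** Hypothetical sketch of a merged store; none of these names are Flink API. */
public class MergedJobStoreSketch {

    /** One entry combining JRS-style cleanup state with archived job details. */
    public static final class Entry {
        final String finalStatus;        // e.g. "FINISHED"; the part the cleanup logic needs
        final String jobName;            // shown in the job overview
        final String lastCheckpointPath; // null if no checkpoint information is available
        boolean dirty = true;            // stays true until cleanup is confirmed

        Entry(String finalStatus, String jobName, String lastCheckpointPath) {
            this.finalStatus = finalStatus;
            this.jobName = jobName;
            this.lastCheckpointPath = lastCheckpointPath;
        }
    }

    /** In-memory stand-in; a real store would have to persist its entries. */
    public static final class InMemoryMergedStore {
        private final Map<String, Entry> entries = new LinkedHashMap<>();

        public void put(String jobId, String finalStatus, String jobName,
                        String lastCheckpointPath) {
            entries.put(jobId, new Entry(finalStatus, jobName, lastCheckpointPath));
        }

        /** Cleanup-facing view: is there still cleanup pending for this job? */
        public boolean hasDirtyEntry(String jobId) {
            Entry e = entries.get(jobId);
            return e != null && e.dirty;
        }

        /** Records that cleanup for the job has completed. */
        public void markClean(String jobId) {
            Entry e = entries.get(jobId);
            if (e != null) {
                e.dirty = false;
            }
        }

        /** User-facing view: names of all finished jobs, in insertion order. */
        public List<String> listFinishedJobNames() {
            List<String> names = new ArrayList<>();
            for (Entry e : entries.values()) {
                names.add(e.jobName);
            }
            return names;
        }

        /** User-facing view: last checkpoint location, if one was recorded. */
        public Optional<String> lastCheckpoint(String jobId) {
            Entry e = entries.get(jobId);
            return e == null ? Optional.empty() : Optional.ofNullable(e.lastCheckpointPath);
        }
    }
}
```

In this sketch the cleanup logic would only call {{hasDirtyEntry}} and {{markClean}}, while the REST layer or an external consumer such as the Kubernetes Operator would read {{listFinishedJobNames}} and {{lastCheckpoint}} from the same entry.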



--
This message was sent by Atlassian Jira
(v8.20.10#820010)
