Hi Matthias!

Thank you for the detailed proposal. Overall, I am in favor of this
unification: it would simplify the logic and make the integration of
external components more straightforward.
I will try to read through the proposal more carefully next week and
provide some detailed feedback.

+1

Thanks
Gyula

On Fri, Sep 8, 2023 at 8:36 AM Matthias Pohl <matthias.p...@aiven.io.invalid>
wrote:

> Just a bit more elaboration on the question that we need to answer here: Do
> we want to expose the internal ArchivedExecutionGraph data structure
> through JSON?
>
> - The JSON approach allows the user to have (almost) full access to the
> information (which would otherwise be derived from the REST API).
> Therefore, there's no need to spin up a cluster to access this information.
> Any information that shall be exposed through the REST API needs to be
> well-defined in this JSON structure, though. As a consequence, large parts
> of the ArchivedExecutionGraph data structure (essentially anything that is
> used to populate the REST API) effectively become public API, which puts
> more constraints on this data structure and makes it harder to change in
> the future (see sketch 1 below).
>
> - The binary data approach allows us to keep the data structure itself
> internal. We have more control over what we want to expose by providing
> access points in the ClusterClient (e.g. adding a command to extract the
> external storage path from the file; see sketch 2 below).
>
> - The compromise (i.e. keeping ExecutionGraphInfoStore and JobResultStore
> separate and just exposing the checkpoint information next to the JobResult
> in the JobResultStore file) would keep us closest to the current state and
> would require the fewest code changes and the least exposure of internal
> data structures. It would allow any system (like the Kubernetes Operator)
> to extract the checkpoint's external storage path (see sketch 3 below). But
> we would still be stuck with somewhat redundant components.
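>
> To make the trade-offs more concrete, here are three rough sketches. They
> are illustrative only: all class, method, and field names below are made
> up for this discussion and are not part of the FLIP.
>
> Sketch 1 (JSON approach): every field we persist this way effectively
> becomes public format. A minimal Jackson-based example, assuming a
> hypothetical CompletedJobEntry DTO:
>
>     import com.fasterxml.jackson.databind.ObjectMapper;
>
>     // Hypothetical DTO; the field names are invented for illustration.
>     // Everything serialized like this becomes part of the public format.
>     public class CompletedJobEntry {
>         public String jobId;
>         public String jobName;
>         public String state;          // e.g. "FINISHED"
>         public String checkpointPath; // external path of latest checkpoint
>
>         public static void main(String[] args) throws Exception {
>             CompletedJobEntry entry = new CompletedJobEntry();
>             entry.jobId = "a1b2c3";
>             entry.jobName = "example-job";
>             entry.state = "FINISHED";
>             entry.checkpointPath = "s3://bucket/checkpoints/chk-42";
>             // prints something like:
>             // {"jobId":"a1b2c3","jobName":"example-job",
>             //  "state":"FINISHED","checkpointPath":"s3://..."}
>             System.out.println(new ObjectMapper().writeValueAsString(entry));
>         }
>     }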
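>
> Sketch 2 (binary approach): the data structure stays internal and we
> expose dedicated access points instead. The interface below is purely
> hypothetical (no such method exists on the ClusterClient today):
>
>     import java.util.concurrent.CompletableFuture;
>
>     // Hypothetical client-side access point; the method name and the
>     // String-based job id are assumptions made for this sketch.
>     public interface CompletedJobAccess {
>         // Resolves the external storage path of the job's latest
>         // completed checkpoint from the (internal) binary store.
>         CompletableFuture<String> getLatestCheckpointPath(String jobId);
>     }
>
> A CLI command could then resolve the checkpoint path without the user ever
> touching the binary format directly.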
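>
> Sketch 3 (compromise): an external system like the Kubernetes Operator
> could read the checkpoint path from the JobResultStore entry with any JSON
> parser. The field name "checkpointPath" is an assumption, not the actual
> format:
>
>     import com.fasterxml.jackson.databind.JsonNode;
>     import com.fasterxml.jackson.databind.ObjectMapper;
>
>     import java.io.File;
>     import java.io.IOException;
>
>     // Sketch of an external reader for a JobResultStore entry file.
>     public class JobResultStoreReader {
>         public static String readCheckpointPath(File entryFile)
>                 throws IOException {
>             JsonNode root = new ObjectMapper().readTree(entryFile);
>             // Returns null if the (assumed) field is missing.
>             return root.path("checkpointPath").asText(null);
>         }
>     }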
>
> From a user's perspective, I feel like the JSON approach is the best one
> because it gives them the most freedom to be independent of Flink binaries
> when handling completed jobs. But from a Flink developer's perspective, I
> see benefits in not exposing the entire data structure and instead using
> the ClusterClient as an access point.
>
> The last option is my least favorite one: moving the ExecutionGraphInfo
> out of the JobManager (rather than keeping it there, as the compromise
> would) seems to be the right thing to do when thinking about Flink's
> vision of becoming cloud-native.
>
> Just my 2cts on that topic.
> Matthias
>
> On Mon, Sep 4, 2023 at 1:11 PM Matthias Pohl <matthias.p...@aiven.io>
> wrote:
>
> > Hi everyone,
> > I want to open the discussion on FLIP-360 [1]. The goal of this FLIP is
> > to combine the two very similar components ExecutionGraphInfoStore and
> > JobResultStore into a single component.
> >
> > The benefit of this effort would be to expose the metadata of a
> > globally-terminated job even in cases where the JobManager fails shortly
> > after the job finished. This is relevant for external checkpoint
> > management (as done in the Kubernetes Operator), which relies on the
> > checkpoint information being available.
> >
> > More generally, it would allow completed jobs to be listed as part of
> > the Flink cluster even after a JM failover. This would give users more
> > control over finished jobs.
> >
> > The current state of the FLIP doesn't reach a final conclusion on the
> > serialization format of the data (JSON vs. binary). I want to emphasize
> > that there's also a third option, which keeps both components separate
> > and only exposes the additional checkpoint information through the
> > JobResultStore.
> >
> > I'm looking forward to feedback.
> > Best,
> > Matthias
> >
> > PS: I might be less responsive in the next 2-3 weeks but wanted to
> > initiate the discussion anyway.
> >
> > [1]
> > https://cwiki.apache.org/confluence/display/FLINK/FLIP-360%3A+Merging+the+ExecutionGraphInfoStore+and+the+JobResultStore+into+a+single+component+CompletedJobStore
> >
>
