[
https://issues.apache.org/jira/browse/FLINK-32667?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17752729#comment-17752729
]
Matthias Pohl commented on FLINK-32667:
---------------------------------------
I see [~chesnay]'s concern that the restart strategy is conceptually not well
suited for limiting a job's HA capabilities. HA is more of a cluster
configuration feature than a per-job feature. That's why it also feels odd to
combine cluster configuration with job configuration in the filter mechanism
of the {{JobGraphStore}} implementation that you're proposing in the attached
PR.
About your comment on the HA services:
{quote}
Part1: Flink cluster will register its dispatcher address, rest port to ha
service such as zk or configmap for k8s, then the client can get these
information and submit job to dispatcher via rest, this is needed in OLAP
scenario
Part2: Dispatcher will validate, save and recover job from JobGraphStore and
JobResultStore requires failover.
{quote}
Your observation is correct here: The {{HighAvailabilityServices}} could be
structured in a better way. LeaderElection (your part 1) is used to maintain
the right state of the Flink cluster whereas job-related HA data (your part 2)
is used to ensure that a job can be recovered. Essentially, those are two
different features which should be reflected in separate interfaces rather than
a single one (i.e. {{HighAvailabilityServices}}).
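To make the split concrete, here is a rough sketch of what two separate interfaces could look like. Every name in it ({{LeaderServices}}, {{JobPersistenceServices}}, {{HaBackend}}, the {{persistJobs}} flag) is made up for illustration and is not existing Flink API; the in-memory maps merely stand in for a ZooKeeper or ConfigMap backend. It only assumes that leader information and job recovery data can be configured independently at the cluster level.

```java
import java.util.Map;
import java.util.Optional;
import java.util.concurrent.ConcurrentHashMap;

// All names below are hypothetical; none of them are Flink's actual interfaces.

/** Part 1: cluster-level leader information, e.g. the dispatcher's REST address. */
interface LeaderServices {
    void publishLeaderAddress(String component, String address);
    Optional<String> lookupLeaderAddress(String component);
}

/** Part 2: job-level data needed to recover a job after a failover. */
interface JobPersistenceServices {
    void putJobGraph(String jobId, String serializedJobGraph);
    Optional<String> recoverJobGraph(String jobId);
}

/** In-memory stand-in for a ZooKeeper- or ConfigMap-backed leader registry. */
final class InMemoryLeaderServices implements LeaderServices {
    private final Map<String, String> leaders = new ConcurrentHashMap<>();

    @Override
    public void publishLeaderAddress(String component, String address) {
        leaders.put(component, address);
    }

    @Override
    public Optional<String> lookupLeaderAddress(String component) {
        return Optional.ofNullable(leaders.get(component));
    }
}

/** In-memory stand-in for a persistent job store. */
final class InMemoryJobPersistence implements JobPersistenceServices {
    private final Map<String, String> jobGraphs = new ConcurrentHashMap<>();

    @Override
    public void putJobGraph(String jobId, String serializedJobGraph) {
        jobGraphs.put(jobId, serializedJobGraph);
    }

    @Override
    public Optional<String> recoverJobGraph(String jobId) {
        return Optional.ofNullable(jobGraphs.get(jobId));
    }
}

/** Stores nothing, so no job can be recovered (e.g. for short-lived OLAP jobs). */
final class NoOpJobPersistence implements JobPersistenceServices {
    @Override
    public void putJobGraph(String jobId, String serializedJobGraph) {
        // Intentionally empty: job data is not persisted.
    }

    @Override
    public Optional<String> recoverJobGraph(String jobId) {
        return Optional.empty();
    }
}

/** Combines both parts; the choice is made once per cluster, not per job. */
final class HaBackend {
    final LeaderServices leader;
    final JobPersistenceServices jobs;

    private HaBackend(LeaderServices leader, JobPersistenceServices jobs) {
        this.leader = leader;
        this.jobs = jobs;
    }

    /** persistJobs is a hypothetical cluster-level switch. */
    static HaBackend create(boolean persistJobs) {
        return new HaBackend(
                new InMemoryLeaderServices(),
                persistJobs ? new InMemoryJobPersistence() : new NoOpJobPersistence());
    }
}
```

With {{HaBackend.create(false)}}, a client could still look up the dispatcher address through the leader part, while nothing would be written for job recovery.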
I put some thought into reorganizing the {{HighAvailabilityServices}}
interface as part of the recent leader election work (see [this FLINK-31816
comment|https://issues.apache.org/jira/browse/FLINK-31816?focusedCommentId=17741054&page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#comment-17741054]).
I didn't proceed with it because I didn't see much value other than slightly
cleaner code at that time. But such a refactoring would allow us to separate
the two topics for the HA backend.
But I have to point out that this approach would differ from what you had in
mind with analyzing the restart strategy: it would work on a cluster level
rather than on a per-job level.
> Use standalone store and embedding writer for jobs with no-restart-strategy
> in session cluster
> ----------------------------------------------------------------------------------------------
>
> Key: FLINK-32667
> URL: https://issues.apache.org/jira/browse/FLINK-32667
> Project: Flink
> Issue Type: Sub-task
> Components: Runtime / Coordination
> Affects Versions: 1.18.0
> Reporter: Fang Yong
> Assignee: Fang Yong
> Priority: Major
> Labels: pull-request-available
>
> When a Flink session cluster uses the zk or k8s high availability service, it
> stores jobs in zk or a ConfigMap. When we submit Flink OLAP jobs to the session
> cluster, they always turn off the restart strategy. These jobs with
> no-restart-strategy should not be stored in zk or in a ConfigMap in k8s.