[ 
https://issues.apache.org/jira/browse/FLINK-32667?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17752729#comment-17752729
 ] 

Matthias Pohl commented on FLINK-32667:
---------------------------------------

I see [~chesnay]'s concern that the restart strategy is conceptually not 
suited for limiting a job's HA capabilities. HA is a cluster configuration 
feature rather than a per-job feature. That's why it also feels odd to combine 
cluster configuration with job configuration in the filter mechanism of the 
{{JobGraphStore}} implementation that you're proposing in the attached PR.

About your comment on the HA services:
{quote}
Part1: Flink cluster will register its dispatcher address, rest port to ha 
service such as zk or configmap for k8s, then the client can get these 
information and submit job to dispatcher via rest, this is needed in OLAP 
scenario

Part2: Dispatcher will validate, save and recover job from JobGraphStore and 
JobResultStore requires failover.
{quote}
Your observation is correct here: The {{HighAvailabilityServices}} could be 
structured in a better way. LeaderElection (your part 1) is used to maintain 
the right state of the Flink cluster whereas job-related HA data (your part 2) 
is used to ensure that a job can be recovered. Essentially, those are two 
different features which should be reflected in separate interfaces rather than 
a single one (i.e. {{HighAvailabilityServices}}). 

I put some thought into reorganizing the {{HighAvailabilityServices}} 
interface as part of the recent leader election work (see [this FLINK-31816 
comment|https://issues.apache.org/jira/browse/FLINK-31816?focusedCommentId=17741054&page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#comment-17741054]).
 I didn't proceed with it because I didn't see much value other than slightly 
cleaner code at that time. But such a refactoring would allow us to separate 
the two topics for the HA backend.
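
To illustrate the idea, here is a minimal sketch of such a split. All type 
names ({{LeaderServices}}, {{JobPersistenceServices}}, {{ClusterHaSetup}} and 
the two implementations) are hypothetical and do not exist in Flink. The 
in-memory class only stands in for a ZooKeeper- or ConfigMap-backed one. An 
OLAP session cluster could then keep leader information in the HA backend 
while using a no-op store for job data:

```java
import java.util.Map;
import java.util.Optional;
import java.util.concurrent.ConcurrentHashMap;

// All type names below are hypothetical and do not exist in Flink.

/** Part 1: cluster-level data that lets clients find the current leader. */
interface LeaderServices {
    void publishLeaderAddress(String component, String address);

    Optional<String> lookupLeaderAddress(String component);
}

/** Part 2: job-level data that is needed to recover a job after failover. */
interface JobPersistenceServices {
    void putJobGraph(String jobId, String serializedJobGraph);

    Optional<String> recoverJobGraph(String jobId);
}

/** In-memory stand-in for a ZooKeeper- or ConfigMap-backed implementation. */
final class InMemoryLeaderServices implements LeaderServices {
    private final Map<String, String> addresses = new ConcurrentHashMap<>();

    @Override
    public void publishLeaderAddress(String component, String address) {
        addresses.put(component, address);
    }

    @Override
    public Optional<String> lookupLeaderAddress(String component) {
        return Optional.ofNullable(addresses.get(component));
    }
}

/** Persists nothing, so submitted jobs are never recovered. */
final class NoOpJobPersistenceServices implements JobPersistenceServices {
    @Override
    public void putJobGraph(String jobId, String serializedJobGraph) {
        // Intentionally empty: job data is discarded.
    }

    @Override
    public Optional<String> recoverJobGraph(String jobId) {
        return Optional.empty();
    }
}

/** Cluster-level wiring: each concern is chosen independently. */
final class ClusterHaSetup {
    final LeaderServices leaderServices;
    final JobPersistenceServices jobPersistence;

    ClusterHaSetup(LeaderServices leaderServices, JobPersistenceServices jobPersistence) {
        this.leaderServices = leaderServices;
        this.jobPersistence = jobPersistence;
    }
}
```

Note that the choice is made once, when the cluster is wired up, not per 
submitted job.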

But I have to point out: That approach would differ from what you had in mind 
with analyzing the restart strategy. It would work on a cluster level rather 
than on a per-job level.

> Use standalone store and embedding writer for jobs with no-restart-strategy 
> in session cluster
> ----------------------------------------------------------------------------------------------
>
>                 Key: FLINK-32667
>                 URL: https://issues.apache.org/jira/browse/FLINK-32667
>             Project: Flink
>          Issue Type: Sub-task
>          Components: Runtime / Coordination
>    Affects Versions: 1.18.0
>            Reporter: Fang Yong
>            Assignee: Fang Yong
>            Priority: Major
>              Labels: pull-request-available
>
> When a flink session cluster use zk or k8s high availability service, it will 
> store jobs in zk or ConfigMap. When we submit flink olap jobs to the session 
> cluster, they always turn off restart strategy. These jobs with 
> no-restart-strategy should not be stored in zk or ConfigMap in k8s



--
This message was sent by Atlassian Jira
(v8.20.10#820010)
