Hi Ron and Weihua,

Thanks for the feedback.

There seem to be three user-visible behaviors that we are talking about:

1. The behavior on the client side, i.e. whether the client blocks until the
job finishes or returns right after submission.

2. The behavior of the submitted job, i.e. whether to stop the job execution
if the client is detached from the Flink cluster; in other words, whether to
bind the lifecycle of the job to the connection status of the attached
client. For example, one might want to keep a batch job running until it
finishes even after the client connection is lost, whereas it makes sense to
stop a streaming job that invokes collect() once the client connection is
lost.

3. The behavior of the Flink cluster (JM and TMs), i.e. whether to shut down
the Flink cluster if the client is detached from it; in other words, whether
to bind the cluster lifecycle to the job lifecycle. For dedicated clusters
(application clusters or dedicated session clusters), the lifecycle of the
cluster should be bound to the job lifecycle. But for shared session
clusters, the lifecycle of the Flink cluster should be independent of the
jobs running in it.

As we can see, these three behaviors are largely independent, but the
current configurations fail to support all combinations of the desired
behaviors. Ideally there should be three separate configurations, for
example (a sketch of how they might combine follows the list):
- client.attached.after.submission and client.heartbeat.timeout control the
behavior on the client side.
- jobmanager.cancel-on-attached-client-exit controls the behavior of the
job when an attached client loses its connection. The client heartbeat
timeout and attached-ness will also be passed to the JM upon job submission.
- cluster.shutdown-on-first-job-finishes (or
jobmanager.shutdown-cluster-after-job-finishes) controls the cluster
behavior after the job finishes, normally or abnormally. This is a
cluster-level setting instead of a job-level setting, therefore it can only
be set when launching the cluster.
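
To make the combinations concrete, here is a rough sketch of how these
settings might look for a shared session cluster running a fire-and-forget
batch job. Only client.heartbeat.timeout exists today; the other keys are
the proposals above, so names and values are purely illustrative:

    # client returns right after submission instead of blocking
    client.attached.after.submission: false
    # existing option; only relevant while the client stays attached (ms)
    client.heartbeat.timeout: 180000
    # keep the batch job running even if an attached client disconnects
    jobmanager.cancel-on-attached-client-exit: false
    # shared session cluster: do not tie the cluster lifecycle to the job
    cluster.shutdown-on-first-job-finishes: false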

The current code sort of combines behaviors 2 and 3 into
execution.shutdown-on-attached-exit.
This assumes that the lifecycle of the cluster is the same as that of the
job when the client is attached, and this FLIP does not intend to change
that. But using the execution.attached config to control the client-side
behavior looks misleading, so this FLIP proposes to replace it with a more
intuitive config, client.attached.after.submission. This makes it clear
that the configuration controls the client-side behavior rather than the
execution of the job.
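
In terms of what a user would actually set, this is only a rename of the
client-side knob. As a purely illustrative mapping, assuming the FLIP is
adopted as proposed:

    # today, client-side blocking is expressed via a job-execution key
    execution.attached: true
    # with this FLIP, the same client-side intent would be expressed as
    client.attached.after.submission: true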

Thanks,

Jiangjie (Becket) Qin





On Thu, Aug 10, 2023 at 10:34 PM Weihua Hu <huweihua....@gmail.com> wrote:

> Hi Allison
>
> Thanks for driving this FLIP. It's a valuable feature for batch jobs.
> This helps keep "Drop Per-Job Mode [1]" going.
>
> +1 for this proposal.
>
> However, it seems that the change in this FLIP is not detailed enough.
> I have a few questions.
>
> 1. The config 'execution.attached' is not only used in per-job mode,
> but also in session mode to shutdown the cluster. IMHO, it's better to
> keep this option name.
>
> 2. This FLIP only mentions YARN mode. I believe this feature should
> work in both YARN and Kubernetes mode.
>
> 3. Within the attach mode, we support two features:
> execution.shutdown-on-attached-exit
> and client.heartbeat.timeout. These should also be taken into account.
>
> 4. The Application Mode will shut down once the job has been completed.
> So, if we use the flink client to poll job status via REST API for attach
> mode,
> there is a chance that the client will not be able to retrieve the job
> finish status.
> Perhaps FLINK-24113[3] will help with this.
>
>
> [1]https://issues.apache.org/jira/browse/FLINK-26000
> [2]
>
> https://nightlies.apache.org/flink/flink-docs-master/docs/deployment/resource-providers/native_kubernetes/#session-mode
> [3]https://issues.apache.org/jira/browse/FLINK-24113
>
> Best,
> Weihua
>
>
> On Thu, Aug 10, 2023 at 10:47 AM liu ron <ron9....@gmail.com> wrote:
>
> > Hi, Allison
> >
> > Thanks for driving this proposal, it looks cool for batch jobs under
> > application mode. But after reading your FLIP document and [1], I have a
> > question. Why do you want to rename the execution.attached configuration
> to
> > client.attached.after.submission and at the same time deprecate
> > execution.attached? Based on your design, I understand the role of these
> > two options are the same. Introducing a new option would increase the
> cost
> > of understanding and use for the user, so why not follow the idea
> discussed
> > in FLINK-25495 and make Application mode support attached.execution.
> >
> > [1] https://issues.apache.org/jira/browse/FLINK-25495
> >
> > Best,
> > Ron
> >
> > Venkatakrishnan Sowrirajan <vsowr...@asu.edu> 于2023年8月9日周三 02:07写道:
> >
> > > This is definitely a useful feature especially for the flink batch
> > > execution workloads using flow orchestrators like Airflow, Azkaban,
> Oozie
> > > etc. Thanks for reviving this issue and starting a FLIP.
> > >
> > > Regards
> > > Venkata krishnan
> > >
> > >
> > > On Mon, Aug 7, 2023 at 4:09 PM Allison Chang
> > <alch...@linkedin.com.invalid
> > > >
> > > wrote:
> > >
> > > > Hi all,
> > > >
> > > > I am opening this thread to discuss this proposal to support attached
> > > > execution on Flink Application Completion for Batch Jobs. The link to
> > the
> > > > FLIP proposal is here:
> > > >
> > >
> >
> https://urldefense.com/v3/__https://cwiki.apache.org/confluence/display/FLINK/FLIP-323*3A*Support*Attached*Execution*on*Flink*Application*Completion*for*Batch*Jobs__;JSsrKysrKysrKys!!IKRxdwAv5BmarQ!friFO6bJub5FKSLhPIzA6kv-7uffv-zXlv9ZLMKqj_xMcmZl62HhsgvwDXSCS5hfSeyHZgoAVSFg3fk7ChaAFNKi$
> > > >
> > > > This FLIP proposes adding back attached execution for Application
> Mode.
> > > In
> > > > the past attached execution was supported for the per-job mode, which
> > > will
> > > > be deprecated and we want to include this feature back into
> Application
> > > > mode.
> > > >
> > > > Please reply to this email thread and share your thoughts/opinions.
> > > >
> > > > Thank you!
> > > >
> > > > Allison Chang
> > > >
> > >
> >
>
