Hi,

Thanks for the comments.
I agree with the Ufuk's and Elias' proposal.

- "cancel" remains the good old "cancel"
- "terminate" becomes "stop --drain-with-savepoint"
- "suspend" becomes "stop --with-savepoint"
- "cancel-with-savepoint" is subsumed by "stop --with-savepoint"

As you see from the previous, I would also add "terminate" and "suspend"
to result in keeping a savepoint by default.

As for Ufuk's remarks:

1) You are correct that to have a proper way to not allow elements to be
fed in the pipeline
after the checkpoint barrier, we need support from the sources. This is
more the responsibility
of FLIP-27
https://cwiki.apache.org/confluence/display/FLINK/FLIP-27%3A+Refactor+Source+Interface

2) I would lean more towards replacing the old "stop" command with the new
one. But, as you said,
I have no view of how many users (if any) rely on the old "stop" command
for their usecases.

Cheers,
Kostas



On Wed, Mar 6, 2019 at 9:52 PM Ufuk Celebi <u...@apache.org> wrote:

> I really like this effort. I think the original plan for
> "cancel-with-savepoint" was always to just be a workaround until we
> arrived at a better solution as proposed here.
>
> Regarding the FLIP, I agree with Elias comments. I think the number of
> termination modes the FLIP introduces can be overwhelming and I would
> personally rather follow Elias' proposal. In context of the proposal,
> this would result in the following:
> - "terminate" becomes "stop --drain"
> - "suspend" becomes "stop --with-savepoint"
> - "cancel-with-savepoint" is superseded by "stop --with-savepoint"
>
> I have two remaining questions:
>
> 1) @Kostas: Elias suggests for stop that "a job should process no
> messages after the checkpoints barrier". This is something that needs
> support from the sources. Is this in the scope of your proposal (I
> think not)? If not, is there a future plan for this?
>
> 2) Would we need to introduce a new command/name for "stop" as we
> already have a "stop" command? Assuming that there are no users that
> actually use the existing "stop" command as no major sources are
> stoppable (I think), I would personally suggest to upgrade the
> existing "stop" command to the proposed one. If on the other hand, if
> we know of users that rely on the current "stop" command, we'd need to
> find another name for it.
>
> Best,
>
> Ufuk
>
> On Wed, Mar 6, 2019 at 12:27 AM Elias Levy <fearsome.lucid...@gmail.com>
> wrote:
> >
> > Apologies for the late reply.
> >
> > I think this is badly needed, but I fear we are adding complexity by
> > introducing yet two more stop commands.  We'll have: cancel, stop,
> > terminate. and suspend.  We basically want to do two things: terminate a
> > job with prejudice or stop a job safely.
> >
> > For the former "cancel" is the appropriate term, and should have no need
> > for a cancel with checkpoint option.  If the job was configured to use
> > externalized checkpoints and it ran long enough, a checkpoint will be
> > available for it.
> >
> > For the later "stop" is the appropriate term, and it means that a job
> > should process no messages after the checkpoints barrier and that it
> should
> > ensure that exactly-once sinks complete their two-phase commits
> > successfully.  If a savepoint was requested, one should be created.
> >
> > So in my mind there are two commands, cancel and stop, with appropriate
> > semantics.  Emitting MAX_WATERMARK before the checkpoint barrier during
> > stop is merely an optional behavior, like creation of a savepoint.  But
> if
> > a specific command for it is desired, then "drain" seems appropriate.
> >
> > On Tue, Feb 12, 2019 at 9:50 AM Stephan Ewen <se...@apache.org> wrote:
> >
> > > Hi Elias!
> > >
> > > I remember you brought this missing feature up in the past. Do you
> think
> > > the proposed enhancement would work for your use case?
> > >
> > > Best,
> > > Stephan
> > >
> > > ---------- Forwarded message ---------
> > > From: Kostas Kloudas <k.klou...@ververica.com>
> > > Date: Tue, Feb 12, 2019 at 5:28 PM
> > > Subject: [DISCUSS] FLIP-33: Terminate/Suspend Job with Savepoint
> > > To: <dev@flink.apache.org>
> > >
> > >
> > > Hi everyone,
> > >
> > >  A commonly used functionality offered by Flink is the
> > > "cancel-with-savepoint" operation. When applied to the current
> exactly-once
> > > sinks, the current implementation of the feature can be problematic,
> as it
> > > does not guarantee that side-effects will be committed by Flink to the
> 3rd
> > > party storage system.
> > >
> > >  This discussion targets fixing this issue and proposes the addition
> of two
> > > termination modes, namely:
> > >     1) SUSPEND, for temporarily stopping the job, e.g. for Flink
> version
> > > upgrading in your cluster
> > >     2) TERMINATE, for terminal shut down which ends the stream and
> sends
> > > MAX_WATERMARK time, and flushes any state associated with (event time)
> > > timers
> > >
> > > A google doc with the FLIP proposal can be found here:
> > >
> > >
> https://docs.google.com/document/d/1EZf6pJMvqh_HeBCaUOnhLUr9JmkhfPgn6Mre_z6tgp8/edit?usp=sharing
> > >
> > > And the page for the FLIP is here:
> > >
> https://cwiki.apache.org/confluence/pages/viewpage.action?pageId=103090212
> > >
> > >  The implementation sketch is far from complete, but it is worth
> having a
> > > discussion on the semantics as soon as possible. The implementation
> section
> > > is going to be updated soon.
> > >
> > >  Looking forward to the discussion,
> > >  Kostas
> > >
> > > --
> > >
> > > Kostas Kloudas | Software Engineer
> > >
> > >
> > > <https://www.ververica.com/>
> > >
> > > Follow us @VervericaData
> > >
> > > --
> > >
> > > Join Flink Forward <https://flink-forward.org/> - The Apache Flink
> > > Conference
> > >
> > > Stream Processing | Event Driven | Real Time
> > >
> > > --
> > >
> > > Data Artisans GmbH | Invalidenstrasse 115, 10115 Berlin, Germany
> > >
> > > --
> > > Data Artisans GmbH
> > > Registered at Amtsgericht Charlottenburg: HRB 158244 B
> > > Managing Directors: Dr. Kostas Tzoumas, Dr. Stephan Ewen
> > >
>

Reply via email to