> I also notice that the exception causing a restart is no longer displayed
> in the UI, which is probably related?

Yes, this is also related to the new scheduler. I created FLINK-15917 [1] to
track this. Moreover, I created a ticket about the uptime metric not
resetting [2]. Both issues already exist in 1.9 if
"jobmanager.execution.failover-strategy" is set to "region", which is the
case in the default flink-conf.yaml.

In 1.9, unsetting "jobmanager.execution.failover-strategy" was enough to fall
back to the previous behavior.

In 1.10, you can still fall back to the previous behavior by setting
"jobmanager.scheduler: legacy" and unsetting
"jobmanager.execution.failover-strategy" in your flink-conf.yaml.

I would not consider these issues blockers since there is a workaround for
them, but of course we would like to see the new scheduler getting some
production exposure. More detailed release notes about the caveats of the
new scheduler will be added to the user documentation.

> The watermark issue was https://issues.apache.org/jira/browse/FLINK-14470

This should be fixed now [3].

[1] https://issues.apache.org/jira/browse/FLINK-15917
[2] https://issues.apache.org/jira/browse/FLINK-15918
[3] https://issues.apache.org/jira/browse/FLINK-8949

On Wed, Feb 5, 2020 at 7:04 AM Thomas Weise <t...@apache.org> wrote:

> Hi Gary,
>
> Thanks for the reply.
>
> -->
>
> On Tue, Feb 4, 2020 at 5:20 AM Gary Yao <g...@apache.org> wrote:
>
> > Hi Thomas,
> >
> > > 2) Was there a change in how job recovery reflects in the uptime metric?
> > > Didn't uptime previously reset to 0 on recovery (now it just keeps
> > > increasing)?
> >
> > The uptime is the difference between the current time and the time when the
> > job transitioned to RUNNING state. By default we no longer transition the
> > job out of the RUNNING state when restarting. This has something to do with
> > the new scheduler which enables pipelined region failover by default [1].
> > Actually we enabled pipelined region failover already in the binary
> > distribution of Flink 1.9 by setting:
> >
> > jobmanager.execution.failover-strategy: region
> >
> > in the default flink-conf.yaml. Unless you have removed this config option
> > or you are using a custom yaml, you should be seeing this behavior in
> > Flink 1.9. If you do not want region failover, set
> >
> > jobmanager.execution.failover-strategy: full
> >
>
> We are using the default (the jobmanager.execution.failover-strategy
> setting is not present in our flink config).
>
> The change in behavior I see is between the 1.9 based deployment and the
> 1.10 RC.
>
> Our 1.9 branch is here:
> https://github.com/lyft/flink/tree/release-1.9-lyft
>
> I also notice that the exception causing a restart is no longer displayed
> in the UI, which is probably related?
>
> > > 1) Is the low watermark display in the UI still broken?
> >
> > I was not aware that this is broken. Is there an issue tracking this bug?
> >
>
> The watermark issue was https://issues.apache.org/jira/browse/FLINK-14470
>
> (I don't have a good way to verify it is fixed at the moment.)
>
> Another problem with this 1.10 RC is that the checkpointAlignmentTime
> metric is missing. (I have not been able to investigate this further yet.)
>
> > Best,
> > Gary
> >
> > [1] https://issues.apache.org/jira/browse/FLINK-14651
> >
> > On Tue, Feb 4, 2020 at 2:56 AM Thomas Weise <t...@apache.org> wrote:
> >
> >> I opened a PR for FLINK-15868
> >> <https://issues.apache.org/jira/browse/FLINK-15868>:
> >> https://github.com/apache/flink/pull/11006
> >>
> >> With that change, I was able to run an application that consumes from
> >> Kinesis.
> >>
> >> I should have data tomorrow regarding the performance.
> >>
> >> Two questions/observations:
> >>
> >> 1) Is the low watermark display in the UI still broken?
> >> 2) Was there a change in how job recovery reflects in the uptime metric?
> >> Didn't uptime previously reset to 0 on recovery (now it just keeps
> >> increasing)?
> >>
> >> Thanks,
> >> Thomas
> >>
> >> On Mon, Feb 3, 2020 at 10:55 AM Thomas Weise <t...@apache.org> wrote:
> >>
> >> > I found another issue with the Kinesis connector:
> >> >
> >> > https://issues.apache.org/jira/browse/FLINK-15868
> >> >
> >> > On Mon, Feb 3, 2020 at 3:35 AM Gary Yao <g...@apache.org> wrote:
> >> >
> >> >> Hi everyone,
> >> >>
> >> >> I am hereby canceling the vote due to:
> >> >>
> >> >> FLINK-15837
> >> >> FLINK-15840
> >> >>
> >> >> Another RC will be created later today.
> >> >>
> >> >> Best,
> >> >> Gary
> >> >>
> >> >> On Mon, Jan 27, 2020 at 10:06 PM Gary Yao <g...@apache.org> wrote:
> >> >>
> >> >> > Hi everyone,
> >> >> > Please review and vote on the release candidate #1 for the version
> >> >> > 1.10.0, as follows:
> >> >> > [ ] +1, Approve the release
> >> >> > [ ] -1, Do not approve the release (please provide specific comments)
> >> >> >
> >> >> > The complete staging area is available for your review, which includes:
> >> >> > * JIRA release notes [1],
> >> >> > * the official Apache source release and binary convenience releases to
> >> >> >   be deployed to dist.apache.org [2], which are signed with the key with
> >> >> >   fingerprint BB137807CEFBE7DD2616556710B12A1F89C115E8 [3],
> >> >> > * all artifacts to be deployed to the Maven Central Repository [4],
> >> >> > * source code tag "release-1.10.0-rc1" [5],
> >> >> >
> >> >> > The announcement blog post is in the works. I will update this voting
> >> >> > thread with a link to the pull request soon.
> >> >> >
> >> >> > The vote will be open for at least 72 hours. It is adopted by majority
> >> >> > approval, with at least 3 PMC affirmative votes.
> >> >> >
> >> >> > Thanks,
> >> >> > Yu & Gary
> >> >> >
> >> >> > [1] https://issues.apache.org/jira/secure/ReleaseNote.jspa?projectId=12315522&version=12345845
> >> >> > [2] https://dist.apache.org/repos/dist/dev/flink/flink-1.10.0-rc1/
> >> >> > [3] https://dist.apache.org/repos/dist/release/flink/KEYS
> >> >> > [4] https://repository.apache.org/content/repositories/orgapacheflink-1325
> >> >> > [5] https://github.com/apache/flink/releases/tag/release-1.10.0-rc1
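
For reference, a minimal sketch of the 1.10 fallback described at the top of
this mail, assuming an otherwise default flink-conf.yaml from the binary
distribution. Only the two keys named there are touched; treating "unsetting"
as commenting out the entry is my reading of the workaround, and removing the
line entirely should be equivalent.

```yaml
# flink-conf.yaml (Flink 1.10): fall back to the previous scheduling behavior

# Select the legacy scheduler instead of the new default one.
jobmanager.scheduler: legacy

# Unset the failover strategy by commenting out (or deleting) the entry.
# The default flink-conf.yaml ships with this set to "region".
# jobmanager.execution.failover-strategy: region
```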