As a quick update: the "pending review" issues have all been resolved.
The following issues are still open:

- FLINK-4904: Add a limit for how much data may be spilled in checkpoint alignments => fix pending
- FLINK-4910: Introduce safety net for closing file system streams

Any updates here?

– Ufuk

On Fri, Oct 28, 2016 at 5:45 PM, Stefan Richter <s.rich...@data-artisans.com> wrote:
> The benefit of a backport, as I see it, is increased stability. The danger is
> potentially breaking code that casts FileSystems to subtypes like
> LocalFileSystem. I don’t know how common that would be in user code.
>
>> On 28.10.2016, at 14:27, Ufuk Celebi <u...@apache.org> wrote:
>>
>> Thanks for all your feedback.
>>
>> If there are no objections, I would like to stick to the issues
>> mentioned in this thread and create RC1 as soon as they are all
>> addressed. This will probably not happen this week, but it looks
>> good for next week.
>>
>> DONE
>> =====
>> - FLINK-4619: Answer client if savepoint restore fails
>> - FLINK-4715: Safety net for stuck task cancellation
>> - FLINK-4510: Always create CheckpointCoordinator
>> - FLINK-4894: Don't block on buffer request after broadcast event
>> - FLINK-4298: Add proper repository for Closure dependencies
>> - FLINK-4218: Do not fail checkpoints when state size cannot be determined
>> - FLINK-3347: TaskManager (or its ActorSystem) need to restart in case
>>   they notice quarantine
>> - FLINK-4875: Use correct operator name
>> - FLINK-4913: Include user jars in system class loader
>>
>> PENDING REVIEW
>> ===============
>> - FLINK-4445: Add option to ignore unmatched state when restoring from
>>   savepoint => https://github.com/apache/flink/pull/2713
>> - FLINK-4932: Don't let ExecutionGraph fail when in state Restarting
>>   => https://github.com/apache/flink/pull/2711
>> - FLINK-4933: ExecutionGraph.scheduleOrUpdateConsumers can fail the
>>   ExecutionGraph => https://github.com/apache/flink/pull/2701
>>
>> OPEN
>> =====
>> - FLINK-4904: Add a limit for how much data may be spilled in
>>   checkpoint alignments => fix pending
>> - FLINK-4910: Introduce safety net for closing file system streams =>
>>   @Stephan, Stefan: What's the conclusion of your discussion on whether to
>>   backport this or not?
>>
>>
>> On Wed, Oct 26, 2016 at 9:57 PM, dan bress <danbr...@gmail.com> wrote:
>>> +1 for this release,
>>> also +1 to Chesnay's suggestion to include this: [FLINK-4875] [metrics]
>>> Use correct operator name
>>>
>>> Dan
>>>
>>> On Wed, Oct 26, 2016 at 5:06 AM, Till Rohrmann <trohrm...@apache.org> wrote:
>>>
>>>> I'll work on FLINK-3347. Additionally, I would like to get in:
>>>>
>>>> - https://issues.apache.org/jira/browse/FLINK-4932: Don't let
>>>>   ExecutionGraph fail when in state Restarting
>>>> - https://issues.apache.org/jira/browse/FLINK-4933:
>>>>   ExecutionGraph.scheduleOrUpdateConsumers can fail the ExecutionGraph
>>>>
>>>> Cheers,
>>>> Till
>>>>
>>>> On Wed, Oct 26, 2016 at 1:02 PM, Stephan Ewen <se...@apache.org> wrote:
>>>>
>>>>> Concerning backporting the "I/O streams safety net": we need to make sure
>>>>> that this does not change any behavior that users may implicitly expect.
>>>>>
>>>>>
>>>>> On Wed, Oct 26, 2016 at 11:21 AM, Maximilian Michels <m...@apache.org>
>>>>> wrote:
>>>>>
>>>>>> +1 for a 1.1.4 release
>>>>>>
>>>>>> We could backport putting user jars into the system class loader for
>>>>>> per-job Yarn clusters: https://github.com/apache/flink/pull/2692
>>>>>> Arguably, this is something of a new feature, but it gets rid of the
>>>>>> duplicate class loading issues users have experienced in practice.
>>>>>>
>>>>>> We already have the following commits on the release-1.1 branch:
>>>>>>
>>>>>> 05a5f46 [FLINK-4862] fix Timer register in ContinuousEventTimeTrigger
>>>>>> 5731672 [FLINK-4581] [table] Fix Table API throwing "No suitable driver found for jdbc:calcite"
>>>>>> 9c87f92 [FLINK-4586] [core] Broken AverageAccumulator
>>>>>> 210230c [FLINK-4829] snapshot accumulators on a best-effort basis
>>>>>> c1d6b24 [FLINK-4829] protect user accumulators against concurrent updates
>>>>>> fe464b4 [FLINK-4709] [core] Fix resource leak in InputStreamFSInputWrapper
>>>>>> 9f72698 [FLINK-4108] [scala] Respect ResultTypeQueryable for InputFormats.
>>>>>> 9591d50 [FLINK-4506] [DataSet] Fix documentation of CsvOutputFormat about incorrect default of allowNullValues
>>>>>> c9433bf [FLINK-3706] Fix YARN test instability
>>>>>> 2203f74 [FLINK-4778] [docs] Fix WordCount parameters in CLI examples.
>>>>>>
>>>>>> -Max
>>>>>>
>>>>>>
>>>>>> On Wed, Oct 26, 2016 at 7:05 AM, Jean-Baptiste Onofré <j...@nanthrax.net>
>>>>>> wrote:
>>>>>>> +1
>>>>>>>
>>>>>>> Looking forward to this release!
>>>>>>>
>>>>>>> Regards
>>>>>>> JB
>>>>>>>
>>>>>>>
>>>>>>>
>>>>>>> On Oct 25, 2016, at 14:43, Robert Metzger <rmetz...@apache.org> wrote:
>>>>>>>> +1 for a bugfix release soon.
>>>>>>>>
>>>>>>>> On Tue, Oct 25, 2016 at 10:53 AM, Stephan Ewen <se...@apache.org>
>>>>>>>> wrote:
>>>>>>>>
>>>>>>>>> Thanks for starting this, Ufuk.
>>>>>>>>>
>>>>>>>>> I would like to add the following issues to 1.1.4:
>>>>>>>>>
>>>>>>>>> Build errors due to Storm dependencies *(fix pending)*
>>>>>>>>> - [FLINK-4298] [storm compatibility] Add proper repository for Closure dependencies.
>>>>>>>>>
>>>>>>>>> Stability on S3 considering eventual consistency *(fix pending)*
>>>>>>>>> - [FLINK-4218] [checkpoints] Do not fail checkpoints when state size cannot be determined
>>>>>>>>>
>>>>>>>>> Avoiding Zombie TaskManagers *(still needs to be done)*
>>>>>>>>> - [FLINK-3347] [akka] TaskManager (or its ActorSystem) need to restart in case they notice quarantine
>>>>>>>>>
>>>>>>>>> Adding a limit to the amount of data spilled during checkpoint alignments *(fix is work in progress)*
>>>>>>>>> - [FLINK-4904] [checkpoints] Add a limit for how much data may be spilled in checkpoint alignments
>>>>>>>>>
>>>>>>>>>
>>>>>>>>> I can push the first two fixes to the 1.1.4 branch shortly and the fourth one later today.
>>>>>>>>> The third one (akka) is still pending.
>>>>>>>>>
>>>>>>>>> Best,
>>>>>>>>> Stephan
>>>>>>>>>
>>>>>>>>>
>>>>>>>>>
>>>>>>>>> On Mon, Oct 24, 2016 at 3:32 PM, Ufuk Celebi <u...@apache.org> wrote:
>>>>>>>>>
>>>>>>>>>> Hey all,
>>>>>>>>>>
>>>>>>>>>> I would like to start the discussion about kicking off the next bug fix
>>>>>>>>>> release, Flink 1.1.4. What do you think about aiming for an RC by the end
>>>>>>>>>> of this week?
>>>>>>>>>>
>>>>>>>>>> Users reported some instabilities/inconveniences that would be good to fix.
>>>>>>>>>>
>>>>>>>>>> Personally, I would like to backport the following fixes:
>>>>>>>>>>
>>>>>>>>>> (1) https://issues.apache.org/jira/browse/FLINK-4619: Answer client if
>>>>>>>>>> savepoint restore fails (Already merged for master, needs minimal
>>>>>>>>>> adjustment for 1.1)
>>>>>>>>>> (2) https://issues.apache.org/jira/browse/FLINK-4715: Safety net for
>>>>>>>>>> stuck task cancellation (Already reviewed for master, waiting for the
>>>>>>>>>> backport's tests to finish)
>>>>>>>>>> (3) https://issues.apache.org/jira/browse/FLINK-4510: Always create
>>>>>>>>>> CheckpointCoordinator (Already merged for master, needs minimal
>>>>>>>>>> adjustments for 1.1)
>>>>>>>>>>
>>>>>>>>>> Furthermore, I would like to address the following:
>>>>>>>>>>
>>>>>>>>>> (4) https://issues.apache.org/jira/browse/FLINK-4445: Add option to
>>>>>>>>>> ignore unmatched state when restoring from savepoint
>>>>>>>>>> (5) https://issues.apache.org/jira/browse/FLINK-4894: Don't block on
>>>>>>>>>> buffer request after broadcast event
>>>>>>>>>>
>>>>>>>>>> Strictly speaking, (4) is not a bug fix. But given that it would
>>>>>>>>>> only add an optional flag to savepoint restoring and should have been
>>>>>>>>>> addressed for 1.1.0 already, I would like to get it in.