Re: [DISCUSS] Releasing Flink 1.1.4

Ufuk Celebi Wed, 02 Nov 2016 03:44:21 -0700

As a quick update: the "pending review" issues have all been resolved.


The open issues are still open:

- FLINK-4904: Add a limit for how much data may be spilled in
checkpoint alignments => fix pending
- FLINK-4910: Introduce safety net for closing file system streams

Any updates here?

– Ufuk


On Fri, Oct 28, 2016 at 5:45 PM, Stefan Richter
<s.rich...@data-artisans.com> wrote:
> The benefit of a backport, as I see it, is increased stability. The danger is
> potentially breaking some code that was casting FileSystems to subtypes like
> LocalFileSystem. I don’t know how common that would be in user code.
>
>> On 28.10.2016 at 14:27, Ufuk Celebi <u...@apache.org> wrote:
>>
>> Thanks for all your feedback.
>>
>> If there are no objections, I would like to stick to the mentioned
>> issues in this thread and create RC1 as soon as they are all
>> addressed. This will probably not be this week though, but it looks
>> good for next week.
>>
>> DONE
>> =====
>> - FLINK-4619: Answer client if savepoint restore fails
>> - FLINK-4715: Safety net for stuck task cancellation
>> - FLINK-4510: Always create CheckpointCoordinator
>> - FLINK-4894: Don't block on buffer request after broadcast event
>> - FLINK-4298: Add proper repository for Closure dependencies
>> - FLINK-4218: Do not fail checkpoints when state size cannot be determined
>> - FLINK-3347: TaskManager (or its ActorSystem) need to restart in case
>> they notice quarantine
>> - FLINK-4875: Use correct operator name
>> - FLINK-4913: Include user jars in system class loader
>>
>> PENDING REVIEW
>> ===============
>> - FLINK-4445: Add option to ignore unmatched state when restoring from
>> savepoint => https://github.com/apache/flink/pull/2713
>> - FLINK-4932: Don't let ExecutionGraph fail when in state Restarting
>> => https://github.com/apache/flink/pull/2711
>> - FLINK-4933: ExecutionGraph.scheduleOrUpdateConsumers can fail the
>> ExecutionGraph => https://github.com/apache/flink/pull/2701
>>
>> OPEN
>> =====
>> - FLINK-4904: Add a limit for how much data may be spilled in
>> checkpoint alignments => fix pending
>> - FLINK-4910: Introduce safety net for closing file system streams =>
>> @Stephan, Stefan: What's the conclusion of your discussion whether to
>> backport this or not?
>>
>>
>> On Wed, Oct 26, 2016 at 9:57 PM, dan bress <danbr...@gmail.com> wrote:
>>> +1 for this release,
>>> also +1 to Chesnay's suggestion to include this: [FLINK-4875] [metrics]
>>> Use correct operator name
>>>
>>> Dan
>>>
>>> On Wed, Oct 26, 2016 at 5:06 AM Till Rohrmann <trohrm...@apache.org> wrote:
>>>
>>>> I'll work on FLINK-3347. Additionally I would like to get in
>>>>
>>>> - https://issues.apache.org/jira/browse/FLINK-4932: Don't let
>>>> ExecutionGraph fail when in state Restarting
>>>> - https://issues.apache.org/jira/browse/FLINK-4933:
>>>> ExecutionGraph.scheduleOrUpdateConsumers can fail the ExecutionGraph
>>>>
>>>> Cheers,
>>>> Till
>>>>
>>>> On Wed, Oct 26, 2016 at 1:02 PM, Stephan Ewen <se...@apache.org> wrote:
>>>>
>>>>> Concerning backporting the "I/O streams safety net" - we need to make
>>>>> sure that this does not change any behavior that users may implicitly
>>>>> expect.
>>>>>
>>>>>
>>>>> On Wed, Oct 26, 2016 at 11:21 AM, Maximilian Michels <m...@apache.org>
>>>>> wrote:
>>>>>
>>>>>> +1 for a 1.1.4 release
>>>>>>
>>>>>> We could backport putting user jars into the system class loader for
>>>>>> per-job Yarn clusters: https://github.com/apache/flink/pull/2692
>>>>>> Arguably, this is somewhat of a new feature, but it gets rid of duplicate
>>>>>> class loading issues users experienced in practice.
>>>>>>
>>>>>> We already have the following commits on the release-1.1 branch:
>>>>>>
>>>>>> 05a5f46 [FLINK-4862] fix Timer register in ContinuousEventTimeTrigger
>>>>>> 5731672 [FLINK-4581] [table] Fix Table API throwing "No suitable driver
>>>>>> found for jdbc:calcite"
>>>>>> 9c87f92 [FLINK-4586] [core] Broken AverageAccumulator
>>>>>> 210230c [FLINK-4829] snapshot accumulators on a best-effort basis
>>>>>> c1d6b24 [FLINK-4829] protect user accumulators against concurrent
>>>>>> updates
>>>>>> fe464b4 [FLINK-4709] [core] Fix resource leak in
>>>>>> InputStreamFSInputWrapper
>>>>>> 9f72698 [FLINK-4108] [scala] Respect ResultTypeQueryable for
>>>>>> InputFormats.
>>>>>> 9591d50 [FLINK-4506] [DataSet] Fix documentation of CsvOutputFormat
>>>>>> about incorrect default of allowNullValues
>>>>>> c9433bf [FLINK-3706] Fix YARN test instability
>>>>>> 2203f74 [FLINK-4778] [docs] Fix WordCount parameters in CLI examples.
>>>>>>
>>>>>> -Max
>>>>>>
>>>>>>
>>>>>> On Wed, Oct 26, 2016 at 7:05 AM, Jean-Baptiste Onofré <j...@nanthrax.net>
>>>>>> wrote:
>>>>>>> +1
>>>>>>>
>>>>>>> Looking forward to this release!
>>>>>>>
>>>>>>> Regards
>>>>>>> JB
>>>>>>>
>>>>>>> On Oct 25, 2016, at 14:43, Robert Metzger <rmetz...@apache.org>
>>>>>>> wrote:
>>>>>>>> +1 for a bugfix release soon.
>>>>>>>>
>>>>>>>> On Tue, Oct 25, 2016 at 10:53 AM, Stephan Ewen <se...@apache.org>
>>>>>>>> wrote:
>>>>>>>>
>>>>>>>>> Thanks for starting this, Ufuk.
>>>>>>>>>
>>>>>>>>> I would like to add the following issues to 1.1.4:
>>>>>>>>>
>>>>>>>>> Build errors due to Storm dependencies *(fix pending)*
>>>>>>>>>    - [FLINK-4298] [storm compatibility] Add proper repository for
>>>>>>>>>      Closure dependencies.
>>>>>>>>>
>>>>>>>>> Stability on S3 considering eventual consistency *(fix pending)*
>>>>>>>>>    - [FLINK-4218] [checkpoints] Do not fail checkpoints when state
>>>>>>>>>      size cannot be determined
>>>>>>>>>
>>>>>>>>> Avoiding Zombie TaskManagers *(still needs to be done)*
>>>>>>>>>    - [FLINK-3347] [akka] TaskManager (or its ActorSystem) need to
>>>>>>>>>      restart in case they notice quarantine
>>>>>>>>>
>>>>>>>>> Adding a limit to the amount of data spilled during checkpoint
>>>>>>>>> alignments *(fix is work in progress)*
>>>>>>>>>    - [FLINK-4904] [checkpoints] Add a limit for how much data may be
>>>>>>>>>      spilled in checkpoint alignments
>>>>>>>>>
>>>>>>>>>
>>>>>>>>> I can push the first two fixes to the 1.1.4 branch in a bit, the
>>>>>>>>> fourth one later today.
>>>>>>>>> The third one (akka) is still pending.
>>>>>>>>>
>>>>>>>>> Best,
>>>>>>>>> Stephan
>>>>>>>>>
>>>>>>>>>
>>>>>>>>>
>>>>>>>>> On Mon, Oct 24, 2016 at 3:32 PM, Ufuk Celebi <u...@apache.org>
>>>>>>>>> wrote:
>>>>>>>>>
>>>>>>>>>> Hey all,
>>>>>>>>>>
>>>>>>>>>> I would like to start the discussion for kicking off the next bug
>>>>>>>>>> fix release, Flink 1.1.4. What do you think about aiming for an RC
>>>>>>>>>> by the end of this week?
>>>>>>>>>>
>>>>>>>>>> Users reported some instabilities/inconveniences that would be
>>>>>>>>>> good to fix.
>>>>>>>>>>
>>>>>>>>>> Personally, I would like to backport the following fixes:
>>>>>>>>>>
>>>>>>>>>> (1) https://issues.apache.org/jira/browse/FLINK-4619: Answer
>>>>>>>>>> client if savepoint restore fails (Already merged for master,
>>>>>>>>>> needs minimal adjustment for 1.1)
>>>>>>>>>> (2) https://issues.apache.org/jira/browse/FLINK-4715: Safety net
>>>>>>>>>> for stuck task cancellation (Already reviewed for master, waiting
>>>>>>>>>> for tests of the backport to finish)
>>>>>>>>>> (3) https://issues.apache.org/jira/browse/FLINK-4510: Always
>>>>>>>>>> create CheckpointCoordinator (Already merged for master, needs
>>>>>>>>>> minimal adjustments for 1.1)
>>>>>>>>>>
>>>>>>>>>> Furthermore, I would like to address the following:
>>>>>>>>>>
>>>>>>>>>> (4) https://issues.apache.org/jira/browse/FLINK-4445: Add option
>>>>>>>>>> to ignore unmatched state when restoring from savepoint
>>>>>>>>>> (5) https://issues.apache.org/jira/browse/FLINK-4894: Don't
>>>>>>>>>> block on buffer request after broadcast event
>>>>>>>>>>
>>>>>>>>>> Strictly speaking, (4) is not a bug fix. But given that it would
>>>>>>>>>> only add an optional flag to savepoint restoring and should have
>>>>>>>>>> been addressed for 1.1.0 already, I would like to get it in.
>>>>>>>>>>
