Re: [DISCUSS] Release Flink 1.1.5 / Flink 1.2.1

Ufuk Celebi Fri, 17 Mar 2017 00:13:20 -0700

Cool! Thanks for taking care of this Gordon :-)


On Fri, Mar 17, 2017 at 7:13 AM, Tzu-Li (Gordon) Tai
<[email protected]> wrote:
> Update for 1.1.5:
> The last fixes for 1.1.5 are in! I will create the RC today and start the 
> vote.
>
> Cheers,
> Gordon
>
>
> On March 17, 2017 at 1:14:53 AM, Robert Metzger ([email protected]) wrote:
>
> The Cassandra connector is probably not usable in Flink 1.2.0. I would like
> to include a fix in 1.2.1:
> https://issues.apache.org/jira/browse/FLINK-6084
>
> Please let me know if this fix becomes a blocker for the 1.2.1 release. If
> so, I can validate the fix myself to speed up things.
>
> On Thu, Mar 16, 2017 at 9:41 AM, Jinkui Shi <[email protected]> wrote:
>
>> @Tzu-Li (Gordon) Tai
>>
>> FLINK-5650 is fixed by [1]. Chesnay Schepler, please open a PR.
>>
>> [1] https://github.com/zentol/flink/tree/5650_python_test_debug
>>
>>
>> > On March 16, 2017, at 3:37 AM, Stephan Ewen <[email protected]> wrote:
>> >
>> > Thanks for the update!
>> >
>> > Just merged to 1.2.1 also: [FLINK-5962] [checkpoints] Remove scheduled
>> > cancel-task from timer queue to prevent memory leaks
>> >
>> > The remaining issue list looks good, but I would say that (5) is optional.
>> > It is not a critical production bug.
>> >
>> >
>> >
>> > On Wed, Mar 15, 2017 at 5:38 PM, Tzu-Li (Gordon) Tai <[email protected]>
>> > wrote:
>> >
>> >> Thanks a lot for the updates so far everyone!
>> >>
>> >> From the discussion so far, below are the issues still pending for the
>> >> 1.1.5 / 1.2.1 releases.
>> >>
>> >> Since there’s only one backport for 1.1.5 left, I think having an RC for
>> >> 1.1.5 near the end of this week / early next week is very promising, as
>> >> basically everything is already in.
>> >> I’d be happy to volunteer to help manage the release for 1.1.5, and
>> >> prepare the RC when it’s ready :)
>> >>
>> >> For 1.2.1, we can leave the pending list here for tracking, and come back
>> >> to update it in the near future.
>> >>
>> >> If there’s anything I missed, please let me know!
>> >>
>> >>
>> >> =========== Still pending for Flink 1.1.5 ===========
>> >>
>> >> (1) https://issues.apache.org/jira/browse/FLINK-5701
>> >> Broken at-least-once Kafka producer.
>> >> Status: backport PR pending - https://github.com/apache/flink/pull/3549.
>> >> Since it is a relatively self-contained change, I expect this to be a fast
>> >> fix.
>> >>
>> >>
>> >>
>> >> =========== Still pending for Flink 1.2.1 ===========
>> >>
>> >> (1) https://issues.apache.org/jira/browse/FLINK-5808
>> >> Fix Missing verification for setParallelism and setMaxParallelism
>> >> Status: PR - https://github.com/apache/flink/pull/3509, review in progress
>> >>
>> >> (2) https://issues.apache.org/jira/browse/FLINK-5713
>> >> Protect against NPE in WindowOperator window cleanup
>> >> Status: PR - https://github.com/apache/flink/pull/3535, review pending
>> >>
>> >> (3) https://issues.apache.org/jira/browse/FLINK-6044
>> >> TypeSerializerSerializationProxy.read() doesn't verify the read buffer
>> >> length
>> >> Status: Fixed for master, 1.2 backport pending
>> >>
>> >> (4) https://issues.apache.org/jira/browse/FLINK-5985
>> >> Flink treats every task as stateful (making topology changes impossible)
>> >> Status: PR - https://github.com/apache/flink/pull/3543, review in progress
>> >>
>> >> (5) https://issues.apache.org/jira/browse/FLINK-5650
>> >> Flink-python tests taking up too much time
>> >> Status: I think Chesnay has made some progress on this one; we can see
>> >> whether we want to make it a blocker
>> >>
>> >>
>> >> Cheers,
>> >> Gordon
>> >>
>> >> On March 15, 2017 at 7:16:53 PM, Jinkui Shi ([email protected]) wrote:
>> >>
>> >> Can we fix this issue in 1.2.1:
>> >>
>> >> Flink-python tests take too long
>> >> https://issues.apache.org/jira/browse/FLINK-5650
>> >>
>> >>> On March 15, 2017, at 6:29 PM, Vladislav Pernin <[email protected]> wrote:
>> >>>
>> >>> I just tested it in my reproducer. It works.
>> >>>
>> >>> 2017-03-15 11:22 GMT+01:00 Aljoscha Krettek <[email protected]>:
>> >>>
>> >>>> I did in fact just open a PR for
>> >>>>> https://issues.apache.org/jira/browse/FLINK-6001
>> >>>>> NPE on TumblingEventTimeWindows with ContinuousEventTimeTrigger and
>> >>>>> allowedLateness
>> >>>>
>> >>>>
>> >>>> On Tue, Mar 14, 2017, at 18:20, Vladislav Pernin wrote:
>> >>>>> Hi,
>> >>>>>
>> >>>>> I would also include the following (not yet resolved) issue in the 1.2.1
>> >>>>> scope:
>> >>>>>
>> >>>>> https://issues.apache.org/jira/browse/FLINK-6001
>> >>>>> NPE on TumblingEventTimeWindows with ContinuousEventTimeTrigger and
>> >>>>> allowedLateness
>> >>>>>
>> >>>>> 2017-03-14 17:34 GMT+01:00 Ufuk Celebi <[email protected]>:
>> >>>>>
>> >>>>>> Big +1 Gordon!
>> >>>>>>
>> >>>>>> I think (10) is very critical to have in 1.2.1.
>> >>>>>>
>> >>>>>> – Ufuk
>> >>>>>>
>> >>>>>>
>> >>>>>> On Tue, Mar 14, 2017 at 3:37 PM, Stefan Richter
>> >>>>>> <[email protected]> wrote:
>> >>>>>>> Hi,
>> >>>>>>>
>> >>>>>>> I would suggest to also include in 1.2.1:
>> >>>>>>>
>> >>>>>>> (9) https://issues.apache.org/jira/browse/FLINK-6044
>> >>>>>>> Replaces unintentional calls to InputStream#read(…) with the intended
>> >>>>>>> and correct InputStream#readFully(…)
>> >>>>>>> Status: PR
>> >>>>>>>
>> >>>>>>> (10) https://issues.apache.org/jira/browse/FLINK-5985
>> >>>>>>> Flink 1.2 was creating state handles for stateless tasks, which caused
>> >>>>>>> trouble at restore time for users who wanted to make changes to their
>> >>>>>>> topology that only involve stateless operators.
>> >>>>>>> Status: PR
>> >>>>>>>
>> >>>>>>>
>> >>>>>>>> On 14.03.2017 at 15:15, Till Rohrmann <[email protected]> wrote:
>> >>>>>>>>
>> >>>>>>>> Thanks for kicking off the discussion Tzu-Li. I'd like to add the
>> >>>>>>>> following issues, which have already been merged into the 1.2-release
>> >>>>>>>> and 1.1-release branches:
>> >>>>>>>>
>> >>>>>>>> 1.2.1:
>> >>>>>>>>
>> >>>>>>>> (7) https://issues.apache.org/jira/browse/FLINK-5942
>> >>>>>>>> Hardens the checkpoint recovery in case of corrupted ZooKeeper data.
>> >>>>>>>> Corrupted checkpoints will now be skipped.
>> >>>>>>>> Status: Merged
>> >>>>>>>>
>> >>>>>>>> (8) https://issues.apache.org/jira/browse/FLINK-5940
>> >>>>>>>> Hardens the checkpoint recovery in case we cannot retrieve the
>> >>>>>>>> completed checkpoint from the meta data state handle retrieved from
>> >>>>>>>> ZooKeeper. This can, for example, happen if the meta data is deleted.
>> >>>>>>>> Checkpoints with unretrievable state handles are skipped.
>> >>>>>>>> Status: Merged
>> >>>>>>>>
>> >>>>>>>> 1.1.5:
>> >>>>>>>>
>> >>>>>>>>
>> >>>>>>>> (7) https://issues.apache.org/jira/browse/FLINK-5942
>> >>>>>>>> Hardens the checkpoint recovery in case of corrupted ZooKeeper data.
>> >>>>>>>> Corrupted checkpoints will now be skipped.
>> >>>>>>>> Status: Merged
>> >>>>>>>>
>> >>>>>>>> (8) https://issues.apache.org/jira/browse/FLINK-5940
>> >>>>>>>> Hardens the checkpoint recovery in case we cannot retrieve the
>> >>>>>>>> completed checkpoint from the meta data state handle retrieved from
>> >>>>>>>> ZooKeeper. This can, for example, happen if the meta data is deleted.
>> >>>>>>>> Checkpoints with unretrievable state handles are skipped.
>> >>>>>>>> Status: Merged
>> >>>>>>>>
>> >>>>>>>> Cheers,
>> >>>>>>>> Till
>> >>>>>>>>
>> >>>>>>>> On Tue, Mar 14, 2017 at 12:02 PM, Tzu-Li (Gordon) Tai <[email protected]>
>> >>>>>>>> wrote:
>> >>>>>>>>
>> >>>>>>>>> Hi all!
>> >>>>>>>>>
>> >>>>>>>>> I would like to start a discussion for the next bugfix releases for 1.1.x
>> >>>>>>>>> and 1.2.x.
>> >>>>>>>>> There have been quite a few critical fixes for bugs in both releases
>> >>>>>>>>> recently, and I think they deserve a bugfix release soon.
>> >>>>>>>>> Most of the bugs were reported by users.
>> >>>>>>>>>
>> >>>>>>>>> I’m starting the discussion for both bugfix releases because most fixes
>> >>>>>>>>> span both releases (almost identical).
>> >>>>>>>>> Of course, the actual RC votes and RC creation process don’t have to be
>> >>>>>>>>> started together.
>> >>>>>>>>>
>> >>>>>>>>> Here’s an overview of what’s been collected so far, for both bugfix
>> >>>>>>>>> releases -
>> >>>>>>>>> (it’s a list of what I’m aware of so far, and may be missing stuff; please
>> >>>>>>>>> append and bring to attention as necessary :-) )
>> >>>>>>>>>
>> >>>>>>>>>
>> >>>>>>>>> For Flink 1.2.1:
>> >>>>>>>>>
>> >>>>>>>>> (1) https://issues.apache.org/jira/browse/FLINK-5701:
>> >>>>>>>>> Async exceptions in the FlinkKafkaProducer are not checked on checkpoints.
>> >>>>>>>>> This compromises the producer’s at-least-once guarantee.
>> >>>>>>>>> Status: merged
>> >>>>>>>>>
>> >>>>>>>>> (2) https://issues.apache.org/jira/browse/FLINK-5949:
>> >>>>>>>>> Do not check Kerberos credentials for non-Kerberos authentications. MapR
>> >>>>>>>>> users are affected by this, and cannot submit Flink on YARN jobs on a
>> >>>>>>>>> secured MapR cluster.
>> >>>>>>>>> Status: PR - https://github.com/apache/flink/pull/3528, one +1 already
>> >>>>>>>>>
>> >>>>>>>>> (3) https://issues.apache.org/jira/browse/FLINK-6006:
>> >>>>>>>>> Kafka Consumer can lose state if queried partition list is incomplete on
>> >>>>>>>>> restore.
>> >>>>>>>>> Status: PR - https://github.com/apache/flink/pull/3505, one +1 already
>> >>>>>>>>>
>> >>>>>>>>> (4) https://issues.apache.org/jira/browse/FLINK-6025:
>> >>>>>>>>> KryoSerializer may use the wrong classloader when Kryo’s JavaSerializer is
>> >>>>>>>>> used.
>> >>>>>>>>> Status: merged
>> >>>>>>>>>
>> >>>>>>>>> (5) https://issues.apache.org/jira/browse/FLINK-5771:
>> >>>>>>>>> Fix multi-char delimiters in Batch InputFormats.
>> >>>>>>>>> Status: merged
>> >>>>>>>>>
>> >>>>>>>>> (6) https://issues.apache.org/jira/browse/FLINK-5934:
>> >>>>>>>>> Set the Scheduler in the ExecutionGraph via its constructor. This fixes a
>> >>>>>>>>> bug that causes HA recovery to fail.
>> >>>>>>>>> Status: merged
>> >>>>>>>>>
>> >>>>>>>>>
>> >>>>>>>>>
>> >>>>>>>>> For Flink 1.1.5:
>> >>>>>>>>>
>> >>>>>>>>> (1) https://issues.apache.org/jira/browse/FLINK-5701:
>> >>>>>>>>> Async exceptions in the FlinkKafkaProducer are not checked on checkpoints.
>> >>>>>>>>> This compromises the producer’s at-least-once guarantee.
>> >>>>>>>>> Status: This is already merged for 1.2.1. I would personally like to
>> >>>>>>>>> backport the fix for this to 1.1.5 also.
>> >>>>>>>>>
>> >>>>>>>>> (2) https://issues.apache.org/jira/browse/FLINK-6006:
>> >>>>>>>>> Kafka Consumer can lose state if queried partition list is incomplete on
>> >>>>>>>>> restore.
>> >>>>>>>>> Status: PR - https://github.com/apache/flink/pull/3507, one +1 already
>> >>>>>>>>>
>> >>>>>>>>> (3) https://issues.apache.org/jira/browse/FLINK-6025:
>> >>>>>>>>> KryoSerializer may use the wrong classloader when Kryo’s JavaSerializer is
>> >>>>>>>>> used.
>> >>>>>>>>> Status: merged
>> >>>>>>>>>
>> >>>>>>>>> (4) https://issues.apache.org/jira/browse/FLINK-5771:
>> >>>>>>>>> Fix multi-char delimiters in Batch InputFormats.
>> >>>>>>>>> Status: merged
>> >>>>>>>>>
>> >>>>>>>>> (5) https://issues.apache.org/jira/browse/FLINK-5934:
>> >>>>>>>>> Set the Scheduler in the ExecutionGraph via its constructor. This fixes a
>> >>>>>>>>> bug that causes HA recovery to fail.
>> >>>>>>>>> Status: merged
>> >>>>>>>>>
>> >>>>>>>>> (6) https://issues.apache.org/jira/browse/FLINK-5048:
>> >>>>>>>>> Kafka Consumer (0.9/0.10) threading model leads to problematic cancellation
>> >>>>>>>>> behavior.
>> >>>>>>>>> Status: This fix was already released in 1.2.0, but never made it into the
>> >>>>>>>>> 1.1.x bugfixes. Do we want to backport this also for 1.1.5?
>> >>>>>>>>>
>> >>>>>>>>>
>> >>>>>>>>> What do you think? From the list so far, we pretty much already have
>> >>>>>>>>> everything in, so I think it would be nice to aim for RCs by the end of
>> >>>>>>>>> this week.
>> >>>>>>>>> Since both bugfix releases cover almost the same list of issues, I think
>> >>>>>>>>> it shouldn’t be too hard for us to kick off both bugfix releases around the
>> >>>>>>>>> same time.
>> >>>>>>>>>
>> >>>>>>>>> Also FYI, here are the lists of JIRA tickets that have “1.2.1” / “1.1.5”
>> >>>>>>>>> as the Fix Version and are still open.
>> >>>>>>>>> We should probably check whether there’s anything on there that we
>> >>>>>>>>> should block on for the releases:
>> >>>>>>>>>
>> >>>>>>>>> For 1.2.1:
>> >>>>>>>>> https://issues.apache.org/jira/browse/FLINK-5711?jql=project%20%3D%20FLINK%20AND%20status%20in%20(Open%2C%20%22In%20Progress%22%2C%20Reopened)%20AND%20fixVersion%20%3D%201.2.1
>> >>>>>>>>>
>> >>>>>>>>> For 1.1.5:
>> >>>>>>>>> https://issues.apache.org/jira/browse/FLINK-6006?jql=project%20%3D%20FLINK%20AND%20status%20in%20(Open%2C%20%22In%20Progress%22%2C%20Reopened)%20AND%20fixVersion%20%3D%201.1.5
>> >>>>>>>
>> >>>>>>
>> >>>>
>> >>>
>> >>
>> >>
>>
>>
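
Note on item (9), FLINK-6044, from the thread above: the fix replaces calls to read(…) with readFully(…). The following is only a minimal sketch of why that distinction matters, not the Flink code itself. A single read(byte[]) call may return after filling just part of the buffer, while readFully keeps reading until the buffer is full or throws EOFException. In the plain JDK, readFully is provided by DataInputStream, which is what this sketch uses. The names ReadFullyDemo and ChunkedStream are hypothetical and exist only for this illustration.

```java
import java.io.ByteArrayInputStream;
import java.io.DataInputStream;
import java.io.IOException;
import java.io.InputStream;
import java.util.Arrays;

public class ReadFullyDemo {

    /**
     * Hypothetical stream that hands out at most `chunk` bytes per bulk read,
     * imitating sources (sockets, some file systems) that return short reads.
     */
    static final class ChunkedStream extends InputStream {
        private final InputStream in;
        private final int chunk;

        ChunkedStream(byte[] data, int chunk) {
            this.in = new ByteArrayInputStream(data);
            this.chunk = chunk;
        }

        @Override
        public int read() throws IOException {
            return in.read();
        }

        @Override
        public int read(byte[] b, int off, int len) throws IOException {
            // Deliberately return fewer bytes than requested.
            return in.read(b, off, Math.min(len, chunk));
        }
    }

    /** One read() call: returns how many bytes were actually filled. */
    static int readOnce(InputStream in, byte[] buf) throws IOException {
        return in.read(buf);
    }

    /** readFully(): loops internally until buf is full, or throws EOFException. */
    static void readAll(InputStream in, byte[] buf) throws IOException {
        new DataInputStream(in).readFully(buf);
    }

    public static void main(String[] args) throws IOException {
        byte[] data = {1, 2, 3, 4, 5, 6, 7, 8};

        byte[] partial = new byte[data.length];
        int n = readOnce(new ChunkedStream(data, 3), partial);
        System.out.println("read() filled " + n + " of " + data.length + " bytes");

        byte[] full = new byte[data.length];
        readAll(new ChunkedStream(data, 3), full);
        System.out.println("readFully() matches input: " + Arrays.equals(data, full));
    }
}
```

With the 3-byte chunk size assumed here, the single read() call leaves most of the buffer unfilled, which is the kind of silent truncation a length check or readFully guards against.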
