Thanks a lot for the updates so far, everyone! Based on the discussion, below are the issues still pending for the 1.1.5 / 1.2.1 releases.
Since there’s only one backport left for 1.1.5, I think having an RC for 1.1.5 near the end of this week / early next week looks very feasible, as basically everything is already in.
I’d be happy to volunteer to help manage the release for 1.1.5, and prepare the RC when it’s ready :)

For 1.2.1, we can leave the pending list here for tracking, and come back to update it in the near future.

If there’s anything I missed, please let me know!


=========== Still pending for Flink 1.1.5 ===========

(1) https://issues.apache.org/jira/browse/FLINK-5701
Broken at-least-once Kafka producer.
Status: backport PR pending - https://github.com/apache/flink/pull/3549. Since it is a relatively self-contained change, I expect this to be a fast fix.


=========== Still pending for Flink 1.2.1 ===========

(1) https://issues.apache.org/jira/browse/FLINK-5808
Fix missing verification for setParallelism and setMaxParallelism
Status: PR - https://github.com/apache/flink/pull/3509, review in progress

(2) https://issues.apache.org/jira/browse/FLINK-5713
Protect against NPE in WindowOperator window cleanup
Status: PR - https://github.com/apache/flink/pull/3535, review pending

(3) https://issues.apache.org/jira/browse/FLINK-6044
TypeSerializerSerializationProxy.read() doesn't verify the read buffer length
Status: Fixed for master, 1.2 backport pending

(4) https://issues.apache.org/jira/browse/FLINK-5985
Flink treats every task as stateful (making topology changes impossible)
Status: PR - https://github.com/apache/flink/pull/3543, review in progress

(5) https://issues.apache.org/jira/browse/FLINK-5650
Flink-python tests taking up too much time
Status: I think Chesnay has made some progress on this one; we can decide whether we want to make it a blocker.


Cheers,
Gordon

On March 15, 2017 at 7:16:53 PM, Jinkui Shi (shijinkui...@163.com) wrote:

Can we fix this issue in 1.2.1:

Flink-python tests take too long
https://issues.apache.org/jira/browse/FLINK-5650

> On March 15, 2017, at 6:29 PM, Vladislav Pernin <vladislav.per...@gmail.com> wrote:
>
> I just tested it in my reproducer. It works.
>
> 2017-03-15 11:22 GMT+01:00 Aljoscha Krettek <aljos...@apache.org>:
>
>> I did in fact just open a PR for
>>> https://issues.apache.org/jira/browse/FLINK-6001
>>> NPE on TumblingEventTimeWindows with ContinuousEventTimeTrigger and allowedLateness
>>
>>
>> On Tue, Mar 14, 2017, at 18:20, Vladislav Pernin wrote:
>>> Hi,
>>>
>>> I would also include the following (not yet resolved) issue in the 1.2.1 scope:
>>>
>>> https://issues.apache.org/jira/browse/FLINK-6001
>>> NPE on TumblingEventTimeWindows with ContinuousEventTimeTrigger and allowedLateness
>>>
>>> 2017-03-14 17:34 GMT+01:00 Ufuk Celebi <u...@apache.org>:
>>>
>>>> Big +1 Gordon!
>>>>
>>>> I think (10) is very critical to have in 1.2.1.
>>>>
>>>> – Ufuk
>>>>
>>>>
>>>> On Tue, Mar 14, 2017 at 3:37 PM, Stefan Richter <s.rich...@data-artisans.com> wrote:
>>>>> Hi,
>>>>>
>>>>> I would suggest also including in 1.2.1:
>>>>>
>>>>> (9) https://issues.apache.org/jira/browse/FLINK-6044
>>>>> Replaces unintentional calls to InputStream#read(…) with the intended and correct InputStream#readFully(…)
>>>>> Status: PR
>>>>>
>>>>> (10) https://issues.apache.org/jira/browse/FLINK-5985
>>>>> Flink 1.2 was creating state handles for stateless tasks, which caused trouble at restore time for users who wanted to make topology changes that only involve stateless operators.
>>>>> Status: PR
>>>>>
>>>>>
>>>>>> On 14.03.2017 at 15:15, Till Rohrmann <trohrm...@apache.org> wrote:
>>>>>>
>>>>>> Thanks for kicking off the discussion Tzu-Li.
>>>>>> I'd like to add the following issues which have already been merged into the 1.2-release and 1.1-release branches:
>>>>>>
>>>>>> 1.2.1:
>>>>>>
>>>>>> (7) https://issues.apache.org/jira/browse/FLINK-5942
>>>>>> Hardens the checkpoint recovery in case of corrupted ZooKeeper data. Corrupted checkpoints will now be skipped.
>>>>>> Status: Merged
>>>>>>
>>>>>> (8) https://issues.apache.org/jira/browse/FLINK-5940
>>>>>> Hardens the checkpoint recovery in case we cannot retrieve the completed checkpoint from the meta data state handle retrieved from ZooKeeper. This can, for example, happen if the meta data is deleted. Checkpoints with unretrievable state handles are skipped.
>>>>>> Status: Merged
>>>>>>
>>>>>> 1.1.5:
>>>>>>
>>>>>> (7) https://issues.apache.org/jira/browse/FLINK-5942
>>>>>> Hardens the checkpoint recovery in case of corrupted ZooKeeper data. Corrupted checkpoints will now be skipped.
>>>>>> Status: Merged
>>>>>>
>>>>>> (8) https://issues.apache.org/jira/browse/FLINK-5940
>>>>>> Hardens the checkpoint recovery in case we cannot retrieve the completed checkpoint from the meta data state handle retrieved from ZooKeeper. This can, for example, happen if the meta data is deleted. Checkpoints with unretrievable state handles are skipped.
>>>>>> Status: Merged
>>>>>>
>>>>>> Cheers,
>>>>>> Till
>>>>>>
>>>>>> On Tue, Mar 14, 2017 at 12:02 PM, Tzu-Li (Gordon) Tai <tzuli...@apache.org> wrote:
>>>>>>
>>>>>>> Hi all!
>>>>>>>
>>>>>>> I would like to start a discussion for the next bugfix release for 1.1.x and 1.2.x.
>>>>>>> There have been quite a few critical fixes for bugs in both releases recently, and I think they deserve a bugfix release soon.
>>>>>>> Most of the bugs were reported by users.
>>>>>>>
>>>>>>> I’m starting the discussion for both bugfix releases because most fixes span both releases (almost identical).
>>>>>>> Of course, the actual RC votes and RC creation process don’t have to be started together.
>>>>>>>
>>>>>>> Here’s an overview of what’s been collected so far, for both bugfix releases
>>>>>>> (it’s a list of what I’m aware of so far, and may be missing stuff; please append and bring to attention as necessary :-) )
>>>>>>>
>>>>>>>
>>>>>>> For Flink 1.2.1:
>>>>>>>
>>>>>>> (1) https://issues.apache.org/jira/browse/FLINK-5701:
>>>>>>> Async exceptions in the FlinkKafkaProducer are not checked on checkpoints. This compromises the producer’s at-least-once guarantee.
>>>>>>> Status: merged
>>>>>>>
>>>>>>> (2) https://issues.apache.org/jira/browse/FLINK-5949:
>>>>>>> Do not check Kerberos credentials for non-Kerberos authentications. MapR users are affected by this, and cannot submit Flink on YARN jobs on a secured MapR cluster.
>>>>>>> Status: PR - https://github.com/apache/flink/pull/3528, one +1 already
>>>>>>>
>>>>>>> (3) https://issues.apache.org/jira/browse/FLINK-6006:
>>>>>>> Kafka Consumer can lose state if queried partition list is incomplete on restore.
>>>>>>> Status: PR - https://github.com/apache/flink/pull/3505, one +1 already
>>>>>>>
>>>>>>> (4) https://issues.apache.org/jira/browse/FLINK-6025:
>>>>>>> KryoSerializer may use the wrong classloader when Kryo’s JavaSerializer is used.
>>>>>>> Status: merged
>>>>>>>
>>>>>>> (5) https://issues.apache.org/jira/browse/FLINK-5771:
>>>>>>> Fix multi-char delimiters in Batch InputFormats.
>>>>>>> Status: merged
>>>>>>>
>>>>>>> (6) https://issues.apache.org/jira/browse/FLINK-5934:
>>>>>>> Set the Scheduler in the ExecutionGraph via its constructor. This fixes a bug that causes HA recovery to fail.
>>>>>>> Status: merged
>>>>>>>
>>>>>>>
>>>>>>>
>>>>>>> For Flink 1.1.5:
>>>>>>>
>>>>>>> (1) https://issues.apache.org/jira/browse/FLINK-5701:
>>>>>>> Async exceptions in the FlinkKafkaProducer are not checked on checkpoints. This compromises the producer’s at-least-once guarantee.
>>>>>>> Status: This is already merged for 1.2.1. I would personally like to backport the fix for this to 1.1.5 also.
>>>>>>>
>>>>>>> (2) https://issues.apache.org/jira/browse/FLINK-6006:
>>>>>>> Kafka Consumer can lose state if queried partition list is incomplete on restore.
>>>>>>> Status: PR - https://github.com/apache/flink/pull/3507, one +1 already
>>>>>>>
>>>>>>> (3) https://issues.apache.org/jira/browse/FLINK-6025:
>>>>>>> KryoSerializer may use the wrong classloader when Kryo’s JavaSerializer is used.
>>>>>>> Status: merged
>>>>>>>
>>>>>>> (4) https://issues.apache.org/jira/browse/FLINK-5771:
>>>>>>> Fix multi-char delimiters in Batch InputFormats.
>>>>>>> Status: merged
>>>>>>>
>>>>>>> (5) https://issues.apache.org/jira/browse/FLINK-5934:
>>>>>>> Set the Scheduler in the ExecutionGraph via its constructor. This fixes a bug that causes HA recovery to fail.
>>>>>>> Status: merged
>>>>>>>
>>>>>>> (6) https://issues.apache.org/jira/browse/FLINK-5048:
>>>>>>> Kafka Consumer (0.9/0.10) threading model leads to problematic cancellation behavior.
>>>>>>> Status: This fix was already released in 1.2.0, but never made it into the 1.1.x bugfixes. Do we want to backport this also for 1.1.5?
>>>>>>>
>>>>>>>
>>>>>>> What do you think? From the list so far, we pretty much already have everything in, so I think it would be nice to aim for RCs by the end of this week.
>>>>>>> Since both bugfix releases cover almost the same list of issues, I think it shouldn’t be too hard for us to kick off both bugfix releases around the same time.
>>>>>>>
>>>>>>> Also FYI, here are the lists of JIRA tickets that are tagged with "1.2.1" / "1.1.5" as the Fix Version and are still open.
>>>>>>> We should probably check if there’s anything on there that we should block on for the releases:
>>>>>>>
>>>>>>> For 1.2.1:
>>>>>>> https://issues.apache.org/jira/browse/FLINK-5711?jql=project%20%3D%20FLINK%20AND%20status%20in%20(Open%2C%20%22In%20Progress%22%2C%20Reopened)%20AND%20fixVersion%20%3D%201.2.1
>>>>>>>
>>>>>>> For 1.1.5:
>>>>>>> https://issues.apache.org/jira/browse/FLINK-6006?jql=project%20%3D%20FLINK%20AND%20status%20in%20(Open%2C%20%22In%20Progress%22%2C%20Reopened)%20AND%20fixVersion%20%3D%201.1.5