Cool! Thanks for taking care of this, Gordon :-)
On Fri, Mar 17, 2017 at 7:13 AM, Tzu-Li (Gordon) Tai <tzuli...@apache.org> wrote:

> Update for 1.1.5:
> The last fixes for 1.1.5 are in! I will create the RC today and start the
> vote.
>
> Cheers,
> Gordon
>
>
> On March 17, 2017 at 1:14:53 AM, Robert Metzger (rmetz...@apache.org) wrote:
>
> The cassandra connector is probably not usable in Flink 1.2.0. I would like
> to include a fix in 1.2.1:
> https://issues.apache.org/jira/browse/FLINK-6084
>
> Please let me know if this fix becomes a blocker for the 1.2.1 release. If
> so, I can validate the fix myself to speed things up.
>
> On Thu, Mar 16, 2017 at 9:41 AM, Jinkui Shi <shijinkui...@163.com> wrote:
>
>> @Tzu-Li (Gordon) Tai
>>
>> FLINK-5650 is fixed by [1]. Chesnay Schepler, please push a PR.
>>
>> [1] https://github.com/zentol/flink/tree/5650_python_test_debug
>>
>>
>> > On March 16, 2017, at 3:37 AM, Stephan Ewen <se...@apache.org> wrote:
>> >
>> > Thanks for the update!
>> >
>> > Just merged to 1.2.1 also: [FLINK-5962] [checkpoints] Remove scheduled
>> > cancel-task from timer queue to prevent memory leaks
>> >
>> > The remaining issue list looks good, but I would say that (5) is optional.
>> > It is not a critical production bug.
>> >
>> >
>> > On Wed, Mar 15, 2017 at 5:38 PM, Tzu-Li (Gordon) Tai <tzuli...@apache.org>
>> > wrote:
>> >
>> >> Thanks a lot for the updates so far everyone!
>> >>
>> >> From the discussion so far, below are the still-unfixed pending issues
>> >> for the 1.1.5 / 1.2.1 releases.
>> >>
>> >> Since there’s only one backport for 1.1.5 left, I think having an RC for
>> >> 1.1.5 near the end of this week / early next week is very promising, as
>> >> basically everything is already in.
>> >> I’d be happy to volunteer to help manage the release for 1.1.5, and
>> >> prepare the RC when it’s ready :)
>> >>
>> >> For 1.2.1, we can leave the pending list here for tracking, and come back
>> >> to update it in the near future.
>> >>
>> >> If there’s anything I missed, please let me know!
>> >>
>> >>
>> >> =========== Still pending for Flink 1.1.5 ===========
>> >>
>> >> (1) https://issues.apache.org/jira/browse/FLINK-5701
>> >> Broken at-least-once Kafka producer.
>> >> Status: backport PR pending - https://github.com/apache/flink/pull/3549
>> >> Since it is a relatively self-contained change, I expect this to be a
>> >> fast fix.
>> >>
>> >>
>> >> =========== Still pending for Flink 1.2.1 ===========
>> >>
>> >> (1) https://issues.apache.org/jira/browse/FLINK-5808
>> >> Fix Missing verification for setParallelism and setMaxParallelism
>> >> Status: PR - https://github.com/apache/flink/pull/3509, review in progress
>> >>
>> >> (2) https://issues.apache.org/jira/browse/FLINK-5713
>> >> Protect against NPE in WindowOperator window cleanup
>> >> Status: PR - https://github.com/apache/flink/pull/3535, review pending
>> >>
>> >> (3) https://issues.apache.org/jira/browse/FLINK-6044
>> >> TypeSerializerSerializationProxy.read() doesn't verify the read buffer
>> >> length
>> >> Status: Fixed for master, 1.2 backport pending
>> >>
>> >> (4) https://issues.apache.org/jira/browse/FLINK-5985
>> >> Flink treats every task as stateful (making topology changes impossible)
>> >> Status: PR - https://github.com/apache/flink/pull/3543, review in progress
>> >>
>> >> (5) https://issues.apache.org/jira/browse/FLINK-5650
>> >> Flink-python tests taking up too much time
>> >> Status: I think Chesnay has made some progress with this one; we can
>> >> see if we want to make this a blocker
>> >>
>> >>
>> >> Cheers,
>> >> Gordon
>> >>
>> >> On March 15, 2017 at 7:16:53 PM, Jinkui Shi (shijinkui...@163.com) wrote:
>> >>
>> >> Can we fix this issue in 1.2.1?
>> >>
>> >> Flink-python tests take too long
>> >> https://issues.apache.org/jira/browse/FLINK-5650
>> >>
>> >>> On March 15, 2017, at 6:29 PM, Vladislav Pernin
>> >>> <vladislav.per...@gmail.com> wrote:
>> >>>
>> >>> I just tested it in my reproducer. It works.
>> >>>
>> >>> 2017-03-15 11:22 GMT+01:00 Aljoscha Krettek <aljos...@apache.org>:
>> >>>
>> >>>> I did in fact just open a PR for
>> >>>>> https://issues.apache.org/jira/browse/FLINK-6001
>> >>>>> NPE on TumblingEventTimeWindows with ContinuousEventTimeTrigger and
>> >>>>> allowedLateness
>> >>>>
>> >>>>
>> >>>> On Tue, Mar 14, 2017, at 18:20, Vladislav Pernin wrote:
>> >>>>> Hi,
>> >>>>>
>> >>>>> I would also include the following (not yet resolved) issue in the
>> >>>>> 1.2.1 scope:
>> >>>>>
>> >>>>> https://issues.apache.org/jira/browse/FLINK-6001
>> >>>>> NPE on TumblingEventTimeWindows with ContinuousEventTimeTrigger and
>> >>>>> allowedLateness
>> >>>>>
>> >>>>> 2017-03-14 17:34 GMT+01:00 Ufuk Celebi <u...@apache.org>:
>> >>>>>
>> >>>>>> Big +1 Gordon!
>> >>>>>>
>> >>>>>> I think (10) is very critical to have in 1.2.1.
>> >>>>>>
>> >>>>>> – Ufuk
>> >>>>>>
>> >>>>>>
>> >>>>>> On Tue, Mar 14, 2017 at 3:37 PM, Stefan Richter
>> >>>>>> <s.rich...@data-artisans.com> wrote:
>> >>>>>>> Hi,
>> >>>>>>>
>> >>>>>>> I would suggest also including in 1.2.1:
>> >>>>>>>
>> >>>>>>> (9) https://issues.apache.org/jira/browse/FLINK-6044
>> >>>>>>> Replaces unintentional calls to InputStream#read(…) with the intended
>> >>>>>>> and correct InputStream#readFully(…)
>> >>>>>>> Status: PR
>> >>>>>>>
>> >>>>>>> (10) https://issues.apache.org/jira/browse/FLINK-5985
>> >>>>>>> Flink 1.2 was creating state handles for stateless tasks, which caused
>> >>>>>>> trouble at restore time for users who wanted to make changes to their
>> >>>>>>> topology that only involve stateless operators.
>> >>>>>>> Status: PR
>> >>>>>>>
>> >>>>>>>
>> >>>>>>>> On 14.03.2017 at 15:15, Till Rohrmann <trohrm...@apache.org> wrote:
>> >>>>>>>>
>> >>>>>>>> Thanks for kicking off the discussion Tzu-Li. I'd like to add the
>> >>>>>>>> following issues which have already been merged into the 1.2-release
>> >>>>>>>> and 1.1-release branches:
>> >>>>>>>>
>> >>>>>>>> 1.2.1:
>> >>>>>>>>
>> >>>>>>>> (7) https://issues.apache.org/jira/browse/FLINK-5942
>> >>>>>>>> Hardens the checkpoint recovery in case of corrupted ZooKeeper data.
>> >>>>>>>> Corrupted checkpoints will now be skipped.
>> >>>>>>>> Status: Merged
>> >>>>>>>>
>> >>>>>>>> (8) https://issues.apache.org/jira/browse/FLINK-5940
>> >>>>>>>> Hardens the checkpoint recovery in case we cannot retrieve the
>> >>>>>>>> completed checkpoint from the meta data state handle retrieved from
>> >>>>>>>> ZooKeeper. This can, for example, happen if the meta data is deleted.
>> >>>>>>>> Checkpoints with unretrievable state handles are skipped.
>> >>>>>>>> Status: Merged
>> >>>>>>>>
>> >>>>>>>> 1.1.5:
>> >>>>>>>>
>> >>>>>>>>
>> >>>>>>>> (7) https://issues.apache.org/jira/browse/FLINK-5942
>> >>>>>>>> Hardens the checkpoint recovery in case of corrupted ZooKeeper data.
>> >>>>>>>> Corrupted checkpoints will now be skipped.
>> >>>>>>>> Status: Merged
>> >>>>>>>>
>> >>>>>>>> (8) https://issues.apache.org/jira/browse/FLINK-5940
>> >>>>>>>> Hardens the checkpoint recovery in case we cannot retrieve the
>> >>>>>>>> completed checkpoint from the meta data state handle retrieved from
>> >>>>>>>> ZooKeeper. This can, for example, happen if the meta data is deleted.
>> >>>>>>>> Checkpoints with unretrievable state handles are skipped.
>> >>>>>>>> Status: Merged
>> >>>>>>>>
>> >>>>>>>> Cheers,
>> >>>>>>>> Till
>> >>>>>>>>
>> >>>>>>>> On Tue, Mar 14, 2017 at 12:02 PM, Tzu-Li (Gordon) Tai
>> >>>>>>>> <tzuli...@apache.org> wrote:
>> >>>>>>>>
>> >>>>>>>>> Hi all!
>> >>>>>>>>>
>> >>>>>>>>> I would like to start a discussion for the next bugfix release for
>> >>>>>>>>> 1.1.x and 1.2.x.
>> >>>>>>>>> There have been quite a few critical fixes for bugs in both releases
>> >>>>>>>>> recently, and I think they deserve a bugfix release soon.
>> >>>>>>>>> Most of the bugs were reported by users.
>> >>>>>>>>>
>> >>>>>>>>> I’m starting the discussion for both bugfix releases because most
>> >>>>>>>>> fixes span both releases (almost identical).
>> >>>>>>>>> Of course, the actual RC votes and RC creation process don’t have to
>> >>>>>>>>> be started together.
>> >>>>>>>>>
>> >>>>>>>>> Here’s an overview of what’s been collected so far, for both bugfix
>> >>>>>>>>> releases -
>> >>>>>>>>> (it’s a list of what I’m aware of so far, and may be missing stuff;
>> >>>>>>>>> please append and bring to attention as necessary :-) )
>> >>>>>>>>>
>> >>>>>>>>>
>> >>>>>>>>> For Flink 1.2.1:
>> >>>>>>>>>
>> >>>>>>>>> (1) https://issues.apache.org/jira/browse/FLINK-5701:
>> >>>>>>>>> Async exceptions in the FlinkKafkaProducer are not checked on
>> >>>>>>>>> checkpoints. This compromises the producer’s at-least-once guarantee.
>> >>>>>>>>> Status: merged
>> >>>>>>>>>
>> >>>>>>>>> (2) https://issues.apache.org/jira/browse/FLINK-5949:
>> >>>>>>>>> Do not check Kerberos credentials for non-Kerberos authentications.
>> >>>>>>>>> MapR users are affected by this, and cannot submit Flink on YARN jobs
>> >>>>>>>>> on a secured MapR cluster.
>> >>>>>>>>> Status: PR - https://github.com/apache/flink/pull/3528, one +1 already
>> >>>>>>>>>
>> >>>>>>>>> (3) https://issues.apache.org/jira/browse/FLINK-6006:
>> >>>>>>>>> Kafka Consumer can lose state if queried partition list is incomplete
>> >>>>>>>>> on restore.
>> >>>>>>>>> Status: PR - https://github.com/apache/flink/pull/3505, one +1 already
>> >>>>>>>>>
>> >>>>>>>>> (4) https://issues.apache.org/jira/browse/FLINK-6025:
>> >>>>>>>>> KryoSerializer may use the wrong classloader when Kryo’s
>> >>>>>>>>> JavaSerializer is used.
>> >>>>>>>>> Status: merged
>> >>>>>>>>>
>> >>>>>>>>> (5) https://issues.apache.org/jira/browse/FLINK-5771:
>> >>>>>>>>> Fix multi-char delimiters in Batch InputFormats.
>> >>>>>>>>> Status: merged
>> >>>>>>>>>
>> >>>>>>>>> (6) https://issues.apache.org/jira/browse/FLINK-5934:
>> >>>>>>>>> Set the Scheduler in the ExecutionGraph via its constructor. This
>> >>>>>>>>> fixes a bug that causes HA recovery to fail.
>> >>>>>>>>> Status: merged
>> >>>>>>>>>
>> >>>>>>>>>
>> >>>>>>>>>
>> >>>>>>>>> For Flink 1.1.5:
>> >>>>>>>>>
>> >>>>>>>>> (1) https://issues.apache.org/jira/browse/FLINK-5701:
>> >>>>>>>>> Async exceptions in the FlinkKafkaProducer are not checked on
>> >>>>>>>>> checkpoints. This compromises the producer’s at-least-once guarantee.
>> >>>>>>>>> Status: This is already merged for 1.2.1. I would personally like to
>> >>>>>>>>> backport the fix for this to 1.1.5 also.
>> >>>>>>>>>
>> >>>>>>>>> (2) https://issues.apache.org/jira/browse/FLINK-6006:
>> >>>>>>>>> Kafka Consumer can lose state if queried partition list is incomplete
>> >>>>>>>>> on restore.
>> >>>>>>>>> Status: PR - https://github.com/apache/flink/pull/3507, one +1 already
>> >>>>>>>>>
>> >>>>>>>>> (3) https://issues.apache.org/jira/browse/FLINK-6025:
>> >>>>>>>>> KryoSerializer may use the wrong classloader when Kryo’s
>> >>>>>>>>> JavaSerializer is used.
>> >>>>>>>>> Status: merged
>> >>>>>>>>>
>> >>>>>>>>> (4) https://issues.apache.org/jira/browse/FLINK-5771:
>> >>>>>>>>> Fix multi-char delimiters in Batch InputFormats.
>> >>>>>>>>> Status: merged
>> >>>>>>>>>
>> >>>>>>>>> (5) https://issues.apache.org/jira/browse/FLINK-5934:
>> >>>>>>>>> Set the Scheduler in the ExecutionGraph via its constructor. This
>> >>>>>>>>> fixes a bug that causes HA recovery to fail.
>> >>>>>>>>> Status: merged
>> >>>>>>>>>
>> >>>>>>>>> (6) https://issues.apache.org/jira/browse/FLINK-5048:
>> >>>>>>>>> Kafka Consumer (0.9/0.10) threading model leads to problematic
>> >>>>>>>>> cancellation behavior.
>> >>>>>>>>> Status: This fix was already released in 1.2.0, but never made it
>> >>>>>>>>> into the 1.1.x bugfixes. Do we want to backport this also for 1.1.5?
>> >>>>>>>>>
>> >>>>>>>>>
>> >>>>>>>>> What do you think? From the list so far, we pretty much already have
>> >>>>>>>>> everything in, so I think it would be nice to aim for RCs by the end
>> >>>>>>>>> of this week.
>> >>>>>>>>> Since both bugfix releases cover almost the same list of issues, I
>> >>>>>>>>> think it shouldn’t be too hard for us to kick off both bugfix
>> >>>>>>>>> releases around the same time.
>> >>>>>>>>>
>> >>>>>>>>> Also FYI, here are the lists of JIRA tickets that are tagged with
>> >>>>>>>>> "1.2.1" / "1.1.5" as the Fix Version and are still open.
>> >>>>>>>>> We should probably check if there’s anything on there that we should
>> >>>>>>>>> block on for the releases:
>> >>>>>>>>>
>> >>>>>>>>> For 1.2.1:
>> >>>>>>>>> https://issues.apache.org/jira/browse/FLINK-5711?jql=project%20%3D%20FLINK%20AND%20status%20in%20(Open%2C%20%22In%20Progress%22%2C%20Reopened)%20AND%20fixVersion%20%3D%201.2.1
>> >>>>>>>>>
>> >>>>>>>>> For 1.1.5:
>> >>>>>>>>> https://issues.apache.org/jira/browse/FLINK-6006?jql=project%20%3D%20FLINK%20AND%20status%20in%20(Open%2C%20%22In%20Progress%22%2C%20Reopened)%20AND%20fixVersion%20%3D%201.1.5