@Stefan: What's the state with the RocksDB fixes? I would be +1 to do this.
On Tue, Apr 4, 2017 at 6:05 PM, Chesnay Schepler <ches...@apache.org> wrote:
> Yes, aljoscha already opened one against master:
> https://github.com/apache/flink/pull/3670
>
> On 04.04.2017 17:57, Ted Yu wrote:
>>
>> Should the commits be reverted from master branch as well?
>>
>> On Tue, Apr 4, 2017 at 4:59 AM, Aljoscha Krettek <aljos...@apache.org> wrote:
>>
>>> The commits around FLINK-5808 have been reverted on release-1.2.
>>>
>>>> On 4. Apr 2017, at 12:16, Stefan Richter <s.rich...@data-artisans.com> wrote:
>>>>
>>>> I have created a custom build of RocksDB 4.11.2 that fixes a significant performance problem with append operations. I think this should definitely be part of the 1.2.1 release because this is already blocking some users. What is missing is uploading the jar to maven central and a testing run, e.g. with some misbehaved job that has large state.
>>>>
>>>>> On 04.04.2017 at 11:57, Robert Metzger <rmetz...@apache.org> wrote:
>>>>>
>>>>> Thank you for opening a PR for this.
>>>>>
>>>>> Chesnay, do you need more reviews for the metrics changes / backports?
>>>>>
>>>>> Are there any other release blockers for 1.2.1, or are we good to go?
>>>>>
>>>>> On Mon, Apr 3, 2017 at 6:48 PM, Aljoscha Krettek <aljos...@apache.org> wrote:
>>>>>
>>>>>> I created a PR for the revert: https://github.com/apache/flink/pull/3664
>>>>>>>
>>>>>>> On 3. Apr 2017, at 18:32, Stephan Ewen <se...@apache.org> wrote:
>>>>>>>
>>>>>>> +1 for options (1), but also invest the time to fix it properly for 1.2.2
>>>>>>>
>>>>>>> On Mon, Apr 3, 2017 at 9:10 AM, Kostas Kloudas <k.klou...@data-artisans.com> wrote:
>>>>>>>
>>>>>>>> +1 for 1
>>>>>>>>
>>>>>>>>> On Apr 3, 2017, at 5:52 PM, Till Rohrmann <trohrm...@apache.org> wrote:
>>>>>>>>>
>>>>>>>>> +1 for option 1)
>>>>>>>>>
>>>>>>>>> On Mon, Apr 3, 2017 at 5:48 PM, Fabian Hueske <fhue...@gmail.com> wrote:
>>>>>>>>>>
>>>>>>>>>> +1 to option 1)
>>>>>>>>>>
>>>>>>>>>> 2017-04-03 16:57 GMT+02:00 Ted Yu <yuzhih...@gmail.com>:
>>>>>>>>>>
>>>>>>>>>>> Looks like #1 is better - 1.2.1 would be at least as stable as 1.2.0
>>>>>>>>>>>
>>>>>>>>>>> Cheers
>>>>>>>>>>>
>>>>>>>>>>> On Mon, Apr 3, 2017 at 7:39 AM, Aljoscha Krettek <aljos...@apache.org> wrote:
>>>>>>>>>>>
>>>>>>>>>>>> Just so we’re all on the same page. ;-)
>>>>>>>>>>>>
>>>>>>>>>>>> There was https://issues.apache.org/jira/browse/FLINK-5808 which was a bug that we initially discovered in Flink 1.2 which was/is about missing verification for the correctness of the combination of parallelism and max-parallelism.
>>>>>>>>>>>> Due to lacking test coverage this introduced two more bugs:
>>>>>>>>>>>>
>>>>>>>>>>>> - https://issues.apache.org/jira/browse/FLINK-6188: Some setParallelism() methods can't cope with default parallelism
>>>>>>>>>>>> - https://issues.apache.org/jira/browse/FLINK-6209: StreamPlanEnvironment always has a parallelism of 1
>>>>>>>>>>>>
>>>>>>>>>>>> IMHO, the options are:
>>>>>>>>>>>> 1) revert the changes made for FLINK-5808 on the release-1.2 branch and live with the bug still being present
>>>>>>>>>>>> 2) put in more work to fix FLINK-5808 which requires fixing some problems that have existed for a long time with how the parallelism is set in streaming programs
>>>>>>>>>>>>
>>>>>>>>>>>> Best,
>>>>>>>>>>>> Aljoscha
>>>>>>>>>>>>
>>>>>>>>>>>>> On 31. Mar 2017, at 21:34, Robert Metzger <rmetz...@apache.org> wrote:
>>>>>>>>>>>>>
>>>>>>>>>>>>> I don't know what is best to do, but I think releasing 1.2.1 with potentially more bugs than 1.2.0 is not a good option.
>>>>>>>>>>>>> I suspect a good workaround for FLINK-6188 <https://issues.apache.org/jira/browse/FLINK-6188> is setting the parallelism manually for operators that can't cope with the default -1 parallelism.
>>>>>>>>>>>>>
>>>>>>>>>>>>> On Fri, Mar 31, 2017 at 9:06 PM, Aljoscha Krettek <aljos...@apache.org> wrote:
>>>>>>>>>>>>>
>>>>>>>>>>>>>> You mean reverting the changes around FLINK-5808 [1]? This is what introduced the follow-up FLINK-6188 [2].
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> [1] https://issues.apache.org/jira/browse/FLINK-5808
>>>>>>>>>>>>>> [2] https://issues.apache.org/jira/browse/FLINK-6188
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> On Fri, Mar 31, 2017, at 19:10, Robert Metzger wrote:
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>> I think reverting FLINK-6188 for the 1.2 branch might be a good idea.
>>>>>>>>>>>>>>> FLINK-6188 introduced two new bugs, so undoing the FLINK-6188 fix will lead only to one known bug in 1.2.1, instead of an uncertain number of issues.
>>>>>>>>>>>>>>> So 1.2.1 is not going to be worse than 1.2.0
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>> The fix will hopefully make it into 1.2.2 then.
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>> Any other thoughts on this?
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>> On Fri, Mar 31, 2017 at 6:46 PM, Fabian Hueske <fhue...@gmail.com> wrote:
>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>> I merged the fix for FLINK-6044 to the release-1.2 and release-1.1 branch.
>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>> 2017-03-31 15:02 GMT+02:00 Fabian Hueske <fhue...@gmail.com>:
>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>> We should also backport the fix for FLINK-6044 to Flink 1.2.1.
>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>> I'll take care of that.
>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>> 2017-03-30 18:50 GMT+02:00 Aljoscha Krettek <aljos...@apache.org>:
>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>> https://issues.apache.org/jira/browse/FLINK-6188 turns out to be a bit more involved, see my comments on the PR:
>>>>>>>>>>>>>>>>>> https://github.com/apache/flink/pull/3616.
>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>> As I said there, maybe we should revert the commits regarding parallelism/max-parallelism changes and release and then fix it later.
>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>> On Wed, Mar 29, 2017, at 23:08, Aljoscha Krettek wrote:
>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>> I commented on FLINK-6214: I think it's working as intended, although we could fix the javadoc/doc.
>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>> On Wed, Mar 29, 2017, at 17:35, Timo Walther wrote:
>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>> A user reported that all tumbling and sliding window assigners contain a pretty obvious bug about offsets.
>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>> https://issues.apache.org/jira/browse/FLINK-6214
>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>> I think we should also fix this for 1.2.1. What do you think?
>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>> Regards,
>>>>>>>>>>>>>>>>>>>> Timo
>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>> On 29/03/17 at 11:30, Robert Metzger wrote:
>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>> Hi Haohui,
>>>>>>>>>>>>>>>>>>>>> I agree that we should fix the parallelism issue. Otherwise, the 1.2.1 release would introduce a new bug.
>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>> On Tue, Mar 28, 2017 at 11:59 PM, Haohui Mai <ricet...@gmail.com> wrote:
>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>> -1 (non-binding)
>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>> We recently found out that all jobs submitted via UI will have a parallelism of 1, potentially due to FLINK-5808.
>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>> Filed FLINK-6209 to track it.
>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>> ~Haohui
>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>> On Mon, Mar 27, 2017 at 2:59 AM Chesnay Schepler <ches...@apache.org> wrote:
>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>> If possible I would like to include FLINK-6183 & FLINK-6184 as well.
>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>> They fix 2 metric-related issues that could arise when a Task is cancelled very early.
>>>>>>>>>>>>>>>>>>>>>>> (like, right away)
>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>> FLINK-6183 fixes a memory leak where the TaskMetricGroup was never closed
>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>> FLINK-6184 fixes a NullPointerException in the buffer metrics
>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>> PR here: https://github.com/apache/flink/pull/3611
>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>> On 26.03.2017 12:35, Aljoscha Krettek wrote:
>>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>>> I opened a PR for FLINK-6188: https://github.com/apache/flink/pull/3616
>>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>>> This improves the previously very sparse test coverage for timestamp/watermark assigners and fixes the bug.
>>>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>>>> On 25 Mar 2017, at 10:22, Ufuk Celebi <u...@apache.org> wrote:
>>>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>>>> I agree with Aljoscha.
>>>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>>>> -1 because of FLINK-6188
>>>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>>>> On Sat, Mar 25, 2017 at 9:38 AM, Aljoscha Krettek <aljos...@apache.org> wrote:
>>>>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>>>>> I filed this issue, which was observed by a user: https://issues.apache.org/jira/browse/FLINK-6188
>>>>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>>>>> I think that’s blocking for 1.2.1.
>>>>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>>>>>> On 24 Mar 2017, at 18:57, Ufuk Celebi <u...@apache.org> wrote:
>>>>>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>>>>>> RC1 doesn't contain Stefan's backport for the Asynchronous snapshots for heap-based keyed state that has been merged. Should we create RC2 with that fix since the voting period only starts on Monday? I think it would only mean rerunning the scripts on your side, right?
>>>>>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>>>>>> – Ufuk
>>>>>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>>>>>> On Fri, Mar 24, 2017 at 3:05 PM, Robert Metzger <rmetz...@apache.org> wrote:
>>>>>>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>>>>>>> Dear Flink community,
>>>>>>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>>>>>>> Please vote on releasing the following candidate as Apache Flink version 1.2.1.
>>>>>>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>>>>>>> The commit to be voted on:
>>>>>>>>>>>>>>>>>>>>>>>>>>>> 732e55bd (http://git-wip-us.apache.org/repos/asf/flink/commit/732e55bd)
>>>>>>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>>>>>>> Branch:
>>>>>>>>>>>>>>>>>>>>>>>>>>>> release-1.2.1-rc1
>>>>>>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>>>>>>> The release artifacts to be voted on can be found at:
>>>>>>>>>>>>>>>>>>>>>>>>>>>> http://people.apache.org/~rmetzger/flink-1.2.1-rc1/
>>>>>>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>>>>>>> The release artifacts are signed with the key with fingerprint D9839159:
>>>>>>>>>>>>>>>>>>>>>>>>>>>> http://www.apache.org/dist/flink/KEYS
>>>>>>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>>>>>>> The staging repository for this release can be found at:
>>>>>>>>>>>>>>>>>>>>>>>>>>>> https://repository.apache.org/