There are three levels of support we could land on:

1. Flink works with unreserved resources (revert FLINK-7294).
2. Flink works with unreserved resources, and correctly ignores reserved resources (revert FLINK-7294 and mitigate the Fenzo bug).
3. Flink works with both unreserved and reserved resources.

Option 3 is a moon shot. We are striving for 2, with 1 as the fallback.

On Fri, Dec 1, 2017 at 2:10 PM, Aljoscha Krettek <aljos...@apache.org> wrote:

Thanks for the update!

Just to be clear, you're proposing going forward with the "simple fix" of reverting FLINK-7294?

On 1. Dec 2017, at 18:39, Eron Wright <eronwri...@gmail.com> wrote:

Update on the reported Mesos issue (FLINK-8174):

TL;DR: a PR will be ready within 24 hours that will undo reservation support.

A couple of months ago, a fix (FLINK-7294) was merged related to how Flink accepts Mesos resource offers. The intention was to allow Flink to make use of so-called *reserved* resources, a Mesos feature which makes it possible to reserve hosts for use by a specific framework/role. The fix inadvertently regressed the ability to use *unreserved* resources. This is a serious regression because unreserved resources are the common case.

The simple solution is to revert the earlier fix, deferring support for reservations to another release. We are spending some time looking for a fix that works for all scenarios, but that seems unlikely at this time. I am reaching out to the original contributor to get their feedback.

In the course of the investigation, a related flaw was discovered in Fenzo that causes Flink to misinterpret offers that contain a mix of reserved and unreserved resources. I believe that a small fix is possible purely within Flink; an update to Fenzo does not appear necessary.

Going forward, we will contribute an improved integration test suite with which to test Flink under diverse Mesos conditions (e.g. reservations).
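The reserved/unreserved distinction behind the three support levels can be sketched as follows. This is a minimal illustration, not Flink's or Fenzo's actual code; the one Mesos fact it relies on is that unreserved resources carry the default role "*", while reserved resources carry the reserving framework's role.

```python
# Sketch: interpreting a Mesos offer that mixes reserved and unreserved
# resources. Resource entries and the role "flink" are hypothetical
# examples; only the "*" default role is taken from Mesos semantics.

UNRESERVED = "*"

def usable_amount(resources, name, accepted_roles=(UNRESERVED,)):
    """Sum the amounts of resource `name` whose role the framework may consume."""
    return sum(r["amount"] for r in resources
               if r["name"] == name and r["role"] in accepted_roles)

offer = [
    {"name": "cpus", "amount": 4.0,    "role": "*"},      # unreserved
    {"name": "cpus", "amount": 2.0,    "role": "flink"},  # reserved for role 'flink'
    {"name": "mem",  "amount": 8192.0, "role": "*"},
]

# Levels 1 and 2: count only unreserved resources, ignoring reserved ones.
print(usable_amount(offer, "cpus"))                         # 4.0
# Level 3 would additionally accept resources reserved for Flink's own role.
print(usable_amount(offer, "cpus", (UNRESERVED, "flink")))  # 6.0
```

The flaw described in the thread amounts to summing both kinds indiscriminately (6.0 here even when reservations are unsupported), which makes the framework accept offers it cannot actually use.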
Thanks,
Eron

On Thu, Nov 30, 2017 at 9:47 PM, Tzu-Li (Gordon) Tai <tzuli...@apache.org> wrote:

Hi,

I've noticed a behavioral regression in the Kafka producer that should also be considered a blocker: https://issues.apache.org/jira/browse/FLINK-8181
There's already a PR for the issue here: https://github.com/apache/flink/pull/5108

Best,
Gordon

On 30 November 2017 at 5:27:22 PM, Fabian Hueske (fhue...@gmail.com) wrote:

I've created a JIRA issue for the Hadoop 2.9.0 build problem [1].

Best, Fabian

[1] https://issues.apache.org/jira/browse/FLINK-8177

2017-11-30 4:35 GMT+01:00 Eron Wright <eronwri...@gmail.com>:

Unfortunately we've identified a blocker bug for Flink on Mesos: FLINK-8174. We'll have a patch ready on Thursday.

Thanks,
Eron

On Wed, Nov 29, 2017 at 3:40 PM, Eron Wright <eronwri...@gmail.com> wrote:

On the Dell EMC side, we're testing RC2 on DCOS 1.10.0. We're seeing a potential issue with offer acceptance and will update the thread with a +1 or with a more concrete issue within 24 hours.

Thanks,
Eron

On Wed, Nov 29, 2017 at 6:54 AM, Chesnay Schepler <ches...@apache.org> wrote:

I don't think anyone has taken a look yet, nor was there a discussion about postponing it.

It just slipped through the cracks, I guess...

On 29.11.2017 15:47, Gyula Fóra wrote:

Hi guys,

I ran into this again while playing with savepoint/restore parallelism:

https://issues.apache.org/jira/browse/FLINK-7595
https://github.com/apache/flink/pull/4651

Does anyone have an idea about the status of this PR, or were we planning to postpone it to 1.5?
Thanks,
Gyula

Fabian Hueske <fhue...@gmail.com> wrote (2017 Nov 29, Wed, 13:10):

OK, the situation is the following:

The test class (org.apache.flink.yarn.UtilsTest) implements a Hadoop interface (Container) that was extended in Hadoop 2.9.0 with a getter and setter. By adding the methods, we can compile Flink for Hadoop 2.9.0. However, the getter/setter add a dependency on a class that was also added in Hadoop 2.9.0. Therefore, the implementation is not backwards compatible with Hadoop versions < 2.9.0.

Not sure how we can fix the problem. We would need two versions of the class that are chosen based on the Hadoop version. Do we have something like that somewhere else?

Since this is only a problem in a test class, Flink 1.4.0 might still work very well with Hadoop 2.9.0. However, this has not been tested AFAIK.

Cheers, Fabian

2017-11-29 12:47 GMT+01:00 Fabian Hueske <fhue...@gmail.com>:

I just tried to build the release-1.4 branch for Hadoop 2.9.0 (released a few days ago) and got a compilation failure in a test class.

Right now, I'm assessing how much we need to fix to support Hadoop 2.9.0. I'll report later.

Best, Fabian

2017-11-29 11:16 GMT+01:00 Aljoscha Krettek <aljos...@apache.org>:

Agreed, this is a regression compared to the previous functionality. I updated the issue to "Blocker".

On 29. Nov 2017, at 10:01, Gyula Fóra <gyula.f...@gmail.com> wrote:

Hi all,

I have found the following issue: https://issues.apache.org/jira/browse/FLINK-8165

I would say this is a blocker (I personally pass the ParameterTool all over the place in my production apps), but it is a pretty trivial issue to fix; we can wait a little to find other potential problems.

I can submit a fix in a little bit.

Cheers,
Gyula

Tzu-Li (Gordon) Tai <tzuli...@apache.org> wrote (2017 Nov 29, Wed, 9:23):

+1

Verified:
- No missing release Maven artifacts
- Staged Apache source & binary convenience releases look good
- NOTICE / LICENSE are correct, README is sane
- Built from source (macOS, Scala 2.11, Hadoop-free & Hadoop 2.8)
- Cluster testing on AWS EMR (see release-testing doc for configuration details)
- Tested Kinesis / Elasticsearch connector (no dependency clashes on cluster execution, works locally in IDE)

Thanks a lot for managing the release, Aljoscha!
Cheers,
Gordon

On 28 November 2017 at 8:32:42 PM, Stefan Richter (s.rich...@data-artisans.com) wrote:

+1 (non-binding)

I tested Flink in a cluster setup on Google Cloud (YARN per-job) and checked that, for all backends, HA, recovery, at-least-once, end-to-end exactly-once (with the Kafka11 producer), savepoints, externalized checkpoints, and rescaling work correctly.

On 28.11.2017 at 11:47, Aljoscha Krettek <aljos...@apache.org> wrote:

+1

Verified:
- NOTICE and LICENSE are correct
- source doesn't contain binaries
- verified signatures
- verified hashes
- cluster testing on AWS and Cloudera VM (with Kerberos) (see release-testing doc)

On 28. Nov 2017, at 11:20, Aljoscha Krettek <aljos...@apache.org> wrote:

Phew, thanks for the update!

On 28. Nov 2017, at 11:19, Gyula Fóra <gyf...@apache.org> wrote:

OK, seems like I had to remove the snappy jar, as it was corrupted (makes total sense) :P

Gyula Fóra <gyf...@apache.org> wrote (2017 Nov 28, Tue, 11:13):

Hi Aljoscha,

Thanks for the release candidate. I am having a hard time building the RC; I seem to get this error no matter what I do:

[ERROR] Failed to execute goal org.apache.maven.plugins:maven-shade-plugin:2.4.1:shade (shade-hadoop) on project flink-shaded-hadoop2-uber: Error creating shaded jar: invalid LOC header (bad signature) -> [Help 1]

(Apache Maven 3.3.9)

Any idea what I am missing?

Thanks,
Gyula

Aljoscha Krettek <aljos...@apache.org> wrote (2017 Nov 27, Mon, 19:35):

Hi everyone,

Please review and vote on release candidate #2 for version 1.4.0, as follows:
[ ] +1, Approve the release
[ ] -1, Do not approve the release (please provide specific comments)

The complete staging area is available for your review, which includes:
* JIRA release notes [1],
* the official Apache source release and binary convenience releases to be deployed to dist.apache.org [2], which are signed with the key with fingerprint F2A67A8047499BBB3908D17AA8F4FD97121D7293 [3],
* all artifacts to be deployed to the Maven Central Repository [4],
* source code tag "release-1.4.0-rc1" [5],
* website pull request listing the new release [6].

Please have a careful look at the website PR because I changed some wording and we're now also releasing a binary without Hadoop dependencies.

Please use this document for coordinating testing efforts: [7]

The only change between RC1 and this RC2 is that the source release package no longer includes the erroneously included binary Ruby dependencies of the documentation. Because of this I would like to propose a shorter voting time and close the vote around the time that RC1 would have closed. This would mean closing by the end of Wednesday. Please let me know if you disagree with this. The vote is adopted by majority approval, with at least 3 PMC affirmative votes.

Thanks,
Your friendly Release Manager

[1] https://issues.apache.org/jira/secure/ReleaseNote.jspa?projectId=12315522&version=12340533
[2] http://people.apache.org/~aljoscha/flink-1.4.0-rc2/
[3] https://dist.apache.org/repos/dist/release/flink/KEYS
[4] https://repository.apache.org/content/repositories/orgapacheflink-1140
[5] https://git-wip-us.apache.org/repos/asf?p=flink.git;a=tag;h=ea751b7b23b23446ed3fcdeed564bbe8bf4adf9c
[6] https://github.com/apache/flink-web/pull/95
[7] https://docs.google.com/document/d/1HqYyrNoMSXwo8zBpZj7s39UzUdlFcFO8TRpHNZ_cl44/edit?usp=sharing

Pro-tip: you can create a settings.xml file with these contents:

<settings>
  <activeProfiles>
    <activeProfile>flink-1.4.0</activeProfile>
  </activeProfiles>
  <profiles>
    <profile>
      <id>flink-1.4.0</id>
      <repositories>
        <repository>
          <id>flink-1.4.0</id>
          <url>https://repository.apache.org/content/repositories/orgapacheflink-1140/</url>
        </repository>
        <repository>
          <id>archetype</id>
          <url>https://repository.apache.org/content/repositories/orgapacheflink-1140/</url>
        </repository>
      </repositories>
    </profile>
  </profiles>
</settings>

And reference that in your maven commands via --settings path/to/settings.xml. This is useful for creating a quickstart based on the staged release and for building against the staged jars.
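The "invalid LOC header (bad signature)" failure Gyula hit above, and the corrupted snappy jar that caused it, are typical of a damaged archive in the local Maven repository. A small sketch (assuming the default ~/.m2/repository location; the function name is illustrative) that scans for jars that are unreadable or fail their CRC checks:

```python
# Scan a Maven repository directory for corrupted jars. A jar is a zip
# archive, so zipfile can both open it and CRC-check every entry.
import os
import zipfile

def find_corrupt_jars(repo):
    """Yield paths of jars that cannot be opened or contain a bad entry."""
    for dirpath, _dirnames, filenames in os.walk(repo):
        for fn in filenames:
            if not fn.endswith(".jar"):
                continue
            path = os.path.join(dirpath, fn)
            try:
                with zipfile.ZipFile(path) as zf:
                    # testzip() returns the name of the first corrupt
                    # entry, or None if every CRC matches.
                    if zf.testzip() is not None:
                        yield path
            except zipfile.BadZipFile:
                yield path

if __name__ == "__main__":
    for jar in find_corrupt_jars(os.path.expanduser("~/.m2/repository")):
        print("corrupt:", jar)
```

Deleting the reported jars (or the enclosing artifact directories) and rerunning the build forces Maven to re-download clean copies, which matches the fix Gyula applied by hand.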