I think we should definitely open a cherry-pick PR against a 2.8.x branch. I think we must not corrupt Maven Central, so if it is published to users this has to be 2.8.1. Ahmet - we are at this point, right?
Kenn

On Mon, Oct 29, 2018 at 8:40 AM Ismaël Mejía <[email protected]> wrote:

> First, thanks Etienne and Kenn for noting the performance issue. I
> reviewed the discussed PR. It introduced a new ‘@Experimental’ option
> to the Spark runner to change the default source partitioning and
> enable users to control it via a predefined size (a prerequisite for
> Spark’s dynamicAllocation).
>
> This, however, must not be the default behavior. After looking at the
> PR, it seems that things are not as expected and the default is now
> the new behavior. I will provide a PR to fix this quickly. However, the
> question is: should I cherry-pick it so that we do a new RC (since the
> release has already 'passed')?
>
> On Mon, Oct 29, 2018 at 2:51 PM Kenneth Knowles <[email protected]> wrote:
> >
> > I didn't isolate it to a cause and commit, so that is extremely useful
> > to know. To bring some details on thread:
> >
> > query 4: a single aggregation in sliding windows
> > query 8: a single join with no other interesting logic
> > query 9 (prefix of query 6*): find the winning bid for each auction
> > query 6: query 9 followed by a single aggregation
> >
> > Kenn
> >
> > * they seem out of order because the original queries were 1-8 and we
> > added 9 later to benchmark the baseline without the aggregation
> >
> > On Mon, Oct 29, 2018 at 3:28 AM Etienne Chauchot <[email protected]> wrote:
> >>
> >> Oops, just saw that Kenn already mentioned the perf degradation on
> >> the Spark runner around 10/05. Sorry for the repetition.
> >> Nevertheless, IMHO, it is still worth checking PR #6181.
> >>
> >> Etienne
> >>
> >> On Monday, 29 October 2018 at 10:42 +0100, Etienne Chauchot wrote:
> >>
> >> Hey,
> >> I would vote -0. Here is the explanation:
> >>
> >> I took a look at the Nexmark dashboards for output size and performance
> >> for all the runners in all the modes around the date of the release cut
> >> to search for regressions.
> >>
> >> I noted a regression in the performance of the Spark runner. The running
> >> times of Query4, Query6, Query8 and Query9 were multiplied by 2 to 3
> >> around 10/05/18. See
> >> https://apache-beam-testing.appspot.com/explore?dashboard=5138380291571712
> >> So I searched the commit history of the Spark runner module for what
> >> happened around 10/05/18, and I found this commit:
> >>
> >> e4a1ccbaa10808d88c6ad2a687fe9f6d52392d90: Merge pull request #6181:
> >> [BEAM-4783] Add bundleSize for splitting BoundedSources
> >>
> >> I don't know if it should be considered a blocker, but we should
> >> definitely take another look at pull request #6181, which seems to
> >> change the way we split on the Spark runner.
> >>
> >> Best
> >> Etienne
> >>
> >> On Friday, 26 October 2018 at 18:20 +0200, Maximilian Michels wrote:
> >>
> >> +1 (binding)
> >>
> >> On 26.10.18 17:45, Kenneth Knowles wrote:
> >>
> >> Nice. Thanks.
> >>
> >> +1
> >>
> >> On Fri, Oct 26, 2018 at 8:44 AM Robert Bradshaw <[email protected]> wrote:
> >>
> >> Thanks Tim!
> >>
> >> This was my only hesitation, and sounds like we're in the clear here.
> >>
> >> +1 (binding)
> >>
> >> On Fri, Oct 26, 2018 at 5:05 PM Tim Robertson <[email protected]> wrote:
> >> >
> >> > A colleague and I tested on 2.7.0 and 2.8.0RC1:
> >> >
> >> > 1. Quickstart on Spark/YARN/HDFS (CDH 5.12.0) (commented in spreadsheet)
> >> > 2. Our Avro to Avro pipelines on Spark/YARN/HDFS (note we backport the
> >> > un-merged BEAM-5036 fix in our code)
> >> > 3. Our Avro to Elasticsearch pipelines on Spark/YARN/HDFS
> >> >
> >> > Everything worked, and performance was similar on both.
> >> > We built using maven pointing at
> >> > https://repository.apache.org/content/repositories/orgapachebeam-1049/
> >> >
> >> > Based on this limited testing: +1
> >> >
> >> > Thank you to the release managers,
> >> > Tim
> >> >
> >> > On Thu, Oct 25, 2018 at 7:21 PM Tim <[email protected]> wrote:
> >> >>
> >> >> I can do some tests on Spark / YARN tomorrow (CEST timezone).
> >> >> Sorry, I’ve just been too busy to assist.
> >> >>
> >> >> Tim
> >> >>
> >> >> On 25 Oct 2018, at 18:59, Kenneth Knowles <[email protected]> wrote:
> >> >>
> >> >> I tried to do a more thorough job on this.
> >> >>
> >> >> - I could not reproduce the slowdown in Query 9. I believe the
> >> >> variance was simply high given the parameters and environment.
> >> >> - I saw the same slowdown in Query 8 when running as part of
> >> >> the suite, but it vanished when I ran it repeatedly on its own, so
> >> >> again it is probably not good methodology.
> >> >>
> >> >> We do have the dashboard at
> >> >> https://apache-beam-testing.appspot.com/dashboard-admin though no
> >> >> anomaly detection is set up AFAIK.
> >> >>
> >> >> - There is no issue easily visible in DirectRunner:
> >> >> https://apache-beam-testing.appspot.com/explore?dashboard=5084698770407424
> >> >> - There is a notable degradation in Spark runner on 10/5 for
> >> >> many queries.
> >> >> https://apache-beam-testing.appspot.com/explore?dashboard=5138380291571712
> >> >> - Something minor happened for Dataflow around 10/1:
> >> >> https://apache-beam-testing.appspot.com/explore?dashboard=5670405876482048
> >> >> - Flink runner seems to have had some fantastic improvements :-)
> >> >> https://apache-beam-testing.appspot.com/explore?dashboard=5699257587728384
> >> >>
> >> >> So if there is a blocker, it would really be the Spark runner
> >> >> perf changes. Of course, all these except Dataflow are using local
> >> >> instances, so they may not be representative of larger scale AFAIK.
> >> >>
> >> >> Kenn
> >> >>
> >> >> On Wed, Oct 24, 2018 at 9:48 AM Maximilian Michels <[email protected]> wrote:
> >> >>>
> >> >>> I've run WordCount using Quickstart with the FlinkRunner (locally and
> >> >>> against a Flink cluster).
> >> >>>
> >> >>> Would give a +1 but waiting for what Kenn finds.
> >> >>>
> >> >>> -Max
> >> >>>
> >> >>> On 23.10.18 07:11, Ahmet Altay wrote:
> >> >>> >
> >> >>> > On Mon, Oct 22, 2018 at 10:06 PM, Kenneth Knowles <[email protected]> wrote:
> >> >>> >
> >> >>> > You two did so much verification I had a hard time finding something
> >> >>> > where my help was meaningful! :-)
> >> >>> >
> >> >>> > I did run the Nexmark suite on the DirectRunner against 2.7.0 and
> >> >>> > 2.8.0 following
> >> >>> > <https://beam.apache.org/documentation/sdks/java/nexmark/#running-smoke-suite-on-the-directrunner-local>.
> >> >>> >
> >> >>> > It is admittedly a very silly test - the instructions leave
> >> >>> > immutability enforcement on, etc. But it does appear that there is a
> >> >>> > 30% degradation in query 8 and 15% in query 9. These are the pure
> >> >>> > Java tests, not the SQL variants. The rest of the queries are close
> >> >>> > enough that differences are not meaningful.
> >> >>> >
> >> >>> > (It would be a good improvement for us to have alerts on daily
> >> >>> > benchmarks if we do not have such a concept already.)
> >> >>> >
> >> >>> > I would ask for a little more time to see what is going on here - is it
> >> >>> > a real performance issue or an artifact of how the tests are
> >> >>> > invoked, or ...?
> >> >>> >
> >> >>> > Thank you! Much appreciated. Please let us know when you are done with
> >> >>> > your investigation.
> >> >>> >
> >> >>> > Kenn
> >> >>> >
> >> >>> > On Mon, Oct 22, 2018 at 6:20 PM Ahmet Altay <[email protected]> wrote:
> >> >>> >
> >> >>> > Hi all,
> >> >>> >
> >> >>> > Did you have a chance to review this RC? Between me and Robert
> >> >>> > we ran a significant chunk of the validations. Let me know if
> >> >>> > you have any questions.
> >> >>> >
> >> >>> > Ahmet
> >> >>> >
> >> >>> > On Thu, Oct 18, 2018 at 5:26 PM, Ahmet Altay <[email protected]> wrote:
> >> >>> >
> >> >>> > Hi everyone,
> >> >>> >
> >> >>> > Please review and vote on the release candidate #1 for the
> >> >>> > version 2.8.0, as follows:
> >> >>> > [ ] +1, Approve the release
> >> >>> > [ ] -1, Do not approve the release (please provide specific
> >> >>> > comments)
> >> >>> >
> >> >>> > The complete staging area is available for your review,
> >> >>> > which includes:
> >> >>> > * JIRA release notes [1],
> >> >>> > * the official Apache source release to be deployed to
> >> >>> > dist.apache.org [2], which is signed with the key with
> >> >>> > fingerprint 6096FA00 [3],
> >> >>> > * all artifacts to be deployed to the Maven Central
> >> >>> > Repository [4],
> >> >>> > * source code tag "v2.8.0-RC1" [5],
> >> >>> > * website pull request listing the release and publishing
> >> >>> > the API reference manual [6].
> >> >>> > * Python artifacts are deployed along with the source
> >> >>> > release to the dist.apache.org [2].
> >> >>> > * Validation sheet with a tab for 2.8.0 release to help with
> >> >>> > validation [7].
> >> >>> >
> >> >>> > The vote will be open for at least 72 hours. It is adopted
> >> >>> > by majority approval, with at least 3 PMC affirmative votes.
> >> >>> >
> >> >>> > Thanks,
> >> >>> > Ahmet
> >> >>> >
> >> >>> > [1] https://issues.apache.org/jira/secure/ReleaseNote.jspa?projectId=12319527&version=12343985
> >> >>> > [2] https://dist.apache.org/repos/dist/dev/beam/2.8.0
> >> >>> > [3] https://dist.apache.org/repos/dist/dev/beam/KEYS
> >> >>> > [4] https://repository.apache.org/content/repositories/orgapachebeam-1049/
> >> >>> > [5] https://github.com/apache/beam/tree/v2.8.0-RC1
> >> >>> > [6] https://github.com/apache/beam-site/pull/583 and
> >> >>> > https://github.com/apache/beam/pull/6745
> >> >>> > [7] https://docs.google.com/spreadsheets/d/1qk-N5vjXvbcEk68GjbkSZTR8AGqyNUM-oLFo_ZXBpJw/edit#gid=1854712816
