I didn't isolate it to a cause and commit, so that is extremely useful to know. To bring some details onto the thread:

query 4: a single aggregation in sliding windows
query 8: a single join with no other interesting logic
query 9 (prefix of query 6*): find the winning bid for each auction
query 6: query 9 followed by a single aggregation

Kenn

* they seem out of order because the original queries were 1-8 and we added 9 later to benchmark the baseline without the aggregation

On Mon, Oct 29, 2018 at 3:28 AM Etienne Chauchot <[email protected]> wrote:

> Oops, just saw that Kenn already mentioned spark perf degradation on spark
> runner around 10/05. Sorry for the repetition.
> Nevertheless, IMHO, I think it will still be worth checking PR #6181.
>
> Etienne
>
> On Monday, 29 October 2018 at 10:42 +0100, Etienne Chauchot wrote:
>
> Hey,
> I would vote -0. Here is the explanation:
>
> I took a look at Nexmark dashboards for output size and performance for
> all the runners in all the modes around the date of the release cut to
> search for regressions.
>
> I noted a regression on the performance of the spark runner. Query4,
> Query6, Query8 and Query9 running times were multiplied by 2 to 3 around
> the date of 10/05/18. See
> https://apache-beam-testing.appspot.com/explore?dashboard=5138380291571712
> So I searched in the commit history of the spark runner module for what
> happened around 10/05/18. And I found this commit
>
> e4a1ccbaa10808d88c6ad2a687fe9f6d52392d90: Merge pull request #6181:
> [BEAM-4783] Add bundleSize for splitting BoundedSources
>
> I don't know if it should be considered a blocker but we should definitely
> take another look at pull request #6181 that seems to change the way we
> split on spark runner.
>
> Best
> Etienne
>
>
> On Friday, 26 October 2018 at 18:20 +0200, Maximilian Michels wrote:
>
> +1 (binding)
>
> On 26.10.18 17:45, Kenneth Knowles wrote:
> > Nice. Thanks.
> >
> > +1
> >
> > On Fri, Oct 26, 2018 at 8:44 AM Robert Bradshaw <[email protected]> wrote:
> >
> > Thanks Tim!
> > This was my only hesitation, and sounds like we're in the clear here.
> >
> > +1 (binding)
> >
> > On Fri, Oct 26, 2018 at 5:05 PM Tim Robertson <[email protected]> wrote:
> > >
> > > A colleague and I tested on 2.7.0 and 2.8.0RC1:
> > >
> > > 1. Quickstart on Spark/YARN/HDFS (CDH 5.12.0) (commented in spreadsheet)
> > > 2. Our Avro to Avro pipelines on Spark/YARN/HDFS (note we backport the un-merged BEAM-5036 fix in our code)
> > > 3. Our Avro to Elasticsearch pipelines on Spark/YARN/HDFS
> > >
> > > Everything worked, and performance was similar on both.
> > > We built using maven pointing at https://repository.apache.org/content/repositories/orgapachebeam-1049/
> > >
> > > Based on this limited testing: +1
> > >
> > > Thank you to the release managers,
> > > Tim
> > >
> > >
> > > On Thu, Oct 25, 2018 at 7:21 PM Tim <[email protected]> wrote:
> > >>
> > >> I can do some tests on Spark / YARN tomorrow (CEST timezone). Sorry I’ve just been too busy to assist.
> > >>
> > >> Tim
> > >>
> > >> On 25 Oct 2018, at 18:59, Kenneth Knowles <[email protected]> wrote:
> > >>
> > >> I tried to do a more thorough job on this.
> > >>
> > >> - I could not reproduce the slowdown in Query 9. I believe the variance was simply high given the parameters and environment
> > >> - I saw the same slowdown in Query 8 when running as part of the suite, but it vanished when I ran repeatedly on its own, so again it is not good methodology probably
> > >>
> > >> We do have the dashboard at https://apache-beam-testing.appspot.com/dashboard-admin though no anomaly detection set up AFAIK.
> > >>
> > >> - There is no issue easily visible in DirectRunner: https://apache-beam-testing.appspot.com/explore?dashboard=5084698770407424
> > >> - There is a notable degradation in Spark runner on 10/5 for many queries.
> > >>   https://apache-beam-testing.appspot.com/explore?dashboard=5138380291571712
> > >> - Something minor happened for Dataflow around 10/1: https://apache-beam-testing.appspot.com/explore?dashboard=5670405876482048
> > >> - Flink runner seems to have had some fantastic improvements :-) https://apache-beam-testing.appspot.com/explore?dashboard=5699257587728384
> > >>
> > >> So if there is a blocker it would really be the Spark runner perf changes. Of course, all these except Dataflow are using local instances so may not be representative of larger scale AFAIK.
> > >>
> > >> Kenn
> > >>
> > >> On Wed, Oct 24, 2018 at 9:48 AM Maximilian Michels <[email protected]> wrote:
> > >>>
> > >>> I've run WordCount using Quickstart with the FlinkRunner (locally and
> > >>> against a Flink cluster).
> > >>>
> > >>> Would give a +1 but waiting what Kenn finds.
> > >>>
> > >>> -Max
> > >>>
> > >>> On 23.10.18 07:11, Ahmet Altay wrote:
> > >>> >
> > >>> > On Mon, Oct 22, 2018 at 10:06 PM, Kenneth Knowles <[email protected]> wrote:
> > >>> >
> > >>> > You two did so much verification I had a hard time finding something
> > >>> > where my help was meaningful! :-)
> > >>> >
> > >>> > I did run the Nexmark suite on the DirectRunner against 2.7.0 and
> > >>> > 2.8.0 following
> > >>> > https://beam.apache.org/documentation/sdks/java/nexmark/#running-smoke-suite-on-the-directrunner-local
> > >>> >
> > >>> > It is admittedly a very silly test - the instructions leave
> > >>> > immutability enforcement on, etc. But it does appear that there is a
> > >>> > 30% degradation in query 8 and 15% in query 9. These are the pure
> > >>> > Java tests, not the SQL variants.
> > >>> > The rest of the queries are close
> > >>> > enough that differences are not meaningful.
> > >>> >
> > >>> > (It would be a good improvement for us to have alerts on daily
> > >>> > benchmarks if we do not have such a concept already.)
> > >>> >
> > >>> > I would ask a little more time to see what is going on here - is it
> > >>> > a real performance issue or an artifact of how the tests are
> > >>> > invoked, or ...?
> > >>> >
> > >>> > Thank you! Much appreciated. Please let us know when you are done with
> > >>> > your investigation.
> > >>> >
> > >>> > Kenn
> > >>> >
> > >>> > On Mon, Oct 22, 2018 at 6:20 PM Ahmet Altay <[email protected]> wrote:
> > >>> >
> > >>> > Hi all,
> > >>> >
> > >>> > Did you have a chance to review this RC? Between me and Robert
> > >>> > we ran a significant chunk of the validations. Let me know if
> > >>> > you have any questions.
> > >>> >
> > >>> > Ahmet
> > >>> >
> > >>> > On Thu, Oct 18, 2018 at 5:26 PM, Ahmet Altay <[email protected]> wrote:
> > >>> >
> > >>> > Hi everyone,
> > >>> >
> > >>> > Please review and vote on the release candidate #1 for the
> > >>> > version 2.8.0, as follows:
> > >>> > [ ] +1, Approve the release
> > >>> > [ ] -1, Do not approve the release (please provide specific comments)
> > >>> >
> > >>> > The complete staging area is available for your review,
> > >>> > which includes:
> > >>> > * JIRA release notes [1],
> > >>> > * the official Apache source release to be deployed to
> > >>> >   dist.apache.org [2], which is
> > >>> >   signed with the key with fingerprint 6096FA00 [3],
> > >>> > * all artifacts to be deployed to the Maven Central
> > >>> >   Repository [4],
> > >>> > * source code tag "v2.8.0-RC1" [5],
> > >>> > * website pull request listing the release and publishing
> > >>> >   the API reference manual [6].
> > >>> > * Python artifacts are deployed along with the source
> > >>> >   release to the dist.apache.org [2].
> > >>> > * Validation sheet with a tab for 2.8.0 release to help with
> > >>> >   validation [7].
> > >>> >
> > >>> > The vote will be open for at least 72 hours. It is adopted
> > >>> > by majority approval, with at least 3 PMC affirmative votes.
> > >>> >
> > >>> > Thanks,
> > >>> > Ahmet
> > >>> >
> > >>> > [1] https://issues.apache.org/jira/secure/ReleaseNote.jspa?projectId=12319527&version=12343985
> > >>> > [2] https://dist.apache.org/repos/dist/dev/beam/2.8.0
> > >>> > [3] https://dist.apache.org/repos/dist/dev/beam/KEYS
> > >>> > [4] https://repository.apache.org/content/repositories/orgapachebeam-1049/
> > >>> > [5] https://github.com/apache/beam/tree/v2.8.0-RC1
> > >>> > [6] https://github.com/apache/beam-site/pull/583 and https://github.com/apache/beam/pull/6745
> > >>> > [7] https://docs.google.com/spreadsheets/d/1qk-N5vjXvbcEk68GjbkSZTR8AGqyNUM-oLFo_ZXBpJw/edit#gid=1854712816
