Looking at the dates in the Spark runner git log, there was a PR merged to change the Spark translation from classes to URNs. I cannot see how this could impact performance. Looking at the other queries in the dashboards, there seems to be great variability in the Spark runner's execution times, to the point that it feels like we don't have guarantees anymore. I wonder whether this is because of other loads sharing the server(s), or because our sample is too small to give a meaningful standard deviation.
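To tell those two explanations apart, one option would be to look at the spread of per-run times for a single query. A minimal sketch in Python, assuming the run times can be exported as a list of seconds (the function name `summarize_runs` and the sample values are hypothetical and are not taken from the dashboards):

```python
import math
import statistics


def summarize_runs(runtimes_sec):
    """Summarize the spread of a list of run times (in seconds)."""
    n = len(runtimes_sec)
    if n < 2:
        raise ValueError("need at least two runs to estimate a standard deviation")
    mean = statistics.mean(runtimes_sec)
    stdev = statistics.stdev(runtimes_sec)  # sample standard deviation (n - 1)
    return {
        "n": n,
        "mean": mean,
        "stdev": stdev,
        "cv": stdev / mean,            # coefficient of variation (relative spread)
        "sem": stdev / math.sqrt(n),   # standard error of the mean, shrinks with n
    }


# Hypothetical run times, for illustration only.
example_runs = [10.0, 12.0, 11.0, 15.0, 12.0]
summary = summarize_runs(example_runs)
print(summary)
```

A large coefficient of variation that stays large as more runs are added would point towards noisy shared machines, while a standard error that shrinks with more runs would suggest the sample was simply too small.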
I would proceed with the release. The real question is whether we can somehow constrain the execution of these tests to get more consistent output.

On Fri, Dec 7, 2018 at 4:10 PM Etienne Chauchot <echauc...@apache.org> wrote:

> Hi all,
> Regarding query7 in Spark:
> - there doesn't seem to be a functional regression: the query passes and
>   the output size is still the same
> - the performance degradation also seems to be only on Spark; the other
>   runners do not seem to suffer from it
> - the performance degradation seems to be constant from 11/12, so we can
>   eliminate temporary load on the Jenkins server that would generate delays
>   in the Max transform
>
> => query7 uses the Max transform, fanout and side inputs. Has one of these
> parts changed recently (11/12/18) in Spark?
>
> Etienne
>
> On Thursday, December 6, 2018 at 21:32 -0800, Chamikara Jayalath wrote:
>
> Udi or anybody else who is familiar with Nexmark, please -1 the vote
> thread if you think this particular performance regression for Spark/Direct
> runners is a blocker. Otherwise I think we can continue the vote.
>
> Thanks,
> Cham
>
> On Thu, Dec 6, 2018 at 6:19 PM Chamikara Jayalath <chamik...@google.com>
> wrote:
>
> Are either of these regressions due to known issues? If not, should they
> be considered release blockers?
>
> Thanks,
> Cham
>
> On Thu, Dec 6, 2018 at 6:11 PM Udi Meiri <eh...@google.com> wrote:
>
> For DirectRunner there are regressions in query 7 sql direct runner batch
> mode
> <https://apache-beam-testing.appspot.com/explore?dashboard=5084698770407424&widget=732741424&container=411089194>
> (2x) and streaming mode (5x).
>
> On Thu, Dec 6, 2018 at 5:59 PM Udi Meiri <eh...@google.com> wrote:
>
> I see a regression for query 7 spark runner batch mode
> <https://apache-beam-testing.appspot.com/explore?dashboard=5138380291571712&widget=1782465104&container=462502368>
> on about 2018-11-13.
> [image: image.png]
>
> On Thu, Dec 6, 2018 at 2:46 AM Chamikara Jayalath <chamik...@google.com>
> wrote:
>
> Hi everyone,
>
> Please review and vote on the release candidate #1 for the version 2.9.0,
> as follows:
> [ ] +1, Approve the release
> [ ] -1, Do not approve the release (please provide specific comments)
>
> The complete staging area is available for your review, which includes:
> * JIRA release notes [1],
> * the official Apache source release to be deployed to dist.apache.org
>   [2], which is signed with the key with fingerprint EEAC70DF3D0BC23B [3],
> * all artifacts to be deployed to the Maven Central Repository [4],
> * source code tag "v2.9.0-RC1" [5],
> * website pull request listing the release [6] and publishing the API
>   reference manual [7],
> * Python artifacts, which are deployed along with the source release to
>   dist.apache.org [2],
> * validation sheet with a tab for the 2.9.0 release to help with
>   validation [8].
>
> The vote will be open for at least 72 hours. It is adopted by majority
> approval, with at least 3 PMC affirmative votes.
>
> Thanks,
> Cham
>
> [1] https://issues.apache.org/jira/secure/ReleaseNote.jspa?projectId=12319527&version=12344258
> [2] https://dist.apache.org/repos/dist/dev/beam/2.9.0/
> [3] https://dist.apache.org/repos/dist/release/beam/KEYS
> [4] https://repository.apache.org/content/repositories/orgapachebeam-1054/
> [5] https://github.com/apache/beam/tree/v2.9.0-RC1
> [6] https://github.com/apache/beam/pull/7215
> [7] https://github.com/apache/beam-site/pull/584
> [8] https://docs.google.com/spreadsheets/d/1qk-N5vjXvbcEk68GjbkSZTR8AGqyNUM-oLFo_ZXBpJw/edit#gid=2053422529