On Mon, Dec 9, 2019 at 6:34 PM Udi Meiri <eh...@google.com> wrote:

> Valentyn, the speedup is due to parallelization.
>
> On Mon, Dec 9, 2019 at 6:12 PM Chad Dombrova <chad...@gmail.com> wrote:
>
>>
>> On Mon, Dec 9, 2019 at 5:36 PM Udi Meiri <eh...@google.com> wrote:
>>
>>> I have given this some thought and honestly don't know if splitting into
>>> separate jobs will help.
>>> - I have seen race conditions with running setuptools in parallel, so
>>> more isolation is better.
>>>
>>
>> What race conditions have you seen? I think if we're doing things right,
>> this should not be happening, but I don't think we're doing things right.
>> One thing that I've noticed is that we're building into the source
>> directory, and I think we're also doing weird things like trying to
>> copy the source directory beforehand. I really think this system is
>> tripping over many non-standard choices that have been made along the way.
>> I have never had these sorts of problems with unittests that use tox, even
>> when many are running in parallel. I got pulled away from it, but I'm
>> really hoping to address these issues here:
>> https://github.com/apache/beam/pull/10038.
>>
>
> This comment
> <https://issues.apache.org/jira/browse/BEAM-8481?focusedCommentId=16988369&page=com.atlassian.jira.plugin.system.issuetabpanels%3Acomment-tabpanel#comment-16988369>
> summarizes what I believe may be the issue (setuptools races).
>
> I believe copying the source directory was done in an effort to isolate
> the parallel builds (setuptools, cythonize).
>
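The idea of copying the source directory to isolate parallel builds can be sketched roughly as follows. This is a minimal illustration under stated assumptions, not Beam's actual build logic: the function name isolated_source_copy and the list of skipped build outputs are hypothetical. Each parallel job copies the SDK tree into its own freshly created temporary directory and leaves behind in-place outputs such as build/, dist/ and *.egg-info, so whatever setuptools or cythonize writes during that job lands in a private tree.

```python
import shutil
import tempfile
from pathlib import Path

# Hypothetical list of in-place build outputs to leave behind. Sharing these
# directories between concurrent setuptools runs is one plausible race source.
BUILD_OUTPUTS = ("build", "dist", "*.egg-info", "__pycache__")


def isolated_source_copy(src_dir: str, job_name: str) -> Path:
    """Copy src_dir into a fresh temporary directory for a single build job.

    Each parallel job (e.g. one per Python version) gets its own tree, so
    files that setuptools or cythonize write during that job's build land in
    its private copy rather than in a tree shared with other jobs.
    """
    src = Path(src_dir).resolve()
    # mkdtemp creates a uniquely named directory, so two jobs never share one.
    root = Path(tempfile.mkdtemp(prefix=f"{job_name}-"))
    dest = root / src.name
    # ignore_patterns matches each name with fnmatch; matched entries are skipped.
    shutil.copytree(src, dest, ignore=shutil.ignore_patterns(*BUILD_OUTPUTS))
    return dest
```

The copy costs time and disk space on every run; the alternative raised in the thread is to stop building into the source directory in the first place.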
Peanut gallery: containerized Jenkins builds seem like they would help, and
they are the current recommended best practice, but we are not there yet.
Agree/disagree?

>>> What benefits do you see from splitting up the jobs?
>>>
>>
>> The biggest problem is that the jobs are doing too much and take too
>> long. This simple fact compounds all of the other problems. It seems
>> pretty obvious that we need to do less in each job, as long as the sum of
>> all of these smaller jobs is not substantially longer than the one
>> monolithic job.
>>
>

For some reason I keep forgetting the answer to this question: are we
caching PyPI immutable artifacts on every Jenkins worker?

>
>> Benefits:
>>
>> - failures specific to a particular python version will be easier to spot
>> in the jenkins error summary, and cheaper to re-queue. right now the
>> jenkins report mushes all of the failures together in a way that makes it
>> nearly impossible to tell which python version they correspond to. only
>> the gradle scan gives you this insight, but it doesn't break the errors
>> down by test.
>>
>
> I agree Jenkins handles duplicate test names pretty badly (reloading will
> periodically give you a different result).
>

Saw this in Java too w/ ValidatesRunner suites when they ran in one Jenkins
job. Worthwhile to avoid.

Kenn

> With pytest I've been able to set the suite name so that should help with
> identification. (I need to add pytest*.xml collection to the Jenkins job
> first)
>
>
>> - failures common to all python versions will be reported to the user
>> earlier, at which point they can cancel the other jobs if desired. *this
>> is by far the biggest benefit.* why wait for 2 hours to see the same
>> failure reported for 5 versions of python? if that had run on one version
>> of python I could maybe see that error in 30 minutes (while potentially
>> other python versions waited in the queue). Repeat for each change pushed.
>> - flaky jobs will be cheaper to requeue (since it will affect a
>> smaller/shorter job)
>> - if xdist is giving us the parallel boost we're hoping for we should get
>> under the 2 hour mark every time
>>
>> Basically we're talking about getting feedback to users faster.
>>
>
> +1
>
>
>>
>> I really don't mind pasting a few more phrases if it means faster
>> feedback.
>>
>> -chad
>>
>>>
>>> On Mon, Dec 9, 2019 at 4:17 PM Chad Dombrova <chad...@gmail.com> wrote:
>>>
>>>> After this PR goes in should we revisit breaking up the python tests
>>>> into separate jenkins jobs by python version? One of the problems with
>>>> that plan originally was that we lost the parallelism that gradle provides
>>>> because we were left with only one tox task per jenkins job, and so the
>>>> total time to complete all python jenkins jobs went up a lot. With
>>>> pytest + xdist we should hopefully be able to keep the parallelism even
>>>> with just one tox task. This could be a big win. I feel like I'm spending
>>>> more time monitoring and re-queuing timed-out jenkins jobs lately than I am
>>>> writing code.
>>>>
>>>> On Mon, Dec 9, 2019 at 10:32 AM Udi Meiri <eh...@google.com> wrote:
>>>>
>>>>> This PR <https://github.com/apache/beam/pull/10322> (in review)
>>>>> migrates py27-gcp to using pytest.
>>>>> It reduces the testPy2Gcp task down to ~13m
>>>>> <https://scans.gradle.com/s/kj7ogemnd3toe/timeline?details=ancsbov425524>
>>>>> (from ~45m). This speedup will probably be lower once all 8 tasks are
>>>>> using pytest.
>>>>> It also adds 5 previously uncollected tests.
>>>>>
>>>>
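The two pytest features discussed above are keeping parallelism inside a single tox task through pytest-xdist and naming the JUnit suite so that each Python version's results are identifiable in Jenkins. A minimal sketch of a per-environment pytest command line follows, assuming pytest with the pytest-xdist plugin installed. -n is xdist's worker-count option, -o overrides pytest's junit_suite_name ini option, and --junitxml writes the JUnit report. The helper name pytest_args and the pytest_<env>.xml file naming are hypothetical; this is not what the Beam PR actually does.

```python
from typing import List


def pytest_args(tox_env: str, workers: str = "auto") -> List[str]:
    """Build pytest arguments for one tox environment (e.g. "py27-gcp").

    Hypothetical helper: "-n" comes from pytest-xdist and distributes tests
    across worker processes; "-o junit_suite_name=..." names the JUnit
    <testsuite> after the environment; the report file name is chosen to
    match a pytest*.xml collection pattern in Jenkins.
    """
    return [
        "-n", workers,
        "-o", f"junit_suite_name={tox_env}",
        f"--junitxml=pytest_{tox_env}.xml",
    ]
```

For example, pytest.main(pytest_args("py27-gcp") + [test_path]), where test_path is whatever directory the tox task collects, would start as many xdist workers as it detects CPUs and write pytest_py27-gcp.xml for Jenkins to pick up.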