Update: the precommit now takes 37m <https://scans.gradle.com/s/hqcvbxm2h6svg/timeline> with the latest PR <https://github.com/apache/beam/pull/10377> (in review).

On Tue, Dec 10, 2019 at 11:21 AM Udi Meiri <eh...@google.com> wrote:

> On Mon, Dec 9, 2019 at 9:33 PM Kenneth Knowles <k...@apache.org> wrote:
>
>> On Mon, Dec 9, 2019 at 6:34 PM Udi Meiri <eh...@google.com> wrote:
>>
>>> Valentyn, the speedup is due to parallelization.
>>>
>>> On Mon, Dec 9, 2019 at 6:12 PM Chad Dombrova <chad...@gmail.com> wrote:
>>>
>>>> On Mon, Dec 9, 2019 at 5:36 PM Udi Meiri <eh...@google.com> wrote:
>>>>
>>>>> I have given this some thought and honestly don't know if splitting
>>>>> into separate jobs will help.
>>>>> - I have seen race conditions with running setuptools in parallel,
>>>>> so more isolation is better.
>>>>
>>>> What race conditions have you seen? I think if we're doing things
>>>> right, this should not be happening, but I don't think we're doing
>>>> things right. One thing that I've noticed is that we're building into
>>>> the source directory, but I also think we're doing weird things like
>>>> trying to copy the source directory beforehand. I really think this
>>>> system is tripping over many non-standard choices that have been made
>>>> along the way. I have never seen these sorts of problems in unittests
>>>> that use tox, even when many are running in parallel. I got pulled
>>>> away from it, but I'm really hoping to address these issues here:
>>>> https://github.com/apache/beam/pull/10038.
>>>
>>> This comment
>>> <https://issues.apache.org/jira/browse/BEAM-8481?focusedCommentId=16988369&page=com.atlassian.jira.plugin.system.issuetabpanels%3Acomment-tabpanel#comment-16988369>
>>> summarizes what I believe may be the issue (setuptools races).
>>>
>>> I believe copying the source directory was done in an effort to
>>> isolate the parallel builds (setuptools, cythonize).
>>
>> Peanut gallery: containerized Jenkins builds seem like they would help,
>> and they are the current recommended best practice, but we are not
>> there yet. Agree/disagree?
>>
>
> I'm okay with containerized Jenkins builds as long as using pytest/tox
> directly still works.
>
>> What benefits do you see from splitting up the jobs?
>>>>
>>>> The biggest problem is that the jobs are doing too much and take too
>>>> long. This simple fact compounds all of the other problems. It seems
>>>> pretty obvious that we need to do less in each job, as long as the
>>>> sum of all of these smaller jobs is not substantially longer than the
>>>> one monolithic job.
>>>
>> For some reason I keep forgetting the answer to this question: are we
>> caching pypi immutable artifacts on every Jenkins worker?
>
> I don't know.
>
>>>> Benefits:
>>>>
>>>> - failures specific to a particular python version will be easier to
>>>> spot in the jenkins error summary, and cheaper to re-queue. right now
>>>> the jenkins report mushes all of the failures together in a way that
>>>> makes it nearly impossible to tell which python version they
>>>> correspond to. only the gradle scan gives you this insight, but it
>>>> doesn't break the errors by test.
>>>
>>> I agree Jenkins handles duplicate test names pretty badly (reloading
>>> will periodically give you a different result).
>>
>> Saw this in Java too w/ ValidatesRunner suites when they ran in one
>> Jenkins job. Worthwhile to avoid.
>>
>> Kenn
>>
>>> With pytest I've been able to set the suite name so that should help
>>> with identification. (I need to add pytest*.xml collection to the
>>> Jenkins job first)
>>>
>>>> - failures common to all python versions will be reported to the user
>>>> earlier, at which point they can cancel the other jobs if desired.
>>>> *this is by far the biggest benefit.* why wait for 2 hours to see the
>>>> same failure reported for 5 versions of python? if that had run on
>>>> one version of python I could maybe see that error in 30 minutes
>>>> (while potentially other python versions waited in the queue). Repeat
>>>> for each change pushed.
>>>> - flaky jobs will be cheaper to requeue (since it will affect a
>>>> smaller/shorter job)
>>>> - if xdist is giving us the parallel boost we're hoping for we should
>>>> get under the 2 hour mark every time
>>>>
>>>> Basically we're talking about getting feedback to users faster.
>>>
>>> +1
>>>
>>>> I really don't mind pasting a few more phrases if it means faster
>>>> feedback.
>>>>
>>>> -chad
>>>>
>>>>> On Mon, Dec 9, 2019 at 4:17 PM Chad Dombrova <chad...@gmail.com> wrote:
>>>>>
>>>>>> After this PR goes in should we revisit breaking up the python
>>>>>> tests into separate jenkins jobs by python version? One of the
>>>>>> problems with that plan originally was that we lost the parallelism
>>>>>> that gradle provides because we were left with only one tox task
>>>>>> per jenkins job, and so the total time to complete all python
>>>>>> jenkins jobs went up a lot. With pytest + xdist we should hopefully
>>>>>> be able to keep the parallelism even with just one tox task. This
>>>>>> could be a big win. I feel like I'm spending more time monitoring
>>>>>> and re-queuing timed-out jenkins jobs lately than I am writing
>>>>>> code.
>>>>>>
>>>>>> On Mon, Dec 9, 2019 at 10:32 AM Udi Meiri <eh...@google.com> wrote:
>>>>>>
>>>>>>> This PR <https://github.com/apache/beam/pull/10322> (in review)
>>>>>>> migrates py27-gcp to using pytest.
>>>>>>> It reduces the testPy2Gcp task down to ~13m
>>>>>>> <https://scans.gradle.com/s/kj7ogemnd3toe/timeline?details=ancsbov425524>
>>>>>>> (from ~45m). This speedup will probably be lower once all 8 tasks
>>>>>>> are using pytest.
>>>>>>> It also adds 5 previously uncollected tests.
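
For reference, the two pytest ideas discussed in the thread (xdist for parallelism within a single tox task, and a per-environment suite name so Jenkins can tell the Python versions apart) could be combined roughly as below. This is only a sketch, not what the PRs above actually do: the helper name pytest_command, the report file naming, and the py37-gcp environment are assumptions made for illustration. It relies on the -n option from pytest-xdist, pytest's --junitxml option, and the junit_suite_name ini setting overridden with -o.

```python
from typing import List


def pytest_command(tox_env: str, workers: str = "auto") -> List[str]:
    """Build a pytest command line for one tox environment.

    Hypothetical helper for illustration; not taken from the Beam build.
    """
    return [
        "pytest",
        # pytest-xdist: distribute tests across worker processes.
        "-n", workers,
        # One JUnit-style XML report per environment (assumed file naming,
        # chosen so that a pytest*.xml glob would pick it up).
        f"--junitxml=pytest_{tox_env}.xml",
        # Override the junit_suite_name ini option so each report is
        # labelled with the environment that produced it.
        "-o", f"junit_suite_name={tox_env}",
    ]


if __name__ == "__main__":
    # py27-gcp appears in the thread; py37-gcp is an assumed second env.
    for env in ("py27-gcp", "py37-gcp"):
        print(" ".join(pytest_command(env)))
```

With a distinct suite name per environment, identical test names from different Python versions would no longer collide in the collected reports, which is the identification problem raised above.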