On Mon, Dec 9, 2019 at 9:33 PM Kenneth Knowles <k...@apache.org> wrote:
> > > On Mon, Dec 9, 2019 at 6:34 PM Udi Meiri <eh...@google.com> wrote: > >> Valentyn, the speedup is due to parallelization. >> >> On Mon, Dec 9, 2019 at 6:12 PM Chad Dombrova <chad...@gmail.com> wrote: >> >>> >>> On Mon, Dec 9, 2019 at 5:36 PM Udi Meiri <eh...@google.com> wrote: >>> >>>> I have given this some thought honestly don't know if splitting into >>>> separate jobs will help. >>>> - I have seen race conditions with running setuptools in parallel, so >>>> more isolation is better. >>>> >>> >>> What race conditions have you seen? I think if we're doing things >>> right, this should not be happening, but I don't think we're doing things >>> right. One thing that I've noticed is that we're building into the source >>> directory, but I also think we're also doing weird things like trying to >>> copy the source directory beforehand. I really think this system is >>> tripping over many non-standard choices that have been made along the way. >>> I have never these sorts of problems with in unittests that use tox, even >>> when many are running in parallel. I got pulled away from it, but I'm >>> really hoping to address these issues here: >>> https://github.com/apache/beam/pull/10038. >>> >> >> This comment >> <https://issues.apache.org/jira/browse/BEAM-8481?focusedCommentId=16988369&page=com.atlassian.jira.plugin.system.issuetabpanels%3Acomment-tabpanel#comment-16988369> >> summarizes what I believe may be the issue (setuptools races). >> >> I believe copying the source directory was done in an effort to isolate >> the parallel builds (setuptools, cythonize). >> > > Peanut gallery: containerized Jenkins builds seem like they would help, > and they are the current recommended best practice, but we are not there > yet. Agree/disagree? > I'm okay with containerized Jenkins builds as long as using pytest/tox directly still works. > > What benefits do you see from splitting up the jobs? >>>> >>> >>> The biggest problem is that the jobs are doing too much and take too >>> long. This simple fact compounds all of the other problems. It seems >>> pretty obvious that we need to do less in each job, as long as the sum of >>> all of these smaller jobs is not substantially longer than the one >>> monolithic job. >>> >> > For some reason I keep forgetting the answer to this question: are we > caching pypi immutable artifacts on every Jenkins worker? > I don't know. > >> >>> Benefits: >>> >>> - failures specific to a particular python version will be easier to >>> spot in the jenkins error summary, and cheaper to re-queue. right now the >>> jenkins report mushes all of the failures together in a way that makes it >>> nearly impossible to tell which python version they correspond to. only >>> the gradle scan gives you this insight, but it doesn't break the errors by >>> test. >>> >> >> I agree Jenkins handles duplicate test names pretty badly (reloading will >> periodically give you a different result). >> > > Saw this in Java too w/ ValidatesRunner suites when they ran in one > Jenkins job. Worthwhile to avoid. > > Kenn > > >> With pytest I've been able to set the suite name so that should help with >> identification. (I need to add pytest*.xml collection to the Jenkins job >> first) >> >> >>> - failures common to all python versions will be reported to the user >>> earlier, at which point they can cancel the other jobs if desired. *this >>> is by far the biggest benefit. * why wait for 2 hours to see the same >>> failure reported for 5 versions of python? if that had run on one version >>> of python I could maybe see that error in 30 minutes (while potentially >>> other python versions waited in the queue). Repeat for each change pushed. >>> - flaky jobs will be cheaper to requeue (since it will affect a >>> smaller/shorter job) >>> - if xdist is giving us the parallel boost we're hoping for we should >>> get under the 2 hour mark every time >>> >>> Basically we're talking about getting feedback to users faster. >>> >> >> +1 >> >> >>> >>> I really don't mind pasting a few more phrases if it means faster >>> feedback. >>> >>> -chad >>> >>> >>> >>> >>>> >>>> On Mon, Dec 9, 2019 at 4:17 PM Chad Dombrova <chad...@gmail.com> wrote: >>>> >>>>> After this PR goes in should we revisit breaking up the python tests >>>>> into separate jenkins jobs by python version? One of the problems with >>>>> that plan originally was that we lost the parallelism that gradle provides >>>>> because we were left with only one tox task per jenkins job, and so the >>>>> total time to complete all python jenkins jobs went up a lot. With >>>>> pytest + xdist we should hopefully be able to keep the parallelism even >>>>> with just one tox task. This could be a big win. I feel like I'm >>>>> spending >>>>> more time monitoring and re-queuing timed-out jenkins jobs lately than I >>>>> am >>>>> writing code. >>>>> >>>>> On Mon, Dec 9, 2019 at 10:32 AM Udi Meiri <eh...@google.com> wrote: >>>>> >>>>>> This PR <https://github.com/apache/beam/pull/10322> (in review) >>>>>> migrates py27-gcp to using pytest. >>>>>> It reduces the testPy2Gcp task down to ~13m >>>>>> <https://scans.gradle.com/s/kj7ogemnd3toe/timeline?details=ancsbov425524> >>>>>> (from ~45m). This speedup will probably be lower once all 8 tasks are >>>>>> using >>>>>> pytest. >>>>>> It also adds 5 previously uncollected tests. >>>>>> >>>>>
smime.p7s
Description: S/MIME Cryptographic Signature