On Mon, Dec 9, 2019 at 6:34 PM Udi Meiri <eh...@google.com> wrote:

> Valentyn, the speedup is due to parallelization.
>
> On Mon, Dec 9, 2019 at 6:12 PM Chad Dombrova <chad...@gmail.com> wrote:
>
>>
>> On Mon, Dec 9, 2019 at 5:36 PM Udi Meiri <eh...@google.com> wrote:
>>
>>> I have given this some thought and honestly don't know if splitting
>>> into separate jobs will help.
>>> - I have seen race conditions with running setuptools in parallel, so
>>> more isolation is better.
>>>
>>
>> What race conditions have you seen?  I think if we're doing things right,
>> this should not be happening, but I don't think we're doing things right.
>> One thing I've noticed is that we're building into the source directory,
>> but I also think we're doing weird things like trying to copy the source
>> directory beforehand.  I really think this system is tripping over many
>> non-standard choices that have been made along the way.  I have never seen
>> these sorts of problems in unit tests that use tox, even when many are
>> running in parallel.  I got pulled away from it, but I'm really hoping to
>> address these issues here:
>> https://github.com/apache/beam/pull/10038.
>>
>
> This comment
> <https://issues.apache.org/jira/browse/BEAM-8481?focusedCommentId=16988369&page=com.atlassian.jira.plugin.system.issuetabpanels%3Acomment-tabpanel#comment-16988369>
> summarizes what I believe may be the issue (setuptools races).
>
> I believe copying the source directory was done in an effort to isolate
> the parallel builds (setuptools, cythonize).
>

Peanut gallery: containerized Jenkins builds seem like they would help, and
they are the current recommended best practice, but we are not there yet.
Agree/disagree?

What benefits do you see from splitting up the jobs?
>>>
>>
>> The biggest problem is that the jobs are doing too much and take too
>> long.  This simple fact compounds all of the other problems.  It seems
>> pretty obvious that we need to do less in each job, as long as the sum of
>> all of these smaller jobs is not substantially longer than the one
>> monolithic job.
>>
>
For some reason I keep forgetting the answer to this question: are we
caching PyPI immutable artifacts on every Jenkins worker?
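(Not an answer, but a sketch of how one might check on a worker — the cache
location below is pip's documented default, not something verified on Beam's
Jenkins fleet:)

```shell
# pip caches downloaded wheels and sdists under ~/.cache/pip by default
# (respecting XDG_CACHE_HOME). If this directory is missing or empty on a
# Jenkins worker, every build re-downloads its dependencies from PyPI.
CACHE_DIR="${XDG_CACHE_HOME:-$HOME/.cache}/pip"
if [ -d "$CACHE_DIR" ]; then
  du -sh "$CACHE_DIR"
else
  echo "no pip cache at $CACHE_DIR"
fi
```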


>
>> Benefits:
>>
>> - failures specific to a particular python version will be easier to spot
>> in the jenkins error summary, and cheaper to re-queue.  right now the
>> jenkins report mushes all of the failures together in a way that makes it
>> nearly impossible to tell which python version they correspond to.  only
>> the gradle scan gives you this insight, but it doesn't break the errors
>> down by test.
>>
>
> I agree Jenkins handles duplicate test names pretty badly (reloading will
> periodically give you a different result).
>

Saw this in Java too w/ ValidatesRunner suites when they ran in one Jenkins
job. Worthwhile to avoid.

Kenn


> With pytest I've been able to set the suite name so that should help with
> identification. (I need to add pytest*.xml collection to the Jenkins job
> first)
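(For reference, a minimal sketch of the kind of pytest configuration this
implies — `junit_suite_name` is pytest's ini option for the junitxml suite
name; the suite name and output filename here are illustrative, not Beam's
actual values:)

```ini
# setup.cfg (or pytest.ini) — sets the suite name written into the JUnit
# XML report, so results from different Python versions stay
# distinguishable once Jenkins collects pytest*.xml files.
[tool:pytest]
junit_suite_name = beam_py37
```

The report file itself would then be produced with something like
`pytest --junitxml=pytest_py37.xml`, matching the `pytest*.xml` collection
pattern mentioned above.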
>
>
>> - failures common to all python versions will be reported to the user
>> earlier, at which point they can cancel the other jobs if desired.  *this
>> is by far the biggest benefit.*  why wait for 2 hours to see the same
>> failure reported for 5 versions of python?  if that had run on one version
>> of python I could maybe see that error in 30 minutes (while potentially
>> other python versions waited in the queue).  Repeat for each change pushed.
>> - flaky jobs will be cheaper to requeue (since it will affect a
>> smaller/shorter job)
>> - if xdist is giving us the parallel boost we're hoping for we should get
>> under the 2 hour mark every time
>>
>> Basically we're talking about getting feedback to users faster.
>>
>
> +1
>
>
>>
>> I really don't mind pasting a few more phrases if it means faster
>> feedback.
>>
>> -chad
>>
>>
>>
>>
>>>
>>> On Mon, Dec 9, 2019 at 4:17 PM Chad Dombrova <chad...@gmail.com> wrote:
>>>
>>>> After this PR goes in, should we revisit breaking up the python tests
>>>> into separate jenkins jobs by python version?  One of the problems with
>>>> that plan originally was that we lost the parallelism that gradle provides
>>>> because we were left with only one tox task per jenkins job, and so the
>>>> total time to complete all python jenkins jobs went up a lot.  With
>>>> pytest + xdist we should hopefully be able to keep the parallelism even
>>>> with just one tox task.  This could be a big win.  I feel like I'm spending
>>>> more time monitoring and re-queuing timed-out jenkins jobs lately than I am
>>>> writing code.
>>>>
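(The idea above, as a rough sketch — environment layout and dependency pins
are illustrative, not Beam's actual tox config:)

```ini
# tox.ini — one tox environment per Python version, each runnable as its
# own Jenkins job. pytest-xdist's -n flag restores the intra-job
# parallelism that previously came from Gradle running many tox tasks
# at once.
[tox]
envlist = py27,py35,py36,py37

[testenv]
deps =
    pytest
    pytest-xdist
commands =
    pytest -n auto --junitxml={envname}-results.xml {posargs}
```

With this shape, a failure in one interpreter's job can be spotted and
re-queued without re-running every other version.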
>>>> On Mon, Dec 9, 2019 at 10:32 AM Udi Meiri <eh...@google.com> wrote:
>>>>
>>>>> This PR <https://github.com/apache/beam/pull/10322> (in review)
>>>>> migrates py27-gcp to using pytest.
>>>>> It reduces the testPy2Gcp task down to ~13m
>>>>> <https://scans.gradle.com/s/kj7ogemnd3toe/timeline?details=ancsbov425524>
>>>>> (from ~45m). This speedup will probably be lower once all 8 tasks are
>>>>> using pytest.
>>>>> It also adds 5 previously uncollected tests.
>>>>>
>>>>
