Valentyn, the speedup is due to parallelization.

On Mon, Dec 9, 2019 at 6:12 PM Chad Dombrova <chad...@gmail.com> wrote:

>
> On Mon, Dec 9, 2019 at 5:36 PM Udi Meiri <eh...@google.com> wrote:
>
>> I have given this some thought and honestly don't know if splitting into
>> separate jobs will help.
>> - I have seen race conditions with running setuptools in parallel, so
>> more isolation is better.
>>
>
> What race conditions have you seen?  I think if we're doing things right,
> this should not be happening, but I don't think we're doing things right.
> One thing that I've noticed is that we're building into the source
> directory, but I think we're also doing weird things like trying to
> copy the source directory beforehand.  I really think this system is
> tripping over many non-standard choices that have been made along the way.
> I have never seen these sorts of problems in unittests that use tox, even
> when many are running in parallel.  I got pulled away from it, but I'm
> really hoping to address these issues here:
> https://github.com/apache/beam/pull/10038.
>

This comment
<https://issues.apache.org/jira/browse/BEAM-8481?focusedCommentId=16988369&page=com.atlassian.jira.plugin.system.issuetabpanels%3Acomment-tabpanel#comment-16988369>
summarizes what I believe may be the issue (setuptools races).

I believe copying the source directory was done in an effort to isolate the
parallel builds (setuptools, cythonize).
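As a rough illustration of that isolation idea (a sketch only, not Beam's
actual build logic; the function name and temp-directory prefix are
hypothetical), each parallel build would get its own private copy of the
source tree, so in-tree build artifacts cannot collide:

```python
import os
import shutil
import tempfile


def isolated_copy(src_dir):
    """Copy src_dir into a fresh temporary directory and return the copy's path.

    Hypothetical helper: each parallel build runs inside its own copy, so
    files written into the source tree by setuptools or cythonize are not
    shared between builds.
    """
    # mkdtemp creates a uniquely named directory, so concurrent callers
    # never receive the same parent directory.
    parent = tempfile.mkdtemp(prefix="isolated-build-")
    dst = os.path.join(parent, os.path.basename(os.path.abspath(src_dir)))
    shutil.copytree(src_dir, dst)
    return dst
```

The cost is one full copy of the source tree per build, which is the
overhead being questioned above.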


>
>> What benefits do you see from splitting up the jobs?
>>
>
> The biggest problem is that the jobs are doing too much and take too
> long.  This simple fact compounds all of the other problems.  It seems
> pretty obvious that we need to do less in each job, as long as the sum of
> all of these smaller jobs is not substantially longer than the one
> monolithic job.
>
> Benefits:
>
> - failures specific to a particular python version will be easier to spot
> in the jenkins error summary, and cheaper to re-queue.  right now the
> jenkins report mushes all of the failures together in a way that makes it
> nearly impossible to tell which python version they correspond to.  only
> the gradle scan gives you this insight, but it doesn't break the errors
> down by test.
>

I agree Jenkins handles duplicate test names pretty badly (reloading will
periodically give you a different result).
With pytest I've been able to set the suite name so that should help with
identification. (I need to add pytest*.xml collection to the Jenkins job
first)
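A minimal sketch of what I mean, assuming pytest's `junit_suite_name` ini
option and `--junitxml` flag, plus pytest-xdist's `-n` flag. The helper
name and the `pytest_<suite>.xml` file naming are made up for illustration
(chosen so the files match a pytest*.xml pattern):

```python
import sys


def pytest_command(suite_name, workers="auto"):
    """Build a pytest command line that tags its JUnit XML with suite_name.

    Hypothetical helper; the output file name is an assumption, not what
    the Beam tox configuration actually uses.
    """
    return [
        sys.executable, "-m", "pytest",
        "-n", str(workers),                      # pytest-xdist worker count
        "-o", f"junit_suite_name={suite_name}",  # override the ini option
        f"--junitxml=pytest_{suite_name}.xml",   # one report file per suite
    ]
```

For example, `pytest_command("py27-gcp")` produces a report whose suite is
named py27-gcp, so results for the same test under different Python
versions can be told apart.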


> - failures common to all python versions will be reported to the user
> earlier, at which point they can cancel the other jobs if desired.  *this
> is by far the biggest benefit.*  why wait for 2 hours to see the same
> failure reported for 5 versions of python?  if that had run on one version
> of python I could maybe see that error in 30 minutes (while potentially
> other python versions waited in the queue).  Repeat for each change pushed.
> - flaky jobs will be cheaper to requeue (since it will affect a
> smaller/shorter job)
> - if xdist is giving us the parallel boost we're hoping for we should get
> under the 2 hour mark every time
>
> Basically we're talking about getting feedback to users faster.
>

+1


>
> I really don't mind pasting a few more phrases if it means faster feedback.
>
> -chad
>
>
>
>
>>
>> On Mon, Dec 9, 2019 at 4:17 PM Chad Dombrova <chad...@gmail.com> wrote:
>>
>>> After this PR goes in should we revisit breaking up the python tests
>>> into separate jenkins jobs by python version?  One of the problems with
>>> that plan originally was that we lost the parallelism that gradle provides
>>> because we were left with only one tox task per jenkins job, and so the
>>> total time to complete all python jenkins jobs went up a lot.  With
>>> pytest + xdist we should hopefully be able to keep the parallelism even
>>> with just one tox task.  This could be a big win.  I feel like I'm spending
>>> more time monitoring and re-queuing timed-out jenkins jobs lately than I am
>>> writing code.
>>>
>>> On Mon, Dec 9, 2019 at 10:32 AM Udi Meiri <eh...@google.com> wrote:
>>>
>>>> This PR <https://github.com/apache/beam/pull/10322> (in review)
>>>> migrates py27-gcp to using pytest.
>>>> It reduces the testPy2Gcp task down to ~13m
>>>> <https://scans.gradle.com/s/kj7ogemnd3toe/timeline?details=ancsbov425524>
>>>> (from ~45m). This speedup will probably be lower once all 8 tasks are using
>>>> pytest.
>>>> It also adds 5 previously uncollected tests.
>>>>
>>>
