Update: 37m <https://scans.gradle.com/s/hqcvbxm2h6svg/timeline>
precommit time with the latest PR
<https://github.com/apache/beam/pull/10377> (in review).

On Tue, Dec 10, 2019 at 11:21 AM Udi Meiri <eh...@google.com> wrote:

>
>
> On Mon, Dec 9, 2019 at 9:33 PM Kenneth Knowles <k...@apache.org> wrote:
>
>>
>>
>> On Mon, Dec 9, 2019 at 6:34 PM Udi Meiri <eh...@google.com> wrote:
>>
>>> Valentyn, the speedup is due to parallelization.
>>>
>>> On Mon, Dec 9, 2019 at 6:12 PM Chad Dombrova <chad...@gmail.com> wrote:
>>>
>>>>
>>>> On Mon, Dec 9, 2019 at 5:36 PM Udi Meiri <eh...@google.com> wrote:
>>>>
>>>>> I have given this some thought and honestly don't know if splitting into
>>>>> separate jobs will help.
>>>>> - I have seen race conditions with running setuptools in parallel, so
>>>>> more isolation is better.
>>>>>
>>>>
>>>> What race conditions have you seen?  I think if we're doing things
>>>> right, this should not be happening, but I don't think we're doing things
>>>> right. One thing that I've noticed is that we're building into the source
>>>> directory, but I think we're also doing weird things like trying to
>>>> copy the source directory beforehand.  I really think this system is
>>>> tripping over many non-standard choices that have been made along the way.
>>>> I have never seen these sorts of problems in unittests that use tox, even
>>>> when many are running in parallel.  I got pulled away from it, but I'm
>>>> really hoping to address these issues here:
>>>> https://github.com/apache/beam/pull/10038.
>>>>
>>>
>>> This comment
>>> <https://issues.apache.org/jira/browse/BEAM-8481?focusedCommentId=16988369&page=com.atlassian.jira.plugin.system.issuetabpanels%3Acomment-tabpanel#comment-16988369>
>>> summarizes what I believe may be the issue (setuptools races).
>>>
>>> I believe copying the source directory was done in an effort to isolate
>>> the parallel builds (setuptools, cythonize).
>>>
>>
>> Peanut gallery: containerized Jenkins builds seem like they would help,
>> and they are the current recommended best practice, but we are not there
>> yet. Agree/disagree?
>>
>
> I'm okay with containerized Jenkins builds as long as using pytest/tox
> directly still works.
>
>
>>
>>>>> What benefits do you see from splitting up the jobs?
>>>>>
>>>>
>>>> The biggest problem is that the jobs are doing too much and take too
>>>> long.  This simple fact compounds all of the other problems.  It seems
>>>> pretty obvious that we need to do less in each job, as long as the sum of
>>>> all of these smaller jobs is not substantially longer than the one
>>>> monolithic job.
>>>>
>>>
>> For some reason I keep forgetting the answer to this question: are we
>> caching pypi immutable artifacts on every Jenkins worker?
>>
>
> I don't know.
>
>
>>
>>>
>>>> Benefits:
>>>>
>>>> - failures specific to a particular python version will be easier to
>>>> spot in the jenkins error summary, and cheaper to re-queue.  right now the
>>>> jenkins report mushes all of the failures together in a way that makes it
>>>> nearly impossible to tell which python version they correspond to.  only
>>>> the gradle scan gives you this insight, but it doesn't break the errors by
>>>> test.
>>>>
>>>
>>> I agree Jenkins handles duplicate test names pretty badly (reloading
>>> will periodically give you a different result).
>>>
>>
>> Saw this in Java too w/ ValidatesRunner suites when they ran in one
>> Jenkins job. Worthwhile to avoid.
>>
>> Kenn
>>
>>
>>> With pytest I've been able to set the suite name so that should help
>>> with identification. (I need to add pytest*.xml collection to the Jenkins
>>> job first)
>>>
>>>
>>>> - failures common to all python versions will be reported to the user
>>>> earlier, at which point they can cancel the other jobs if desired.  *this
>>>> is by far the biggest benefit.* why wait for 2 hours to see the same
>>>> failure reported for 5 versions of python?  if that had run on one version
>>>> of python I could maybe see that error in 30 minutes (while potentially
>>>> other python versions waited in the queue).  Repeat for each change pushed.
>>>> - flaky jobs will be cheaper to requeue (since it will affect a
>>>> smaller/shorter job)
>>>> - if xdist is giving us the parallel boost we're hoping for we should
>>>> get under the 2 hour mark every time
>>>>
>>>> Basically we're talking about getting feedback to users faster.
>>>>
>>>
>>> +1
>>>
>>>
>>>>
>>>> I really don't mind pasting a few more phrases if it means faster
>>>> feedback.
>>>>
>>>> -chad
>>>>
>>>>
>>>>
>>>>
>>>>>
>>>>> On Mon, Dec 9, 2019 at 4:17 PM Chad Dombrova <chad...@gmail.com>
>>>>> wrote:
>>>>>
>>>>>> After this PR goes in should we revisit breaking up the python tests
>>>>>> into separate jenkins jobs by python version?  One of the problems with
>>>>>> that plan originally was that we lost the parallelism that gradle provides
>>>>>> because we were left with only one tox task per jenkins job, and so the
>>>>>> total time to complete all python jenkins jobs went up a lot.  With
>>>>>> pytest + xdist we should hopefully be able to keep the parallelism even
>>>>>> with just one tox task.  This could be a big win.  I feel like I'm spending
>>>>>> more time monitoring and re-queuing timed-out jenkins jobs lately than I am
>>>>>> writing code.
>>>>>>
>>>>>> On Mon, Dec 9, 2019 at 10:32 AM Udi Meiri <eh...@google.com> wrote:
>>>>>>
>>>>>>> This PR <https://github.com/apache/beam/pull/10322> (in review)
>>>>>>> migrates py27-gcp to using pytest.
>>>>>>> It reduces the testPy2Gcp task down to ~13m
>>>>>>> <https://scans.gradle.com/s/kj7ogemnd3toe/timeline?details=ancsbov425524>
>>>>>>> (from ~45m). This speedup will probably be lower once all 8 tasks are using
>>>>>>> pytest.
>>>>>>> It also adds 5 previously uncollected tests.
>>>>>>>
>>>>>>
