On 03/02/2021 00:03, Anil Belur wrote:
> Greetings Robert:

Hello Anil,

> lf-env.sh: Creates a virtual env and sets up the environment, while
> python-tools-install.sh installs the python tools/utils during job
> runtime. Since releng/global-jjb is a repo of generic JJB templates (it
> can be used by any of the CI management repositories), it's up to the
> $project/$job to install the dependencies required for running the job.

Understood. At the end of the day, though, we have only a few classes of
jobs, and they have a great deal in common.

> We have discussed this in the past; installing PyPI dependencies at
> packer image build time comes with its own set of problems and added costs:
> 1. This requires maintaining a large number of packer images (if the
> project needs to support multiple versions of python/PyPI deps).

I do not believe this is the case for OpenDaylight jobs. For example,
each and every job I looked at performs the same two steps:
- python-tools-install.sh (70 seconds)
- job-cost.sh (39 seconds)
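To make the runtime step concrete: installing Python tools at job time boils down to creating a virtualenv and pip-installing into it. The sketch below is a hypothetical Python stand-in for that kind of work (the real global-jjb scripts are shell, and `build_tools_env` is an illustrative name, not an existing tool). The same function run once during the packer build would leave jobs nothing to install.

```python
import subprocess
import sys
import venv
from pathlib import Path
from typing import List


def build_tools_env(env_dir: Path, packages: List[str]) -> Path:
    """Create a clean virtualenv at env_dir and pip-install `packages` into it.

    Hypothetical stand-in for the kind of work python-tools-install.sh does
    on every job; returns the path of the env's Python interpreter.
    """
    env_dir = Path(env_dir)
    is_windows = sys.platform == "win32"
    # Mirror `python -m venv` defaults: symlinks on POSIX, copies on Windows.
    # pip is only bootstrapped when there is something to install.
    venv.create(env_dir, clear=True, symlinks=not is_windows,
                with_pip=bool(packages))
    python = env_dir / ("Scripts" if is_windows else "bin") / (
        "python.exe" if is_windows else "python")
    if packages:
        # This step needs PyPI (or a mirror) to be reachable and to serve
        # compatible releases: exactly the runtime failure modes at issue.
        subprocess.run(
            [str(python), "-m", "pip", "install", "--quiet", *packages],
            check=True,
        )
    return python
```

For example, `build_tools_env(Path("/opt/tools-env"), ["lftools"])` could run in a packer provisioning step, after which jobs would only invoke `/opt/tools-env/bin/python`; the path and package list here are assumptions for illustration.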

> 2. Not all releng/global-jjb (template) scripts require all of the
> PyPI dependencies to be installed; they are tied to the $job or
> $project, and binding them all into the same env risks the deps
> breaking more frequently.
> 3. PyPI libs/modules are updated more frequently.

While that is true, this line of reasoning completely ignores the
failure mode and recovery.

As it stands, any one of:
- a busted global-jjb
- PyPI package updates
- PyPI repository unavailability

is enough to break our jobs. As we have seen in these past weeks, any
such failure immediately propagates to all jobs and breaks them --
leaving nothing working, with no real avenue for recovery without the
help of LF IT.

We actually went through exactly this discussion when we had Sigul
failures -- and Sigul is now part of base images.

It is deemed sufficient to update our cloud images once a month -- and
that includes all sorts of security fixes and similar. As a community we
are free to decide when to spin new images and can do that completely
without LF IT intervention.

I am sorry, but I fail to see how Python packages are special enough to
inflict:
- breakages occurring at completely random times
- 2-5 minutes of infra install in *each and every job* we run[*]

I am sorry to say that the world has changed in the past 5 years: we
no longer have the attention of LF IT staff that made resolution of
these failures a matter of hours -- it now takes multiple days. That
fact alone makes a huge difference when weighing pros and cons.

Regards,
Robert

[*]
Just take a good look at what
https://logs.opendaylight.org/releng/vex-yul-odl-jenkins-1/aaa-maven-verify-master-mvn35-openjdk11/3/console-timestamp.log.gz
did:

Total job runtime:   9m56s
Useful build time:   7m16s
Setup/teardown time: 2m40s

That's **27%** of the total runtime spent on infra, i.e. **37%** overhead
on top of the useful build time.
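The footnote arithmetic can be reproduced with a small Python helper, handy for running the same numbers against other jobs' console logs. It is a hypothetical sketch, not part of any LF tooling; note that the infra share is measured against total runtime, while the overhead is measured against useful build time.

```python
import re
from typing import Tuple

# Durations as printed in the footnote, e.g. "9m56s", "2m40s" or "39s".
_DURATION = re.compile(r"^(?:(\d+)m)?(?:(\d+)s)?$")


def to_seconds(text: str) -> int:
    """Convert an 'XmYs'-style duration into whole seconds."""
    match = _DURATION.match(text)
    if match is None or not any(match.groups()):
        raise ValueError(f"unrecognised duration: {text!r}")
    minutes, seconds = (int(g) if g else 0 for g in match.groups())
    return minutes * 60 + seconds


def infra_overhead(total: str, useful: str) -> Tuple[int, float, float]:
    """Return (infra seconds, share of total runtime, overhead vs useful time)."""
    total_s, useful_s = to_seconds(total), to_seconds(useful)
    infra_s = total_s - useful_s
    return infra_s, infra_s / total_s, infra_s / useful_s


infra_s, share, overhead = infra_overhead("9m56s", "7m16s")
print(f"{infra_s}s infra, {share:.0%} of runtime, {overhead:.0%} overhead")
# prints "160s infra, 27% of runtime, 37% overhead"
```

Plugging in the rough yangtools-release-merge figures from the quoted message (3 of ~10 minutes, i.e. `infra_overhead("10m", "7m")`) gives a 30% share, or about 43% overhead.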

> 
> Thanks,
> Anil
> 
> 
> On Mon, Jan 25, 2021 at 7:44 PM Robert Varga <[email protected]> wrote:
> 
>     Hello everyone,
> 
>     as the (still current) failure to start Jenkins jobs shows, our current
>     way of integrating with external dependencies (global-jjb) is beyond
>     fragile.
> 
>     The way our jobs work is that:
> 
>     1) we have a base image, created by builder-packer-* jobs which run on
>     a regular basis and roll up distro upgrades plus some other things
>     (like mininet, etc.) that we need
> 
>     2) the Jenkins job launches on that base image and calls two scripts
>     from global-jjb, both of which end up installing more things:
>        a) python-tools-install.sh
>        b) lf-env.sh
> 
>     3) the actual job runs
> 
>     4) some more stuff runs, invoking lf-env.sh to set up another Python
>     environment.
> 
>     Now, it is clear that everything in 1) is invariant and updated in a
>     controlled way.
> 
>     The problem is with 2), where again, everything is supposed to be
>     invariant for a particular version of global-jjb -- yet we reinstall
>     these things on every single job run.
> 
>     This is subject to random breakage (like now, or when pip
>     repositories are unavailable).
> 
>     It also takes around 3 minutes of each job execution, which does not
>     sound like much, but it is a full 30%(!) of the runtime of
>     yangtools-release-merge (which takes around 10 minutes).
> 
>     We obviously can and must do better: global-jjb's environment-impacting
>     scripts must all be executed during builder-packer, so that they become
>     proper invariants.
> 
>     For that, global-jjb needs to grow two things:
> 
>     1) a way to install *all* of its dependencies without doing anything
>     else, for use in packer jobs
> 
>     2) compatibility checks on the environment to ensure it is up to date
>     enough to run a particular global-jjb version's scripts
> 
>     With that, our jobs should be both faster and more reliable.
> 
>     Does anybody see a problem why this would not work?
> 
>     If not, I will be filing LFIT issues to get this done.
> 
>     Regards,
>     Robert
> 
> 
>     
> 

View/Reply Online (#1196): 
https://lists.opendaylight.org/g/infrastructure/message/1196