On 03/02/2021 00:03, Anil Belur wrote:
> Greetings Robert:

Hello Anil,

> lf-env.sh: Creates a virtual env and sets up the environment, while the
> python-tools-install.sh Installs the python tools/utils during Job
> runtime. Since releng/global-jjb is a repo of Generic JJB templates (can
> be used by any of the CI management repositories), its up to the
> $project/$job to install the dependencies required for running the job.

Understood. At the end of the day, though, we have only a few classes of
jobs and there is a ton of commonalities between them.

> We have discussed this in the past, installing PyPI dependencies during
> packer image build time, comes with its own set of problems and added costs:
> 1. This requires maintaining a large number of packer images (if the
> project needs to support multiple versions of python/PyPI deps).

I do not believe this is the case for OpenDaylight jobs. For example, each
and every job I looked at performs two things:
- python-tools-install.sh (70 seconds)
- job-cost.sh (39 seconds)

> 2. All releng/global-jjb (templates) scripts do not require all of the
> PyPi dependencies to be installed and are tied down to the $job or
> $project, since this approach binding them all into the same env has a
> risk of the deps being broken more frequently.
> 3. PyPi libs/modules are updated more frequently.

While that is true, this line of reasoning completely ignores the failure
mode and recovery. As it stands, we are exposed to any of:
- busted global-jjb
- PyPi package updates
- PyPi repository unavailability

As we have seen in these past weeks, any such failure immediately
propagates to all jobs and breaks them -- resulting in nothing working
anymore, with no real avenue for recovery without help of LF IT.

We actually went through exactly this discussion when we had Sigul
failures -- and Sigul is now part of base images.

It is deemed sufficient to update our cloud images once a month -- and
that includes all sorts of security fixes and similar.
As a community we are free to decide when to spin new images and can do
that completely without LF IT intervention.

I am sorry, but I fail to see how Python packages are special enough to
justify inflicting:
- breakages occurring at completely random times
- 2-5 minutes of infra install on *each and every job* we run[*]

I am sorry to say that the world has changed in the past 5 years and we no
longer have the attention of LF IT staff that made resolution of these
failures a matter of hours -- it really is multiple days. That fact alone
makes a huge difference when weighing pros and cons.

Regards,
Robert

[*] Just take a good look at what
https://logs.opendaylight.org/releng/vex-yul-odl-jenkins-1/aaa-maven-verify-master-mvn35-openjdk11/3/console-timestamp.log.gz
did:

Total job runtime:   9m56s
Useful build time:   7m16s
Setup/teardown time: 2m40s

That's **27%** of the total runtime spent on infra, amounting to **37%**
overhead on top of the useful build time.

> Thanks,
> Anil
>
> On Mon, Jan 25, 2021 at 7:44 PM Robert Varga <[email protected]> wrote:
>
>     Hello everyone,
>
>     as the (still current) failure to start Jenkins jobs shows, our current
>     way of integrating with external dependencies (global-jjb) is beyond
>     fragile.
>
>     The way our jobs work is that:
>
>     1) we have a base image, created by builder-packer-* jobs on a regular
>     basis and roll up distro upgrades plus some other things (like mininet,
>     etc.) that we need
>
>     2) the Jenkins job launches on that base image and call two scripts from
>     global-jjb, both of which end up installing more things:
>     a) python-tools-install.sh
>     b) lf-env.sh
>
>     3) the actual job runs
>
>     4) some more stuff invoking lf-env.sh to setup another Python
>     environment runs.
>
>     Now, it is clear that everything in 1) is invariant and updated in a
>     controlled way.
>
>     The problem is with 2), where again, everything is supposed to be
>     invariant for a particular version of global-jjb -- yet we reinstall
>     these things on every single job run.
>
>     Not only is this subject to random breakage (like now, or when pip
>     repositories are unavailable), etc.
>
>     It also takes around 3 minutes of each job execution, which does not
>     sound like much, but it is full 30%(!) of runtime of
>     yangtools-release-merge (which takes around 10 minutes).
>
>     We obviously can and must do better: global-jjb's environment-impacting
>     scripts must all be executed during builder-packer, so that they become
>     proper invariants.
>
>     For that, global-jjb needs to grow two things:
>
>     1) a way to install *all* of its dependencies without doing anything
>     else, for use in packer jobs
>
>     2) compatibility checks on the environment to ensure it is uptodate
>     enough to run a particular global-jjb version's scripts
>
>     With that, our jobs should be both faster and more reliable.
>
>     Does anybody see a problem why this would not work?
>
>     If not, I will be filing LFIT issues to get this done.
>
>     Regards,
>     Robert
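
For reference, the overhead figures in the [*] footnote can be reproduced with a few lines of Python. This is only a sketch of the arithmetic: the three durations are the ones quoted from the console log, while the helper name parse_duration is my own and not part of any tooling.

```python
def parse_duration(text):
    """Convert a duration such as '9m56s' into seconds."""
    minutes, _, seconds = text.rstrip("s").partition("m")
    return int(minutes) * 60 + int(seconds)


# Durations quoted in the footnote for aaa-maven-verify-master-mvn35-openjdk11 #3.
total = parse_duration("9m56s")
useful = parse_duration("7m16s")
infra = parse_duration("2m40s")

# Sanity check: the two parts add up to the total runtime.
assert useful + infra == total

share_of_runtime = infra / total  # infra time as a fraction of the whole job
overhead = infra / useful         # infra time relative to the useful build time

print(f"{share_of_runtime:.0%} of runtime, {overhead:.0%} overhead")
```

The two percentages differ only in their denominator: the first divides by the total runtime, the second by the useful build time.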
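
To make item 2) of the proposal a bit more concrete, here is a rough sketch of what a compatibility check could look like if it were written in Python. Everything in it is an assumption on my part: global-jjb does not ship such a check today as far as I know, the function names are hypothetical, the package names and minimum versions in the usage comment are placeholders, and the version comparison is deliberately naive (numeric prefixes only).

```python
import re
from importlib.metadata import PackageNotFoundError, version


def numeric_prefix(ver, length):
    """Return the first `length` numeric components of a version string."""
    parts = []
    for piece in ver.split(".")[:length]:
        match = re.match(r"\d+", piece)
        parts.append(int(match.group()) if match else 0)
    # Pad short versions with zeros so '2' compares like '2.0'.
    while len(parts) < length:
        parts.append(0)
    return tuple(parts)


def missing_or_outdated(required):
    """Check installed packages against a {name: minimum-version-tuple} map.

    Returns a list of human-readable problems; an empty list means the
    image is recent enough for this (hypothetical) set of requirements.
    """
    problems = []
    for name, minimum in required.items():
        try:
            installed = version(name)
        except PackageNotFoundError:
            problems.append(f"{name}: not installed")
            continue
        if numeric_prefix(installed, len(minimum)) < minimum:
            wanted = ".".join(str(n) for n in minimum)
            problems.append(f"{name}: {installed} is older than {wanted}")
    return problems


# Hypothetical usage at job start: fail fast instead of installing anything.
# problems = missing_or_outdated({"some-package": (1, 2)})
# if problems:
#     raise SystemExit("image too old: " + "; ".join(problems))
```

The point of the sketch is the failure mode: a job running on a stale image stops immediately with a clear message, and the remedy is to respin the image, which we can do on our own schedule.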
View/Reply Online (#1196): https://lists.opendaylight.org/g/infrastructure/message/1196
