Adding Evgeny and Shirly, who are AFAIK the owners of the metrics suite.

On Sun, 1 Sep 2019 at 17:07, Barak Korren <[email protected]> wrote:
> If you have been using or monitoring any OST suites recently, you may have
> noticed we've been suffering from long delays in allocating CI hardware
> resources for running OST suites. I'd like to briefly discuss the reasons
> behind this, what we are planning to do to resolve it, and the implications
> of those actions for owners of big suites.
>
> As you might know, a while ago we moved from running each OST suite on
> its own dedicated server to running them inside containers managed by
> OpenShift. That allowed us to run multiple OST suites on the same
> bare-metal host, which in turn increased our overall capacity by 50% while
> still allowing us to free up hardware for accommodating the kubevirt
> project on our CI hardware.
>
> Our infrastructure is currently built in a way where we use the exact same
> POD specification (and therefore resource settings) for all suites. Making
> it more flexible at this point would require significant code changes we
> are not likely to make. What this means is that we need to make sure our
> PODs have enough resources to run the most demanding suites. It also means
> we waste some resources when running less demanding ones.
>
> Given the set of OST suites we have ATM, we sized our PODs to allocate
> 32GiB of RAM. Given the servers we have, this means we can run 15 suites
> in parallel. This was sufficient for a while, but given increasing
> demand, and the expectation for it to increase further once we introduce
> the patch gating features we've been working on, we must find a way to
> significantly increase our suite running capacity.
>
> We have measured the amount of RAM required by each suite and came to the
> conclusion that for the vast majority of suites, we could settle for PODs
> that allocate only 14GiB of RAM. If we make that change, we would be able
> to run a total of 40 suites at a time, almost tripling our current capacity.
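
A rough sketch of the arithmetic behind the capacity figures quoted above. The POD sizes and slot counts come from the message; the 60GiB host in the last step is purely hypothetical (the actual server sizes are not stated) and is only there to illustrate how rounding down per host can make the gain larger than the plain 32/14 memory ratio.

```python
# Figures taken from the message above.
CURRENT_POD_GIB = 32
PROPOSED_POD_GIB = 14
CURRENT_SLOTS = 15
PROPOSED_SLOTS = 40


def pods_per_host(allocatable_gib: int, pod_gib: int) -> int:
    """Number of equally sized PODs that fit in a host's allocatable RAM."""
    return allocatable_gib // pod_gib


# Overall gain in parallel suites versus the raw ratio of POD sizes.
capacity_gain = PROPOSED_SLOTS / CURRENT_SLOTS
memory_ratio = CURRENT_POD_GIB / PROPOSED_POD_GIB

# ASSUMPTION: a hypothetical host with 60GiB of allocatable RAM.
# It fits only one 32GiB POD (28GiB wasted) but four 14GiB PODs.
HYPOTHETICAL_HOST_GIB = 60
before = pods_per_host(HYPOTHETICAL_HOST_GIB, CURRENT_POD_GIB)
after = pods_per_host(HYPOTHETICAL_HOST_GIB, PROPOSED_POD_GIB)

if __name__ == "__main__":
    print(f"capacity gain: {capacity_gain:.2f}x, memory ratio: {memory_ratio:.2f}x")
    print(f"hypothetical {HYPOTHETICAL_HOST_GIB}GiB host: {before} -> {after} PODs")
```

Going from 15 to 40 slots is a gain of about 2.67x, which is where "almost tripling" comes from.
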
>
> The downside of making this change is that our STDCI V2 infrastructure
> will no longer be able to run suites that require more than 14GiB of RAM.
> This effectively means it would no longer be possible to run these suites
> from OST's check-patch job or from the OST manual job.
>
> The list of affected suites follows. The suite owners, as documented in
> the CI configuration, have been added as "To" recipients of this message:
>
>    - hc-basic-suite-4.3
>    - hc-basic-suite-master
>    - metrics-suite-4.3
>
> Since we're aware people would still like to be able to work with the
> bigger suites, we will leverage the nightly suite invocation jobs to enable
> them to be run in the CI infra. We will support the following use cases:
>
>    - *Periodically running the suite on the latest oVirt packages* - this
>    will be done by the nightly job, as it is done today
>    - *Running the suite to test changes to the suite's code* - while
>    currently this is done automatically by check-patch, in the future it
>    would have to be done by manually triggering the nightly job and
>    setting the REFSPEC parameter to point to the examined patch
>    - *Triggering the suite manually* - this would be done by triggering
>    the suite-specific nightly job (as opposed to the general OST manual job)
>
> The patches listed below implement the changes outlined above:
>
>    - 102757 <https://gerrit.ovirt.org/102757>: nightly-system-tests: big
>    suits -> big containers
>    - 102771 <https://gerrit.ovirt.org/102771>: stdci: Drop `big` suits
>    from check-patch
>
> We know that making the changes we presented will make things a little
> less convenient for users and maintainers of the big suites, but we believe
> the benefits of having vastly increased execution capacity for all other
> suites outweigh those shortcomings.
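
For anyone who will need to fill in REFSPEC by hand: a minimal sketch of building a Gerrit change ref from a change number and patch set. The `refs/changes/<last two digits>/<change>/<patchset>` layout is the standard Gerrit convention; that the nightly job's REFSPEC parameter expects exactly this form is an assumption on my part, and the patch set number used in the example is a placeholder.

```python
def gerrit_refspec(change: int, patchset: int) -> str:
    """Return the Gerrit ref for a given change number and patch set.

    Gerrit shards change refs by the last two digits of the change
    number, zero-padded: refs/changes/NN/<change>/<patchset>.
    """
    if change <= 0 or patchset <= 0:
        raise ValueError("change and patchset must be positive integers")
    return f"refs/changes/{change % 100:02d}/{change}/{patchset}"


if __name__ == "__main__":
    # Change number from the patch list above; patch set 1 is a placeholder.
    print(gerrit_refspec(102771, 1))
```
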
>
> We would like to hear all relevant comments and questions from the suite
> owners and other interested parties, especially if you think we should not
> carry out the changes we propose.
> Please take the time to respond on this thread, or on the linked patches.
>
> Thanks,
>
> --
> Barak Korren
> RHV DevOps team, RHCE, RHCi
> Red Hat EMEA
> redhat.com | TRIED. TESTED. TRUSTED. | redhat.com/trusted
>

--
Barak Korren
RHV DevOps team, RHCE, RHCi
Red Hat EMEA
redhat.com | TRIED. TESTED. TRUSTED. | redhat.com/trusted
_______________________________________________
Devel mailing list -- [email protected]
To unsubscribe send an email to [email protected]
Privacy Statement: https://www.ovirt.org/site/privacy-policy/
oVirt Code of Conduct: https://www.ovirt.org/community/about/community-guidelines/
List Archives: https://lists.ovirt.org/archives/list/[email protected]/message/OURCWMCLA5KU36S5FJUG75KWJA3QAKLU/
