Thanks, Yifan.

1. It appears that there are 32 Jenkins-related instances with 16 cores each, which together consume over 2/3 of the available CPU quota.
2. Among the old VMs there are six 1-core VMs with names like "gke-io-datastores-*" and "gke-metrics-*". They don't consume much quota, but I am curious why these VMs are up. Does anyone have context?
3. The rest of the currently running VMs seem to be test VMs started today. I also removed a couple of stray VMs.
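
For reference, here is a rough back-of-the-envelope check of the numbers in item 1. It is only a sketch: the instance and core counts are the ones quoted in this thread, the actual quota value is not stated anywhere here, and the upper bound below assumes "over 2/3" means strictly more than two thirds of the quota. Treating the remaining instances as old agents still pending migration is also an assumption.

```python
# Figures quoted in this thread.
JENKINS_INSTANCES = 32       # Jenkins-related instances currently running
NEW_JENKINS_INSTANCES = 16   # recently created instances mentioned by Yifan
CORES_PER_INSTANCE = 16

jenkins_cores = JENKINS_INSTANCES * CORES_PER_INSTANCE
new_jenkins_cores = NEW_JENKINS_INSTANCES * CORES_PER_INSTANCE

# Assumption: the remaining instances are old agents still pending migration.
other_jenkins_cores = jenkins_cores - new_jenkins_cores

# Assumption: "over 2/3 of the quota" means jenkins_cores > (2/3) * quota,
# so the quota must be below jenkins_cores * 3 / 2.
implied_quota_upper_bound = jenkins_cores * 3 // 2

print(f"Jenkins cores in total:    {jenkins_cores}")
print(f"  newly created instances: {new_jenkins_cores}")
print(f"  remaining instances:     {other_jenkins_cores}")
print(f"Implied CPU quota is below {implied_quota_upper_bound} cores")
```

If the remaining instances are indeed old agents, removing them after the migration would free about half of the cores Jenkins uses today.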

Yifan, I am assigning https://issues.apache.org/jira/browse/BEAM-7085 to you, since Jenkins is the biggest quota consumer right now and you are actively working on it.

On Tue, Apr 16, 2019 at 2:09 PM Yifan Zou <[email protected]> wrote:

> We recently created 16 compute instances for Jenkins. Each of them
> has 16 CPUs, which means they consume 256 CPUs in total. I guess that is
> why the CPU usage in us-central1 remains high. We're working on migrating
> the rest of the old Jenkins agents, and the old instances will be removed
> once that is finished. That should relieve the quota pressure.
>
> Yifan
>
> On Tue, Apr 16, 2019 at 1:58 PM Valentyn Tymofieiev <[email protected]>
> wrote:
>
>> FYI, I have recently observed a large number of test failures in Beam
>> test suites where Dataflow jobs failed due to a lack of CPU quota in the
>> apache-beam-testing project.
>>
>> We have been adding new suites for Python 3.x versions, which may have
>> contributed to this problem.
>>
>> I have not yet investigated what consumes the quota, but the usage
>> remains high.
>>
>> Possible mitigation options:
>> - Increase quota.
>> - Decrease per-suite parallelism [1]. Currently we may run 1-8 tests
>>   from the same suite concurrently.
>> - Audit usage, perhaps kill stale jobs or VMs.
>>
>> Ideas/opinions welcome.
>>
>> I opened https://issues.apache.org/jira/browse/BEAM-7085 to track this.
>>
>> [1]
>> https://github.com/apache/beam/search?q=%22--processes%3D%22&unscoped_q=%22--processes%3D%22
