Thanks, Yifan.

1. It appears that there are 32 Jenkins-related instances, 16 cores each,
which consume over 2/3 of the available CPU quota.
2. Among the old VMs there are six 1-core VMs with names like
"gke-io-datastores-*" and "gke-metrics-*". They don't consume much quota,
but I am curious why we have these VMs up. Does anyone have context?
3. The rest of the VMs currently running seem to be test VMs started today. I
also removed a couple of stray VMs.
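
For reference, here is a rough sketch of the tally behind items 1 and 2, using only the instance counts and core sizes mentioned above. The group labels and the `total_cores` helper are made up for illustration, and the project's total quota is not included because it is not stated in this thread.

```python
# Instance counts and per-instance core sizes as reported in this thread.
# Group labels are illustrative, not actual resource names.
vm_groups = {
    "jenkins": {"instances": 32, "cores_each": 16},
    "gke-io-datastores/gke-metrics": {"instances": 6, "cores_each": 1},
}


def total_cores(groups):
    """Return the total core count per group (instances * cores per instance)."""
    return {
        name: group["instances"] * group["cores_each"]
        for name, group in groups.items()
    }


usage = total_cores(vm_groups)

# Print the largest consumers first.
for name, cores in sorted(usage.items(), key=lambda item: item[1], reverse=True):
    print(f"{name}: {cores} cores")
```

This only covers the two groups counted above; the test VMs started today are not included since their number is not given.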

Yifan, I am assigning https://issues.apache.org/jira/browse/BEAM-7085 to
you since Jenkins is the biggest quota consumer right now and you are
actively working on it.

On Tue, Apr 16, 2019 at 2:09 PM Yifan Zou <[email protected]> wrote:

> We recently created 16 compute instances for Jenkins. Each one of them
> has 16 CPUs, which means they consume 256 CPUs in total. I guess that is
> why the CPU usage in us-central1 remains high. We're working on migrating
> the rest of the old Jenkins agents, and the old instances will be removed
> once that is finished. That should relieve the quota pain.
>
> Yifan
>
> On Tue, Apr 16, 2019 at 1:58 PM Valentyn Tymofieiev <[email protected]>
> wrote:
>
>> FYI, I have recently observed a large number of test failures in Beam
>> test suites where Dataflow jobs failed due to a lack of CPU quota in
>> the apache-beam-testing project.
>>
>> We have been adding new suites for Python 3.x versions, which may have
>> contributed to this problem.
>>
>> I have not yet investigated what consumes the quota, but the usage
>> remains high.
>>
>> Possible mitigation options:
>> - Increase quota.
>> - Decrease per-suite parallelism [1]. Currently we may run 1-8 tests
>> from the same suite concurrently.
>> - Audit usage, perhaps kill stale jobs or VMs.
>>
>> Ideas/opinions welcome.
>>
>> I opened https://issues.apache.org/jira/browse/BEAM-7085 to track this.
>>
>> [1]
>> https://github.com/apache/beam/search?q=%22--processes%3D%22&unscoped_q=%22--processes%3D%22
>>
>
