[
https://issues.apache.org/jira/browse/BEAM-5973?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]
Sam Rohde reassigned BEAM-5973:
-------------------------------
Assignee: Boyuan Zhang (was: Alan Myrvold)
> [Flake] Various ValidatesRunner Post-commits flaking due to quota issues.
> -------------------------------------------------------------------------
>
> Key: BEAM-5973
> URL: https://issues.apache.org/jira/browse/BEAM-5973
> Project: Beam
> Issue Type: Bug
> Components: test-failures
> Reporter: Daniel Oliveira
> Assignee: Boyuan Zhang
> Priority: Minor
>
> Multiple post-commits all seem to have failed at the same time due to
> extremely similar GCP errors:
> beam_PostCommit_Java_GradleBuild:
> [https://builds.apache.org/job/beam_PostCommit_Java_GradleBuild/1822/]
> Several tests fail with one of the two following errors:
> {noformat}
> Nov 04, 2018 6:40:14 PM
> org.apache.beam.runners.dataflow.TestDataflowRunner$ErrorMonitorMessagesHandler
> process
> INFO: Dataflow job 2018-11-04_10_37_12-7420261977214120411 threw exception.
> Failure message was: Startup of the worker pool in zone us-central1-b failed
> to bring up any of the desired 1 workers. QUOTA_EXCEEDED: Quota
> 'DISKS_TOTAL_GB' exceeded. Limit: 200000.0 in region us-central1.{noformat}
> {noformat}
> Nov 04, 2018 6:39:14 PM
> org.apache.beam.runners.dataflow.TestDataflowRunner$ErrorMonitorMessagesHandler
> process INFO: Dataflow job 2018-11-04_10_37_11-14433481609734431843 threw
> exception. Failure message was: Startup of the worker pool in zone
> us-central1-b failed to bring up any of the desired 1 workers.
> QUOTA_EXCEEDED: Quota 'CPUS' exceeded. Limit: 750.0 in region us-central1.
> {noformat}
> beam_PostCommit_Java_ValidatesRunner_PortabilityApi_Dataflow_Gradle:
> [https://builds.apache.org/job/beam_PostCommit_Java_ValidatesRunner_PortabilityApi_Dataflow_Gradle/31/]
> Test failures include the errors pasted above, plus one new one:
>
> {noformat}
> Nov 04, 2018 6:38:13 PM
> org.apache.beam.runners.dataflow.util.MonitoringUtil$LoggingHandler process
> SEVERE: 2018-11-04T18:38:04.612Z: Workflow failed. Causes: Project
> apache-beam-testing has insufficient quota(s) to execute this workflow with 1
> instances in region us-central1. Quota summary (required/available): 1/7192
> instances, 1/202 CPUs, 250/121 disk GB, 0/4046 SSD disk GB, 1/267 instance
> groups, 1/267 managed instance groups, 1/242 instance templates, 1/446 in-use
> IP addresses.{noformat}
>
> beam_PostCommit_Java_PVR_Flink:
> [https://builds.apache.org/job/beam_PostCommit_Java_PVR_Flink/214/]
> The error looks different but is caused by a lack of memory, so it seems
> related to the quota issues above.
>
> {noformat}
> Java HotSpot(TM) 64-Bit Server VM warning:
> INFO: os::commit_memory(0x00000003acd80000, 6654787584, 0) failed;
> error='Cannot allocate memory' (errno=12)
> #
> # There is insufficient memory for the Java Runtime Environment to continue.
> # Native memory allocation (mmap) failed to map 6654787584 bytes for
> committing reserved memory.{noformat}
> beam_PostCommit_Java_ValidatesRunner_Flink_Gradle:
> [https://builds.apache.org/job/beam_PostCommit_Java_ValidatesRunner_Flink_Gradle/2101/]
> I couldn't find a visible error with the failure in this job, but I'm
> grouping it together with the other failures due to it flaking at the same
> time as the other Flink VR Post-commit.
>
>
> I may be grouping these failures a bit too aggressively. If anyone believes
> that the failures have different causes, please split this into
> multiple bugs.
>
> One possibility is that these errors are caused by running all of our
> post-commits at the same time, which uses up resources in bursts.
> Staggering the post-commits might avoid some of these quota issues.
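
As a rough illustration of the staggering idea in the description, the sketch
below spreads job start times evenly across a scheduling window so that no two
jobs start in the same minute. The function name `staggered_offsets` and the
60-minute window are assumptions made for this example; they are not taken from
the Beam build configuration, and the job list is only the four jobs named in
this issue.

```python
from typing import Dict, Iterable


def staggered_offsets(jobs: Iterable[str], window_minutes: int = 60) -> Dict[str, int]:
    """Assign each job a start offset (in minutes) spread evenly over the window.

    Jobs are sorted by name so the assignment is stable between runs.
    Hypothetical helper for illustration only.
    """
    ordered = sorted(set(jobs))
    if not ordered:
        return {}
    step = window_minutes / len(ordered)
    # int() truncates, so offsets stay inside [0, window_minutes).
    return {job: int(index * step) for index, job in enumerate(ordered)}


# The four post-commit jobs named in this issue.
POST_COMMIT_JOBS = [
    "beam_PostCommit_Java_GradleBuild",
    "beam_PostCommit_Java_ValidatesRunner_PortabilityApi_Dataflow_Gradle",
    "beam_PostCommit_Java_PVR_Flink",
    "beam_PostCommit_Java_ValidatesRunner_Flink_Gradle",
]

if __name__ == "__main__":
    for job, offset in staggered_offsets(POST_COMMIT_JOBS).items():
        print(f"+{offset:02d} min  {job}")
```

With four jobs and a 60-minute window this yields offsets of 0, 15, 30 and 45
minutes. Whether that spacing is enough depends on how long each job holds its
workers, which this sketch does not model.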
--
This message was sent by Atlassian JIRA
(v7.6.3#76005)