[jira] [Assigned] (BEAM-5973) [Flake] Various ValidatesRunner Post-commits flaking due to quota issues.

Sam Rohde (JIRA) Tue, 08 Jan 2019 16:59:05 -0800


     [ 
https://issues.apache.org/jira/browse/BEAM-5973?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]


Sam Rohde reassigned BEAM-5973:
-------------------------------

    Assignee: Boyuan Zhang  (was: Alan Myrvold)

> [Flake] Various ValidatesRunner Post-commits flaking due to quota issues.
> -------------------------------------------------------------------------
>
>                 Key: BEAM-5973
>                 URL: https://issues.apache.org/jira/browse/BEAM-5973
>             Project: Beam
>          Issue Type: Bug
>          Components: test-failures
>            Reporter: Daniel Oliveira
>            Assignee: Boyuan Zhang
>            Priority: Minor
>
> Multiple post-commits all seem to have failed at the same time due to 
> extremely similar GCP errors:
> beam_PostCommit_Java_GradleBuild: 
> [https://builds.apache.org/job/beam_PostCommit_Java_GradleBuild/1822/]
> Several tests fail with one of the two following errors:
> {noformat}
> Nov 04, 2018 6:40:14 PM 
> org.apache.beam.runners.dataflow.TestDataflowRunner$ErrorMonitorMessagesHandler
>  process
> INFO: Dataflow job 2018-11-04_10_37_12-7420261977214120411 threw exception. 
> Failure message was: Startup of the worker pool in zone us-central1-b failed 
> to bring up any of the desired 1 workers. QUOTA_EXCEEDED: Quota 
> 'DISKS_TOTAL_GB' exceeded. Limit: 200000.0 in region us-central1.{noformat}
> {noformat}
> Nov 04, 2018 6:39:14 PM 
> org.apache.beam.runners.dataflow.TestDataflowRunner$ErrorMonitorMessagesHandler
>  process
> INFO: Dataflow job 2018-11-04_10_37_11-14433481609734431843 threw exception. 
> Failure message was: Startup of the worker pool in zone us-central1-b failed 
> to bring up any of the desired 1 workers. QUOTA_EXCEEDED: Quota 'CPUS' 
> exceeded. Limit: 750.0 in region us-central1.
> {noformat}
> beam_PostCommit_Java_ValidatesRunner_PortabilityApi_Dataflow_Gradle: 
> [https://builds.apache.org/job/beam_PostCommit_Java_ValidatesRunner_PortabilityApi_Dataflow_Gradle/31/]
> Test failures include the errors pasted above, plus one new one:
>  
> {noformat}
> Nov 04, 2018 6:38:13 PM 
> org.apache.beam.runners.dataflow.util.MonitoringUtil$LoggingHandler process
> SEVERE: 2018-11-04T18:38:04.612Z: Workflow failed. Causes: Project 
> apache-beam-testing has insufficient quota(s) to execute this workflow with 1 
> instances in region us-central1. Quota summary (required/available): 1/7192 
> instances, 1/202 CPUs, 250/121 disk GB, 0/4046 SSD disk GB, 1/267 instance 
> groups, 1/267 managed instance groups, 1/242 instance templates, 1/446 in-use 
> IP addresses.{noformat}
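
The quota summary in the log above lists each resource as required/available. As a rough sketch (not part of the original report), the summary text can be parsed to pick out which resource actually fell short. The function name `short_quotas` is hypothetical, and the parsing assumes the comma-separated "required/available name" layout seen in this one log message.

```python
import re

# Quota summary copied from the log message quoted above.
SUMMARY = (
    "1/7192 instances, 1/202 CPUs, 250/121 disk GB, 0/4046 SSD disk GB, "
    "1/267 instance groups, 1/267 managed instance groups, "
    "1/242 instance templates, 1/446 in-use IP addresses"
)


def short_quotas(summary):
    """Return {resource: (required, available)} where required > available.

    Hypothetical helper: assumes each comma-separated item looks like
    "<required>/<available> <resource name>".
    """
    short = {}
    for item in summary.split(","):
        match = re.match(r"\s*(\d+)/(\d+)\s+(.+?)\s*$", item)
        if match:
            required = int(match.group(1))
            available = int(match.group(2))
            if required > available:
                short[match.group(3)] = (required, available)
    return short


result = short_quotas(SUMMARY)
print(result)
```

For this message the only resource where the requirement exceeds what is available is disk GB (250 required, 121 available), which matches the DISKS_TOTAL_GB errors seen in the other job.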
>  
> beam_PostCommit_Java_PVR_Flink: 
> [https://builds.apache.org/job/beam_PostCommit_Java_PVR_Flink/214/]
> The error looks different, but it is caused by insufficient memory, so it 
> seems related to the quota issues above.
>  
> {noformat}
> Java HotSpot(TM) 64-Bit Server VM warning:
> INFO: os::commit_memory(0x00000003acd80000, 6654787584, 0) failed; 
> error='Cannot allocate memory' (errno=12)
> #
> # There is insufficient memory for the Java Runtime Environment to continue.
> # Native memory allocation (mmap) failed to map 6654787584 bytes for committing reserved memory.{noformat}
> beam_PostCommit_Java_ValidatesRunner_Flink_Gradle: 
> [https://builds.apache.org/job/beam_PostCommit_Java_ValidatesRunner_Flink_Gradle/2101/]
> I couldn't find a visible error for the failure in this job, but I'm 
> grouping it with the other failures because it flaked at the same time as 
> the other Flink VR post-commit.
>  
>  
> I may be grouping these failures a bit too aggressively. If anyone believes 
> the failures have different causes, please split this into multiple bugs.
>  
> One possibility is that these errors are caused by running all our 
> post-commits at the same time, which uses up resources in bursts. If we 
> stagger our post-commits, some of these quota issues might be avoided.
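
To illustrate the staggering idea, here is a minimal sketch that spreads the four jobs named in this issue evenly across a scheduling window. The helper `stagger_offsets` and the 60-minute window are assumptions made for illustration; they do not reflect how the Beam Jenkins jobs are actually configured.

```python
def stagger_offsets(job_names, window_minutes):
    """Assign each job a start offset (in minutes) spread evenly over a window.

    Hypothetical helper: jobs are sorted by name so the assignment is stable
    from run to run.
    """
    if not job_names:
        return {}
    step = window_minutes // len(job_names)
    return {name: i * step for i, name in enumerate(sorted(job_names))}


# Job names taken from the issue description above.
JOBS = [
    "beam_PostCommit_Java_GradleBuild",
    "beam_PostCommit_Java_ValidatesRunner_PortabilityApi_Dataflow_Gradle",
    "beam_PostCommit_Java_PVR_Flink",
    "beam_PostCommit_Java_ValidatesRunner_Flink_Gradle",
]

offsets = stagger_offsets(JOBS, window_minutes=60)
for name, minute in sorted(offsets.items(), key=lambda kv: kv[1]):
    print(f"{minute:3d} min  {name}")
```

With four jobs and a 60-minute window the offsets come out 15 minutes apart, so no two jobs request workers at the same moment. Whether that is enough to stay under quota depends on how long each job holds its resources.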



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)
