[ 
https://issues.apache.org/jira/browse/BEAM-14407?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Valentyn Tymofieiev updated BEAM-14407:
---------------------------------------
    Description: 
Example failure from 
[https://ci-beam.apache.org/job/beam_PostCommit_Python37/5184/]
{noformat}
>>> RUNNING integration tests with pipeline options: --runner=FlinkRunner
--project=apache-beam-testing --environment_type=LOOPBACK
--temp_location=gs://temp-storage-for-end-to-end-tests/temp-it
--flink_job_server_jar=/home/jenkins/jenkins-slave/workspace/beam_PostCommit_Python37/src/runners/flink/1.14/job-server/build/libs/beam-runners-flink-1.14-job-server-2.39.0-SNAPSHOT.jar
>>>   pytest options: apache_beam/io/gcp/bigquery_read_it_test.py
apache_beam/io/external/xlang_jdbcio_it_test.py
apache_beam/io/external/xlang_kafkaio_it_test.py
apache_beam/io/external/xlang_kinesisio_it_test.py
apache_beam/io/external/xlang_debeziumio_it_test.py --log-cli-level=INFO

...

15:27:18 INFO     apache_beam.utils.subprocess_server:subprocess_server.py:116 
Starting service with ['java' '-jar' 
'/home/jenkins/jenkins-slave/workspace/beam_PostCommit_Python37/src/runners/flink/1.14/job-server/build/libs/beam-runners-flink-1.14-job-server-2.39.0-SNAPSHOT.jar'
 '--flink-master' '[auto]' '--artifacts-dir' 
'/tmp/beam-temp34uahjm8/artifactsfzc4uc4c' '--job-port' '56343' 
'--artifact-port' '0' '--expansion-port' '0']
15:27:18 INFO     apache_beam.utils.subprocess_server:subprocess_server.py:125 
b'May 03, 2022 1:27:20 PM 
software.amazon.awssdk.regions.internal.util.EC2MetadataUtils getItems'
15:27:20 INFO     apache_beam.utils.subprocess_server:subprocess_server.py:125 
b'WARNING: Unable to retrieve the requested metadata.'
15:27:20 INFO     apache_beam.utils.subprocess_server:subprocess_server.py:125 
b'May 03, 2022 1:27:20 PM 
org.apache.beam.sdk.io.aws2.s3.DefaultS3ClientBuilderFactory createBuilder'
15:27:20 INFO     apache_beam.utils.subprocess_server:subprocess_server.py:125 
b"INFO: The AWS S3 Beam extension was included in this build, but the awsRegion 
flag was not specified. If you don't plan to use S3, then ignore this message."
15:27:20 INFO     apache_beam.utils.subprocess_server:subprocess_server.py:125 
b'May 03, 2022 1:27:21 PM org.apache.beam.runners.jobsubmission.JobServerDriver 
createArtifactStagingService'
15:27:21 INFO     apache_beam.utils.subprocess_server:subprocess_server.py:125 
b'INFO: ArtifactStagingService started on localhost:36631'
15:27:21 INFO     apache_beam.utils.subprocess_server:subprocess_server.py:125 
b'May 03, 2022 1:27:21 PM org.apache.beam.runners.jobsubmission.JobServerDriver 
createExpansionService'
15:27:21 INFO     apache_beam.utils.subprocess_server:subprocess_server.py:125 
b'INFO: Java ExpansionService started on localhost:35729'
15:27:21 INFO     apache_beam.utils.subprocess_server:subprocess_server.py:125 
b'May 03, 2022 1:27:21 PM org.apache.beam.runners.jobsubmission.JobServerDriver 
createJobServer'
15:27:21 INFO     apache_beam.utils.subprocess_server:subprocess_server.py:125 
b'INFO: JobService started on localhost:56343'
15:27:21 INFO     apache_beam.utils.subprocess_server:subprocess_server.py:125 
b'May 03, 2022 1:27:21 PM org.apache.beam.runners.jobsubmission.JobServerDriver 
run'
15:27:21 INFO     apache_beam.utils.subprocess_server:subprocess_server.py:125 
b'INFO: Job server now running, terminate with Ctrl+C'
15:27:21 FATAL: command execution failed
15:27:21 java.io.IOException: Backing channel 'apache-beam-jenkins-10' is 
disconnected.
15:27:21     at 
hudson.remoting.RemoteInvocationHandler.channelOrFail(RemoteInvocationHandler.java:216)

...

FATAL: command execution failed
java.io.IOException: Backing channel 'apache-beam-jenkins-10' is disconnected.
  at hudson.remoting.RemoteInvocationHandler.channelOrFail(RemoteInvocationHandler.java:216)
  at hudson.remoting.RemoteInvocationHandler.invoke(RemoteInvocationHandler.java:286)

 {noformat}

Perhaps this was a random crash, or the worker got overloaded. Other suites 
running at the same time:

beam_BiqQueryIO_Streaming_Performance_Test_Java #3729
beam_LoadTests_Java_CoGBK_Dataflow_V2_Streaming_Java17 #134
beam_LoadTests_Python_GBK_Dataflow_Batch #1060

also crashed, but at that moment those tests had already launched Dataflow 
jobs and were only streaming log output. Only the beam_PostCommit_Python37 
suite appeared to be running something intensive on the worker.

Filing to see how frequently this happens.
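
One possible way to track frequency is to scan saved console logs for the disconnect signature shown above. The following is only a minimal sketch: the helper name count_disconnects and the sample text are hypothetical, and it assumes the console output is already available as a string.

```python
import re
from collections import Counter

# Signature copied from the console log above. The worker name is captured,
# and \s+ tolerates the line wrap between "is" and "disconnected".
DISCONNECT_RE = re.compile(r"Backing channel '([^']+)' is\s+disconnected")


def count_disconnects(console_text):
    """Return a Counter mapping worker name to number of disconnect messages."""
    return Counter(DISCONNECT_RE.findall(console_text))


# Hypothetical sample, shaped like the excerpt in this issue.
sample = (
    "15:27:21 FATAL: command execution failed\n"
    "15:27:21 java.io.IOException: Backing channel 'apache-beam-jenkins-10' is \n"
    "disconnected.\n"
)
print(count_disconnects(sample))
```

Summing the counters across builds would show whether particular workers disconnect more often than others.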


> Jenkins worker sometimes crashes while running postcommits
> ----------------------------------------------------------
>
>                 Key: BEAM-14407
>                 URL: https://issues.apache.org/jira/browse/BEAM-14407
>             Project: Beam
>          Issue Type: Bug
>          Components: test-failures
>            Reporter: Valentyn Tymofieiev
>            Priority: P2
>              Labels: flake
>



--
This message was sent by Atlassian Jira
(v8.20.7#820007)
