Valentyn Tymofieiev created BEAM-14407:
------------------------------------------
Summary: Jenkins worker sometimes crashes while running postcommits
Key: BEAM-14407
URL: https://issues.apache.org/jira/browse/BEAM-14407
Project: Beam
Issue Type: Bug
Components: test-failures
Reporter: Valentyn Tymofieiev
Example failure from
https://ci-beam.apache.org/job/beam_PostCommit_Python37/5184/
```
>>> RUNNING integration tests with pipeline options: --runner=FlinkRunner --project=apache-beam-testing --environment_type=LOOPBACK --temp_location=gs://temp-storage-for-end-to-end-tests/temp-it --flink_job_server_jar=/home/jenkins/jenkins-slave/workspace/beam_PostCommit_Python37/src/runners/flink/1.14/job-server/build/libs/beam-runners-flink-1.14-job-server-2.39.0-SNAPSHOT.jar
>>> pytest options: apache_beam/io/gcp/bigquery_read_it_test.py apache_beam/io/external/xlang_jdbcio_it_test.py apache_beam/io/external/xlang_kafkaio_it_test.py apache_beam/io/external/xlang_kinesisio_it_test.py apache_beam/io/external/xlang_debeziumio_it_test.py --log-cli-level=INFO
...
15:27:18 INFO apache_beam.utils.subprocess_server:subprocess_server.py:116 Starting service with ['java' '-jar' '/home/jenkins/jenkins-slave/workspace/beam_PostCommit_Python37/src/runners/flink/1.14/job-server/build/libs/beam-runners-flink-1.14-job-server-2.39.0-SNAPSHOT.jar' '--flink-master' '[auto]' '--artifacts-dir' '/tmp/beam-temp34uahjm8/artifactsfzc4uc4c' '--job-port' '56343' '--artifact-port' '0' '--expansion-port' '0']
15:27:18 INFO apache_beam.utils.subprocess_server:subprocess_server.py:125 b'May 03, 2022 1:27:20 PM software.amazon.awssdk.regions.internal.util.EC2MetadataUtils getItems'
15:27:20 INFO apache_beam.utils.subprocess_server:subprocess_server.py:125 b'WARNING: Unable to retrieve the requested metadata.'
15:27:20 INFO apache_beam.utils.subprocess_server:subprocess_server.py:125 b'May 03, 2022 1:27:20 PM org.apache.beam.sdk.io.aws2.s3.DefaultS3ClientBuilderFactory createBuilder'
15:27:20 INFO apache_beam.utils.subprocess_server:subprocess_server.py:125 b"INFO: The AWS S3 Beam extension was included in this build, but the awsRegion flag was not specified. If you don't plan to use S3, then ignore this message."
15:27:20 INFO apache_beam.utils.subprocess_server:subprocess_server.py:125 b'May 03, 2022 1:27:21 PM org.apache.beam.runners.jobsubmission.JobServerDriver createArtifactStagingService'
15:27:21 INFO apache_beam.utils.subprocess_server:subprocess_server.py:125 b'INFO: ArtifactStagingService started on localhost:36631'
15:27:21 INFO apache_beam.utils.subprocess_server:subprocess_server.py:125 b'May 03, 2022 1:27:21 PM org.apache.beam.runners.jobsubmission.JobServerDriver createExpansionService'
15:27:21 INFO apache_beam.utils.subprocess_server:subprocess_server.py:125 b'INFO: Java ExpansionService started on localhost:35729'
15:27:21 INFO apache_beam.utils.subprocess_server:subprocess_server.py:125 b'May 03, 2022 1:27:21 PM org.apache.beam.runners.jobsubmission.JobServerDriver createJobServer'
15:27:21 INFO apache_beam.utils.subprocess_server:subprocess_server.py:125 b'INFO: JobService started on localhost:56343'
15:27:21 INFO apache_beam.utils.subprocess_server:subprocess_server.py:125 b'May 03, 2022 1:27:21 PM org.apache.beam.runners.jobsubmission.JobServerDriver run'
15:27:21 INFO apache_beam.utils.subprocess_server:subprocess_server.py:125 b'INFO: Job server now running, terminate with Ctrl+C'
15:27:21 FATAL: command execution failed
15:27:21 java.io.IOException: Backing channel 'apache-beam-jenkins-10' is disconnected.
15:27:21 at hudson.remoting.RemoteInvocationHandler.channelOrFail(RemoteInvocationHandler.java:216)
...
FATAL: command execution failed
java.io.IOException: Backing channel 'apache-beam-jenkins-10' is disconnected.
at hudson.remoting.RemoteInvocationHandler.channelOrFail(RemoteInvocationHandler.java:216)
at hudson.remoting.RemoteInvocationHandler.invoke(RemoteInvocationHandler.java:286)
```
This may have been a random crash, or the worker may have been overloaded. Other suites running at the same time:
beam_BiqQueryIO_Streaming_Performance_Test_Java #3729
beam_LoadTests_Java_CoGBK_Dataflow_V2_Streaming_Java17 #134
beam_LoadTests_Python_GBK_Dataflow_Batch #1060
also crashed, but at that point those suites had already launched their Dataflow jobs and were streaming log output. Only the beam_PostCommit_Python37 suite appeared to be running something intensive on the worker.
Filing this issue to track how frequently this happens.
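One way to track the frequency is to scan saved console logs for the disconnect message quoted in the log above. This is a minimal sketch, not an existing Beam or Jenkins tool: the function names `find_disconnected_workers` and `count_disconnects` are hypothetical, and it assumes the console text of each build has already been downloaded by some other means.

```python
import re
from collections import Counter

# Matches the Jenkins remoting error from the log above and captures the
# worker name. \s+ tolerates the message being wrapped across two lines.
DISCONNECT_RE = re.compile(r"Backing channel '([^']+)' is\s+disconnected")


def find_disconnected_workers(console_text):
    """Return the worker name from every disconnect message, in order."""
    return DISCONNECT_RE.findall(console_text)


def count_disconnects(console_texts):
    """Count, per worker, how many console logs contain a disconnect.

    A log that repeats the same message several times is counted once.
    """
    counts = Counter()
    for text in console_texts:
        for worker in set(find_disconnected_workers(text)):
            counts[worker] += 1
    return counts
```

Counting each log once per worker keeps a repeated stack trace within a single build from inflating the totals.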