| Sometimes when AWS terminates one of our spot instances, the instance enters a state where Jenkins still thinks it's a valid, available executor, even though it is shutting down and cannot fulfill any requests. When this occurs, our entire backlog of tests rapidly flushes through that executor, and every one of them fails. Sometimes the executor is totally broken, like so:
00:00:00.002 Started by remote host 140.211.10.27
00:00:00.002 [EnvInject] - Loading node environment variables.
00:00:27.357 FATAL: java.io.IOException: Unexpected termination of the channel
00:00:27.358 hudson.remoting.RequestAbortedException: java.io.IOException: Unexpected termination of the channel
00:00:27.359 at hudson.remoting.Request.abort(Request.java:303)
00:00:27.360 at hudson.remoting.Channel.terminate(Channel.java:863)
00:00:27.360 at hudson.remoting.SynchronousCommandTransport$ReaderThread.run(SynchronousCommandTransport.java:92)
00:00:27.360 at ......remote call to Testrunner (sir-tdd89gzm)(Native Method)
00:00:27.361 at hudson.remoting.Channel.attachCallSiteStackTrace(Channel.java:1433)
00:00:27.361 at hudson.remoting.Request.call(Request.java:172)
00:00:27.361 at hudson.remoting.Channel.call(Channel.java:796)
00:00:27.362 at hudson.FilePath.act(FilePath.java:1102)
00:00:27.362 at org.jenkinsci.plugins.envinject.service.EnvironmentVariablesNodeLoader.gatherEnvironmentVariablesNode(EnvironmentVariablesNodeLoader.java:48)
00:00:27.363 at org.jenkinsci.plugins.envinject.EnvInjectListener.loadEnvironmentVariablesNode(EnvInjectListener.java:80)
00:00:27.363 at org.jenkinsci.plugins.envinject.EnvInjectListener.setUpEnvironment(EnvInjectListener.java:42)
00:00:27.364 at hudson.model.AbstractBuild$AbstractBuildExecution.createLauncher(AbstractBuild.java:572)
00:00:27.364 at hudson.model.AbstractBuild$AbstractBuildExecution.run(AbstractBuild.java:492)
00:00:27.365 at hudson.model.Run.execute(Run.java:1720)
00:00:27.365 at hudson.model.FreeStyleBuild.run(FreeStyleBuild.java:43)
00:00:27.365 at hudson.model.ResourceController.execute(ResourceController.java:98)
00:00:27.365 at hudson.model.Executor.run(Executor.java:404)
00:00:27.366 Caused by: java.io.IOException: Unexpected termination of the channel
00:00:27.366 at hudson.remoting.SynchronousCommandTransport$ReaderThread.run(SynchronousCommandTransport.java:73)
00:00:27.367 Caused by: java.io.EOFException
00:00:27.367 at java.io.ObjectInputStream$PeekInputStream.readFully(ObjectInputStream.java:2335)
00:00:27.367 at java.io.ObjectInputStream$BlockDataInputStream.readShort(ObjectInputStream.java:2804)
00:00:27.368 at java.io.ObjectInputStream.readStreamHeader(ObjectInputStream.java:802)
00:00:27.368 at java.io.ObjectInputStream.<init>(ObjectInputStream.java:299)
00:00:27.368 at hudson.remoting.ObjectInputStreamEx.<init>(ObjectInputStreamEx.java:48)
00:00:27.369 at hudson.remoting.AbstractSynchronousByteArrayCommandTransport.read(AbstractSynchronousByteArrayCommandTransport.java:34)
00:00:27.369 at hudson.remoting.SynchronousCommandTransport$ReaderThread.run(SynchronousCommandTransport.java:59)
00:00:27.370 ERROR: Step ‘Publish Checkstyle analysis results’ failed: no workspace for drupal_patches #8845
00:00:27.371 ERROR: Step ‘Archive the artifacts’ failed: no workspace for drupal_patches #8845
00:00:27.372 Checking console output
00:00:27.373 ERROR: Step ‘Publish JUnit test result report’ failed: no workspace for drupal_patches #8845
00:00:27.396 Finished: FAILURE
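
A build that dies this way never ran any tests, so it could in principle be told apart from a real test failure by its console output. Here is a minimal sketch of that idea, assuming the console log has been saved to a file. The function name `is_channel_loss` and the variable `CHANNEL_LOST_PATTERN` are hypothetical; only the matched string comes from the log above.

```shell
#!/bin/bash
# Signature copied from the console output above.
CHANNEL_LOST_PATTERN='Unexpected termination of the channel'

# Hypothetical helper: succeeds (exit 0) if the console log at $1
# contains the lost-channel signature, fails otherwise.
is_channel_loss() {
  local log_file="$1"
  # -q: no output, -F: treat the pattern as a fixed string.
  grep -qF -- "$CHANNEL_LOST_PATTERN" "$log_file"
}
```

Something like `is_channel_loss console.log && echo "infrastructure failure, requeue"` could then decide whether a result should be reported as a test failure or retried on another executor. How the retry itself would be triggered is not covered here.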
Other times the instance appears to be partway through shutting down: the Docker daemon has already been killed, but Jenkins still treats the executor as available:
00:00:00.001 Started by remote host 140.211.10.27
00:00:00.001 [EnvInject] - Loading node environment variables.
00:00:00.007 Building remotely on Testrunner (sir-yjpgbk4n) (testrunner) in workspace /var/lib/drupalci/workspace
00:00:00.018 [workspace] $ /bin/bash /tmp/hudson6328886945950906962.sh
00:00:00.124 Cannot connect to the Docker daemon. Is the docker daemon running on this host?
00:00:00.141 Cannot connect to the Docker daemon. Is the docker daemon running on this host?
00:00:00.151 ++ id
00:00:00.152 uid=1001(testbot) gid=1001(testbot) groups=1001(testbot),27(sudo),999(docker)
00:00:00.153 ++ export COMPOSER_CACHE_DIR=/opt/drupalci/composer-cache
00:00:00.153 ++ COMPOSER_CACHE_DIR=/opt/drupalci/composer-cache
00:00:00.153 ++ echo https:
00:00:00.153 https:
00:00:00.154 ++ curl -w '\n' -s http:
00:00:00.159 cc2.8xlarge
00:00:00.159 ++ curl -w '\n' -s http:
00:00:00.165 ami-3c42c35c
00:00:00.166 ++ curl -w '\n' -s http:
00:00:00.172 54.212.244.41
00:00:00.172 ++ env
00:00:00.172 ++ grep DCI
00:00:00.173 DCI_CS_CoderVersion=8.2.8
00:00:00.173 DCI_PHPVersion=php-5.6-apache:production
00:00:00.173 DCI_JobType=simpletest
00:00:00.174 DCI_CoreBranch=8.4.x
00:00:00.174 DCI_Patch=rpc_endpoint_to_reset-2847708-24.patch,.
00:00:00.174 DCI_Debug=FALSE
00:00:00.174 DCI_ES_LintFailsTest=TRUE
00:00:00.174 DCI_Fetch=https:
00:00:00.175 DCI_Concurrency=31
00:00:00.175 DCI_CoreRepository=git:
00:00:00.175 DCI_DBVersion=mysql-5.5
00:00:00.175 ++ env
00:00:00.175 ++ grep -v DCI
00:00:00.175 BUILD_URL=http:
00:00:00.176 SHELL=/bin/bash
00:00:00.176 HUDSON_SERVER_COOKIE=f9f94f9baaa33b04
00:00:00.176 SSH_CLIENT=172.31.42.62 35896 22
00:00:00.176 BUILD_TAG=jenkins-drupal_patches-8786
00:00:00.177 ROOT_BUILD_CAUSE=REMOTECAUSE
00:00:00.177 JOB_URL=http:
00:00:00.177 WORKSPACE=/var/lib/drupalci/workspace
00:00:00.177 USER=testbot
00:00:00.177 ROOT_BUILD_CAUSE_REMOTECAUSE=true
00:00:00.178 COMPOSER_CACHE_DIR=/opt/drupalci/composer-cache
00:00:00.178 JENKINS_HOME=/usr/local/jenkins
00:00:00.178 MAIL=/var/mail/testbot
00:00:00.178 PATH=/usr/local/bin:/usr/bin:/bin:/usr/games
00:00:00.178 PWD=/var/lib/drupalci/workspace
00:00:00.178 HUDSON_URL=http:
00:00:00.179 LANG=en_US.UTF-8
00:00:00.179 JOB_NAME=drupal_patches
00:00:00.179 BUILD_CAUSE_REMOTECAUSE=true
00:00:00.179 BUILD_DISPLAY_NAME=#8786
00:00:00.179 BUILD_ID=8786
00:00:00.179 BUILD_CAUSE=REMOTECAUSE
00:00:00.179 JENKINS_URL=http:
00:00:00.180 Drupal_JobID=https:
00:00:00.180 JOB_BASE_NAME=drupal_patches
00:00:00.180 SHLVL=3
00:00:00.180 HOME=/home/testbot
00:00:00.180 EXECUTOR_NUMBER=0
00:00:00.180 JENKINS_SERVER_COOKIE=f9f94f9baaa33b04
00:00:00.181 NODE_LABELS=Testrunner (sir-yjpgbk4n) testrunner
00:00:00.181 LOGNAME=testbot
00:00:00.181 SSH_CONNECTION=172.31.42.62 35896 172.31.0.168 22
00:00:00.181 HUDSON_HOME=/usr/local/jenkins
00:00:00.181 NODE_NAME=Testrunner (sir-yjpgbk4n)
00:00:00.181 BUILD_NUMBER=8786
00:00:00.182 Testrunner_Branch=production
00:00:00.182 HUDSON_COOKIE=90c8c0be-3081-4c6c-a8c6-e79cc8e16f5c
00:00:00.182 _=/usr/bin/env
00:00:00.182 ++ cd /opt/drupalci/testrunner
00:00:00.182 ++ git fetch --all --tags
00:00:00.183 Fetching origin
00:00:00.275 ++ git checkout production
00:00:00.278 Already on 'production'
00:00:00.278 Your branch is up-to-date with 'origin/production'.
00:00:00.278 ++ git pull --rebase
00:00:00.377 Current branch production is up to date.
00:00:00.379 ++ docker pull drupalci/php-5.6-apache:production
00:00:00.387 Warning: failed to get default registry endpoint from daemon (Cannot connect to the Docker daemon. Is the docker daemon running on this host?). Using system default: https:
00:00:00.388 Cannot connect to the Docker daemon. Is the docker daemon running on this host?
00:00:00.394 Build step 'Execute shell' marked build as failure
00:00:00.458 [CHECKSTYLE] Collecting checkstyle analysis files...
00:00:00.501 [CHECKSTYLE] Finding all files that match the pattern jenkins-drupal_patches-8786/artifacts/*/checkstyle.xml
00:00:00.504 [CHECKSTYLE] Computing warning deltas based on reference build #8777
00:00:00.504 Archiving artifacts
00:00:00.507 Checking console output
00:00:00.507 Recording test results
00:00:00.510 ERROR: Step ‘Publish JUnit test result report’ failed: No test report files were found. Configuration error?
00:00:00.533 Finished: FAILURE
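
In this second case the build script keeps going for a while after the first "Cannot connect to the Docker daemon" message. One possible mitigation is a pre-flight check at the top of the build script that bails out immediately when the node is clearly unusable. This is only a sketch under assumptions: the function names, `NODE_UNUSABLE_EXIT`, and `SPOT_METADATA_URL` are hypothetical, and it assumes the EC2 instance metadata service is reachable from the executor without a session token. The `spot/termination-time` path returns 404 until AWS has issued a termination notice for the instance.

```shell
#!/bin/bash
# Hypothetical exit code meaning "node unusable", as opposed to a test failure.
NODE_UNUSABLE_EXIT=75

# Succeeds if the Docker daemon answers a basic query.
docker_daemon_up() {
  docker info >/dev/null 2>&1
}

# Succeeds if EC2 has scheduled this spot instance for termination.
# curl -f turns the 404 (no notice yet) into a non-zero exit status.
spot_termination_pending() {
  local url="${SPOT_METADATA_URL:-http://169.254.169.254/latest/meta-data/spot/termination-time}"
  curl -sf --max-time 2 "$url" >/dev/null 2>&1
}

# Returns 0 if the node looks usable, NODE_UNUSABLE_EXIT otherwise.
preflight() {
  if spot_termination_pending; then
    echo "preflight: spot termination notice received" >&2
    return "$NODE_UNUSABLE_EXIT"
  fi
  if ! docker_daemon_up; then
    echo "preflight: Docker daemon unreachable" >&2
    return "$NODE_UNUSABLE_EXIT"
  fi
  return 0
}
```

Calling `preflight || exit $?` before the `docker pull` step would at least make these builds fail fast with a recognizable exit code. On its own it would not stop Jenkins from handing further queued builds to the dying executor, so it addresses the symptom rather than the scheduling problem.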
|