| I've just run a few regressions with that insanely huge timeout, and the bad news is that the problem didn't completely go away. Moreover, two different problems have emerged. (I'm not sure yet whether they are directly related to this issue or whether I should open a new ticket, so I'm posting everything here for now.) First one: I'm now seeing the pipeline freeze AFTER all the tasks under the parallel statement have completed. Restarting Jenkins causes some of the steps under parallel to be rerun with the following warning:
Queue item for node block in SoC » RTL_REGRESSION #255 is missing (perhaps JENKINS-34281); rescheduling
But the pipeline completes. I'm also seeing runaway simulation processes that have to be killed by hand. Those kept running after the pipeline had completed, perhaps due to a master node restart, and thus prevented further builds in that workspace. Not yet sure how I should debug this one. Second one: In an attempt to mitigate another issue (the old ctest on RHEL not always handling timeouts correctly), I've added a timeout() block inside parallel, and that exposed another filesystem/timeout problem:
Cancelling nested steps due to timeout
Sending interrupt signal to process
Cancelling nested steps due to timeout
After 10s process did not stop
java.nio.file.FileSystemException: /home/jenkins/ws/BootromSignoff/build@tmp/durable-bcad1b03/.nfs0000000029ee028d00002716: Device or resource busy at sun.nio.fs.UnixException.translateToIOException(Unknown Source) at sun.nio.fs.UnixException.rethrowAsIOException(Unknown Source) at sun.nio.fs.UnixException.rethrowAsIOException(Unknown Source) at sun.nio.fs.UnixFileSystemProvider.implDelete(Unknown Source) at sun.nio.fs.AbstractFileSystemProvider.deleteIfExists(Unknown Source) at java.nio.file.Files.deleteIfExists(Unknown Source) at hudson.Util.tryOnceDeleteFile(Util.java:316) at hudson.Util.deleteFile(Util.java:272) Also: hudson.remoting.Channel$CallSiteStackTrace: Remote call to taruca at hudson.remoting.Channel.attachCallSiteStackTrace(Channel.java:1741) at hudson.remoting.UserRequest$ExceptionResponse.retrieve(UserRequest.java:357) at hudson.remoting.Channel.call(Channel.java:955) at hudson.FilePath.act(FilePath.java:1070) at hudson.FilePath.act(FilePath.java:1059) at hudson.FilePath.deleteRecursive(FilePath.java:1266) at org.jenkinsci.plugins.durabletask.FileMonitoringTask$FileMonitoringController.cleanup(FileMonitoringTask.java:340) at org.jenkinsci.plugins.workflow.steps.durable_task.DurableTaskStep$Execution$1.run(DurableTaskStep.java:382) at
jenkins.security.ImpersonatingScheduledExecutorService$1.run(ImpersonatingScheduledExecutorService.java:58) at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511) at java.util.concurrent.FutureTask.run(FutureTask.java:266) at java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.access$201(ScheduledThreadPoolExecutor.java:180) at java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.run(ScheduledThreadPoolExecutor.java:293) at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149) at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624) at java.lang.Thread.run(Thread.java:748) Caused: java.io.IOException: Unable to delete '/home/jenkins/ws/BootromSignoff/build@tmp/durable-bcad1b03/.nfs0000000029ee028d00002716'. Tried 3 times (of a maximum of 3) waiting 0.1 sec between attempts. at hudson.Util.deleteFile(Util.java:277) at hudson.FilePath.deleteRecursive(FilePath.java:1303) at hudson.FilePath.deleteContentsRecursive(FilePath.java:1312) at hudson.FilePath.deleteRecursive(FilePath.java:1302) at hudson.FilePath.access$1600(FilePath.java:211) at hudson.FilePath$DeleteRecursive.invoke(FilePath.java:1272) at hudson.FilePath$DeleteRecursive.invoke(FilePath.java:1268) at hudson.FilePath$FileCallableWrapper.call(FilePath.java:3084) at hudson.remoting.UserRequest.perform(UserRequest.java:212) at hudson.remoting.UserRequest.perform(UserRequest.java:54) at hudson.remoting.Request$2.run(Request.java:369) at hudson.remoting.InterceptingExecutorService$1.call(InterceptingExecutorService.java:72) at java.util.concurrent.FutureTask.run(Unknown Source) at java.util.concurrent.ThreadPoolExecutor.runWorker(Unknown Source) at java.util.concurrent.ThreadPoolExecutor$Worker.run(Unknown Source) at java.lang.Thread.run(Unknown Source)
Sending interrupt signal to process
After 10s process did not stop
java.nio.file.FileSystemException:
/home/jenkins/ws/BootromSignoff/build@tmp/durable-e53cb05b/.nfs0000000029ee0a9d00010597: Device or resource busy at sun.nio.fs.UnixException.translateToIOException(UnixException.java:91) at sun.nio.fs.UnixException.rethrowAsIOException(UnixException.java:102) at sun.nio.fs.UnixException.rethrowAsIOException(UnixException.java:107) at sun.nio.fs.UnixFileSystemProvider.implDelete(UnixFileSystemProvider.java:244) at sun.nio.fs.AbstractFileSystemProvider.deleteIfExists(AbstractFileSystemProvider.java:108) at java.nio.file.Files.deleteIfExists(Files.java:1165) at hudson.Util.tryOnceDeleteFile(Util.java:316) at hudson.Util.deleteFile(Util.java:272) Also: hudson.remoting.Channel$CallSiteStackTrace: Remote call to oryx at hudson.remoting.Channel.attachCallSiteStackTrace(Channel.java:1741) at hudson.remoting.UserRequest$ExceptionResponse.retrieve(UserRequest.java:357) at hudson.remoting.Channel.call(Channel.java:955) at hudson.FilePath.act(FilePath.java:1070) at hudson.FilePath.act(FilePath.java:1059) at hudson.FilePath.deleteRecursive(FilePath.java:1266) at org.jenkinsci.plugins.durabletask.FileMonitoringTask$FileMonitoringController.cleanup(FileMonitoringTask.java:340) at org.jenkinsci.plugins.workflow.steps.durable_task.DurableTaskStep$Execution$1.run(DurableTaskStep.java:382) at jenkins.security.ImpersonatingScheduledExecutorService$1.run(ImpersonatingScheduledExecutorService.java:58) at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511) at java.util.concurrent.FutureTask.run(FutureTask.java:266) at java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.access$201(ScheduledThreadPoolExecutor.java:180) at java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.run(ScheduledThreadPoolExecutor.java:293) at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149) at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624) Caused: java.io.IOException: Unable to delete 
'/home/jenkins/ws/BootromSignoff/build@tmp/durable-e53cb05b/.nfs0000000029ee0a9d00010597'. Tried 3 times (of a maximum of 3) waiting 0.1 sec between attempts. at hudson.Util.deleteFile(Util.java:277) at hudson.FilePath.deleteRecursive(FilePath.java:1303) at hudson.FilePath.deleteContentsRecursive(FilePath.java:1312) at hudson.FilePath.deleteRecursive(FilePath.java:1302) at hudson.FilePath.access$1600(FilePath.java:211) at hudson.FilePath$DeleteRecursive.invoke(FilePath.java:1272) at hudson.FilePath$DeleteRecursive.invoke(FilePath.java:1268) at hudson.FilePath$FileCallableWrapper.call(FilePath.java:3084) at hudson.remoting.UserRequest.perform(UserRequest.java:212) at hudson.remoting.UserRequest.perform(UserRequest.java:54) at hudson.remoting.Request$2.run(Request.java:369) at hudson.remoting.InterceptingExecutorService$1.call(InterceptingExecutorService.java:72) at java.util.concurrent.FutureTask.run(FutureTask.java:266) at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142) at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617) at java.lang.Thread.run(Thread.java:748)
[Pipeline] }
[Pipeline] }
[Pipeline] // timeout
[Pipeline] // timeout
[Pipeline] echo
EXCEPTION: org.jenkinsci.plugins.workflow.steps.FlowInterruptedException
[Pipeline] echo
CTEST BUG: Ctest didn't honor timeout setting?
[Pipeline] }
[Pipeline] echo
EXCEPTION: org.jenkinsci.plugins.workflow.steps.FlowInterruptedException
[Pipeline] echo
CTEST BUG: Ctest didn't honor timeout setting?
[Pipeline] }
[Pipeline] // dir
[Pipeline] // dir
[Pipeline] }
[Pipeline] }
[Pipeline] // node
[Pipeline] // node
[Pipeline] }
[Pipeline] }
sh: line 1: 104849 Terminated sleep 3
sh: line 1: 163732 Terminated { while [ ( -d /proc/$pid -o ! -d /proc/$$ ) -a -d '/home/jenkins/ws/BootromSignoff/build@tmp/durable-bcad1b03' -a !
-f '/home/jenkins/ws/BootromSignoff/build@tmp/durable-bcad1b03/jenkins-result.txt' ]; do touch '/home/jenkins/ws/BootromSignoff/build@tmp/durable-bcad1b03/jenkins-log.txt'; sleep 3; done; }
sh: line 1: 163733 Terminated JENKINS_SERVER_COOKIE=$jsc '/home/jenkins/ws/BootromSignoff/build@tmp/durable-bcad1b03/script.sh' > '/home/jenkins/ws/BootromSignoff/build@tmp/durable-bcad1b03/jenkins-log.txt' 2>&1
1/1 Test #56: rumboot-default-rumboot-Production-bootrom-integration-no-selftest-host-easter-egg ...***Failed 20250.70 sec
It looks like when Jenkins tries to kill off the simulation, the process takes far more than 10 seconds to stop (perhaps because the simulator interprets the signal as a crash and starts collecting logs/core dumps, which takes a lot of time). I'll try to patch this timeout as well and see how it goes. |
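For illustration, the "ask the process to stop, wait, then force it" behaviour discussed above can be sketched in plain shell. This is only a rough sketch of the general technique, not what Jenkins or the durable-task plugin actually does internally: the function name `stop_with_grace` and the 60-second default are hypothetical, chosen here on the assumption that the simulator needs much longer than 10 seconds to finish writing its logs after a SIGTERM.

```shell
#!/bin/sh
# Hypothetical helper (not part of Jenkins or durable-task):
# send SIGTERM, wait up to a grace period for the process to exit,
# and only then fall back to SIGKILL.
# Usage: stop_with_grace PID [GRACE_SECONDS]
# Returns 0 if the process exited on its own, 1 if it had to be killed.
stop_with_grace() {
    pid=$1
    grace=${2:-60}   # assumed default; the 10 s seen in the log was too short

    # Nothing to do if the process is already gone.
    kill -0 "$pid" 2>/dev/null || return 0

    # Polite request first, so the process can clean up after itself.
    kill -TERM "$pid" 2>/dev/null

    waited=0
    # kill -0 sends no signal; it only checks that the PID still exists.
    while kill -0 "$pid" 2>/dev/null; do
        if [ "$waited" -ge "$grace" ]; then
            # Grace period is over: force termination.
            kill -KILL "$pid" 2>/dev/null
            return 1
        fi
        sleep 1
        waited=$((waited + 1))
    done
    return 0
}
```

A wrapper script around the simulator could call something like `stop_with_grace "$sim_pid" 300` from a `trap` handler (with `sim_pid` being whatever variable holds the simulator's PID). Leftover `.nfs*` files like the ones in the trace above generally mean that some process still holds a deleted file open on the NFS mount, so making sure the simulator has really exited before the workspace is cleaned up may also help with the "Device or resource busy" errors.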