On 19/06/2020 14:51, Stephan Bergmann wrote:
On 28/05/2020 22:19, Stephan Bergmann wrote:
For now, I have updated <https://ci.libreoffice.org/job/gerrit_linux_clang_dbgutil/> to use the new kill-wrapper timeout feature instead of Jenkins' "Abort the build if it's stuck" option.  (And am planning to roll it out to other Linux Jenkins jobs that could benefit from it, once it has proven sufficiently stable.)

I have rolled out the kill-wrapper and its timeout feature now also for <https://ci.libreoffice.org/job/gerrit_linux_clang_dbgutil_branch/>, <https://ci.libreoffice.org/job/gerrit_linux_gcc_release/>, and <https://ci.libreoffice.org/job/lo_ubsan/>.

Just to note down the semi-obvious somewhere: One scenario that kill-wrapper apparently doesn't prevent is leftover processes after Jenkins "has lost the connection" (for whatever reason, maybe a bug in Jenkins itself?).

<https://ci.libreoffice.org/job/gerrit_linux_clang_dbgutil/62736/> had gone down with

[...]
[build JUT] linguistic_unoapi
FATAL: command execution failed
java.io.EOFException
        at java.io.ObjectInputStream$PeekInputStream.readFully(ObjectInputStream.java:2738)
        at java.io.ObjectInputStream$BlockDataInputStream.readShort(ObjectInputStream.java:3213)
        at java.io.ObjectInputStream.readStreamHeader(ObjectInputStream.java:896)
        at java.io.ObjectInputStream.<init>(ObjectInputStream.java:358)
        at hudson.remoting.ObjectInputStreamEx.<init>(ObjectInputStreamEx.java:49)
        at hudson.remoting.Command.readFrom(Command.java:142)
        at hudson.remoting.Command.readFrom(Command.java:128)
        at hudson.remoting.AbstractSynchronousByteArrayCommandTransport.read(AbstractSynchronousByteArrayCommandTransport.java:35)
        at hudson.remoting.SynchronousCommandTransport$ReaderThread.run(SynchronousCommandTransport.java:63)
Caused: java.io.IOException: Unexpected termination of the channel
        at hudson.remoting.SynchronousCommandTransport$ReaderThread.run(SynchronousCommandTransport.java:77)
Caused: java.io.IOException: Backing channel 'tb75-lilith' is disconnected.
        at hudson.remoting.RemoteInvocationHandler.channelOrFail(RemoteInvocationHandler.java:216)
        at hudson.remoting.RemoteInvocationHandler.invoke(RemoteInvocationHandler.java:285)
        at com.sun.proxy.$Proxy66.isAlive(Unknown Source)
        at hudson.Launcher$RemoteLauncher$ProcImpl.isAlive(Launcher.java:1147)
        at hudson.Launcher$RemoteLauncher$ProcImpl.join(Launcher.java:1139)
        at hudson.tasks.CommandInterpreter.join(CommandInterpreter.java:155)
        at hudson.tasks.CommandInterpreter.perform(CommandInterpreter.java:109)
        at hudson.tasks.CommandInterpreter.perform(CommandInterpreter.java:66)
        at hudson.tasks.BuildStepMonitor$1.perform(BuildStepMonitor.java:20)
        at hudson.model.AbstractBuild$AbstractBuildExecution.perform(AbstractBuild.java:741)
        at hudson.model.Build$BuildExecution.build(Build.java:206)
        at hudson.model.Build$BuildExecution.doRun(Build.java:163)
        at hudson.model.AbstractBuild$AbstractBuildExecution.run(AbstractBuild.java:504)
        at hudson.model.Run.execute(Run.java:1880)
        at hudson.model.FreeStyleBuild.run(FreeStyleBuild.java:43)
        at hudson.model.ResourceController.execute(ResourceController.java:97)
        at hudson.model.Executor.run(Executor.java:428)
FATAL: Unable to delete script file /tmp/jenkins3180341342272089625.sh
java.io.EOFException
        at java.io.ObjectInputStream$PeekInputStream.readFully(ObjectInputStream.java:2738)
        at java.io.ObjectInputStream$BlockDataInputStream.readShort(ObjectInputStream.java:3213)
        at java.io.ObjectInputStream.readStreamHeader(ObjectInputStream.java:896)
        at java.io.ObjectInputStream.<init>(ObjectInputStream.java:358)
        at hudson.remoting.ObjectInputStreamEx.<init>(ObjectInputStreamEx.java:49)
        at hudson.remoting.Command.readFrom(Command.java:142)
        at hudson.remoting.Command.readFrom(Command.java:128)
        at hudson.remoting.AbstractSynchronousByteArrayCommandTransport.read(AbstractSynchronousByteArrayCommandTransport.java:35)
        at hudson.remoting.SynchronousCommandTransport$ReaderThread.run(SynchronousCommandTransport.java:63)
Caused: java.io.IOException: Unexpected termination of the channel
        at hudson.remoting.SynchronousCommandTransport$ReaderThread.run(SynchronousCommandTransport.java:77)
Caused: hudson.remoting.ChannelClosedException: Channel "hudson.remoting.Channel@629ec1e9:tb75-lilith": Remote call on tb75-lilith failed. The channel is closing down or has closed down
        at hudson.remoting.Channel.call(Channel.java:991)
        at hudson.FilePath.act(FilePath.java:1069)
        at hudson.FilePath.act(FilePath.java:1058)
        at hudson.FilePath.delete(FilePath.java:1543)
        at hudson.tasks.CommandInterpreter.perform(CommandInterpreter.java:123)
        at hudson.tasks.CommandInterpreter.perform(CommandInterpreter.java:66)
        at hudson.tasks.BuildStepMonitor$1.perform(BuildStepMonitor.java:20)
        at hudson.model.AbstractBuild$AbstractBuildExecution.perform(AbstractBuild.java:741)
        at hudson.model.Build$BuildExecution.build(Build.java:206)
        at hudson.model.Build$BuildExecution.doRun(Build.java:163)
        at hudson.model.AbstractBuild$AbstractBuildExecution.run(AbstractBuild.java:504)
        at hudson.model.Run.execute(Run.java:1880)
        at hudson.model.FreeStyleBuild.run(FreeStyleBuild.java:43)
        at hudson.model.ResourceController.execute(ResourceController.java:97)
        at hudson.model.Executor.run(Executor.java:428)
Build step 'Execute shell' marked build as failure
Finished: FAILURE

leaving behind a pstree forest of

oosplash─┬─soffice.bin─┬─soffice.bin
         │             └─182*[{soffice.bin}]
         └─{oosplash}

sh───sh───python.bin─┬─oosplash─┬─soffice.bin─┬─soffice.bin
                     │          │             └─294*[{soffice.bin}]
                     │          └─{oosplash}
                     └─2*[{python.bin}]

sh───sh───python.bin───oosplash

sh───sh───gdb-core-bt.sh───gdb

sh───sh───python.bin───oosplash

on tb75, where each of those processes belonged to the above build, as demonstrated for each respective $PID with

$ cat /proc/$PID/environ | tr '\0' '\n' | grep BUILD_NUMBER
BUILD_NUMBER=62736

That caused later builds like <https://ci.libreoffice.org/job/gerrit_linux_clang_dbgutil/62758/> on tb75 to fail with "the test UITest_calc_demo failed".
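The per-$PID check above assumes you already know which PIDs to look at. A minimal sketch of a helper that instead scans all of /proc for processes whose environment carries a given BUILD_NUMBER might look like the following. The function name find_build_procs is hypothetical and not part of the actual kill-wrapper or Jenkins setup; it assumes Linux /proc, and without root it can only read the environment of the current user's own processes.

```shell
# Hypothetical helper (not part of the actual kill-wrapper/Jenkins setup):
# print the PID of every process whose environment contains exactly
# BUILD_NUMBER=<n>, generalizing the per-$PID /proc/$PID/environ check.
# Linux-only; processes whose environ is unreadable are silently skipped.
find_build_procs() {
    want="BUILD_NUMBER=$1"
    for env in /proc/[0-9]*/environ; do
        pid=${env#/proc/}        # strip leading "/proc/"
        pid=${pid%/environ}      # strip trailing "/environ"
        # environ is NUL-separated: split it into lines, then match a whole
        # line as a fixed string, so e.g. 6273 does not match 62736.
        # Processes that exited meanwhile or are unreadable produce no match.
        if tr '\0' '\n' 2>/dev/null < "$env" | grep -qxF "$want"; then
            echo "$pid"
        fi
    done
}
```

Running `find_build_procs 62736` on tb75 should then list the leftover processes from the forest above (threads such as the {soffice.bin} entries live under /proc/$PID/task and are not listed separately), and each PID could be passed to `pstree -p` for inspection.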

_______________________________________________
LibreOffice mailing list
LibreOffice@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/libreoffice