To Jenkins users,

I am wondering what could cause a build on a slave to suddenly hit 
hudson.maven.MavenEmbedderException right after it finishes checking out the 
source code from the SVN repository.

01:59:14 Building remotely on <slave> in workspace <workspace>
01:59:14 Cleaning up 
/scratch/jenkins_slave_sjc-bld15-lnx/workspace/comp-wcs-PI_2_0/checkout
01:59:17 Updating https://<repo> at revision '2013-10-30T01:59:14.107 -0700'
01:59:18 U         
rfm/src/main/java/server/pojohelpers/common/ProvisioningHelperImpl.java
01:59:23 At revision 54349
01:59:24 ERROR: Failed to parse POMs
01:59:24 org.kohsuke.stapler.framework.io.IOException2: 
hudson.maven.MavenEmbedderException: error reading zip file
01:59:24 at hudson.maven.MavenVersionCallable.call(MavenVersionCallable.java:70)
01:59:24 at hudson.maven.MavenVersionCallable.call(MavenVersionCallable.java:41)
01:59:24 at hudson.remoting.UserRequest.perform(UserRequest.java:118)
01:59:24 at hudson.remoting.UserRequest.perform(UserRequest.java:48)
01:59:24 at hudson.remoting.Request$2.run(Request.java:287)
01:59:24 at 
hudson.remoting.InterceptingExecutorService$1.call(InterceptingExecutorService.java:72)
01:59:24 at java.util.concurrent.FutureTask$Sync.innerRun(FutureTask.java:303)
01:59:24 at java.util.concurrent.FutureTask.run(FutureTask.java:138)
01:59:24 at 
java.util.concurrent.ThreadPoolExecutor$Worker.runTask(ThreadPoolExecutor.java:886)
01:59:24 at 
java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:908)
01:59:24 at java.lang.Thread.run(Thread.java:619)
01:59:24 Caused by: hudson.maven.MavenEmbedderException: error reading zip file
01:59:24 at 
hudson.maven.MavenEmbedderUtils.getMavenVersion(MavenEmbedderUtils.java:203)
01:59:24 at hudson.maven.MavenVersionCallable.call(MavenVersionCallable.java:66)
01:59:24 ... 10 more
01:59:24 Caused by: java.util.zip.ZipException: error reading zip file
01:59:24 at java.util.zip.ZipFile.read(Native Method)
01:59:24 at java.util.zip.ZipFile.access$1200(ZipFile.java:29)
01:59:24 at java.util.zip.ZipFile$ZipFileInputStream.read(ZipFile.java:447)
01:59:24 at java.util.zip.ZipFile$1.fill(ZipFile.java:230)
01:59:24 at java.util.zip.InflaterInputStream.read(InflaterInputStream.java:141)
01:59:24 at java.io.FilterInputStream.read(FilterInputStream.java:116)
01:59:24 at java.io.FilterInputStream.read(FilterInputStream.java:90)
01:59:24 at java.util.Properties$LineReader.readLine(Properties.java:418)
01:59:24 at java.util.Properties.load0(Properties.java:337)
01:59:24 at java.util.Properties.load(Properties.java:325)
01:59:24 at 
hudson.maven.MavenEmbedderUtils.getMavenVersion(MavenEmbedderUtils.java:200)
01:59:24 ... 11 more

We suddenly faced this issue on multiple Linux VM slaves across several 
Jenkins master instances, all on the same day.
The Jenkins versions we run are 1.396 (2 instances), 1.455 (our currently 
adopted version) and 1.509.3 (1 instance). Each master has 2 to 10 Linux 
slaves and at least 50-100 jobs.
We use "Launch slave via execution of command on the Master", with a shell 
script executed over SSH, for instance: "ssh <slave-hostname> 
/scratch/jenkins_slave_sjc-bld15-lnx/launch-slave-lnx"

One Jenkins master that hit this issue was under heavy load and using all of 
its memory, while the others were mostly idle, i.e. not many builds were 
running.
The common denominator is a large number of jobs that run regularly (on 
commit polling or a nightly schedule) and masters that are not regularly 
restarted.
All masters run RHEL 5.6 with 4 cores and 16GB RAM.
We use the default Winstone servlet container that ships with Jenkins.

The ssh-slaves plugin versions installed on the masters:
On version 1.396: ssh-slaves 0.14
On version 1.455: ssh-slaves 0.21
On version 1.509.3: ssh-slaves 0.25

The launch-slave-lnx script runs:
#!/bin/bash

# Derive the master's hostname from the slave work directory name,
# e.g. jenkins_slave_sjc-bld15-lnx -> sjc-bld15-lnx.cisco.com
workdir=$(dirname "$0")
cihost=$(basename "$workdir" | cut -d_ -f3).cisco.com

JAVA_HOME=/auto/cwtools/java/jdk1.6.0_34/LNX-64
PATH="$JAVA_HOME/bin:/usr/cisco/bin:/opt/groovy/bin:$PATH"
export PATH

# Fetch the current slave.jar from the master, then start the agent.
wget -O "$workdir/slave.jar" --no-check-certificate \
    "https://$cihost:9081/jnlpJars/slave.jar"
java -Xms2048m -Xmx2048m -jar "$workdir/slave.jar"
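Since a truncated slave.jar download could explain jars later failing with zip errors, one hardening option I have been considering (my own sketch, not something we run today) is to download to a temporary file and only swap it into place after a basic sanity check. The `fetch_jar` name and the stdout-based fetch command are my inventions for illustration:

```shell
#!/bin/bash
# Sketch: replace the plain "wget -O $workdir/slave.jar ..." step with a
# download-then-verify-then-rename sequence, so an interrupted transfer can
# never leave a truncated slave.jar in place. The fetch command is passed as
# the remaining arguments and must write the jar to stdout.
# Usage: fetch_jar <dest> <command writing the jar to stdout...>
fetch_jar() {
    local dest="$1"; shift
    local tmp
    tmp=$(mktemp "${dest}.XXXXXX") || return 1
    # Jars are zip archives; a valid one starts with the bytes "PK".
    if "$@" > "$tmp" && [ "$(head -c 2 "$tmp")" = "PK" ]; then
        mv -f "$tmp" "$dest"    # rename is atomic on the same filesystem
    else
        rm -f "$tmp"
        return 1
    fi
}

# In launch-slave-lnx this would replace the wget line, e.g.:
# fetch_jar "$workdir/slave.jar" \
#     wget -q --no-check-certificate -O - "https://$cihost:9081/jnlpJars/slave.jar"
```

The rename-into-place step means a half-written file under the temporary name can never be picked up as slave.jar by a concurrently starting agent.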

Our workaround is to disconnect and reconnect the slave.  We also still see 
cases where a build does not progress even before checking out the code; 
disconnecting and reconnecting the slave resolves that too.


My conclusions so far:
This is not specific to a Jenkins version, since it occurred across 
different versions.
It could be an ssh-slaves issue, or the master may have a memory leak (or a 
network issue) that leaves it unable to use the existing ssh process it has 
been using to connect to the slave.
Is SSH via "Launch slave via execution of command on the Master" simply not 
reliable?  Should we use "Launch slave agents on Unix machines via SSH" 
instead?  We have not done that because we keep the JDKs on a network filer, 
so the shell script points JAVA_HOME at that location.

I am also not sure why the maven*-agent.jar files already on the slave, 
which were extracted from the last slave.jar downloaded from the master, 
would become corrupted.
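To narrow that down, a quick check I could run on an affected slave is to test whether those jars are still intact zip archives before Maven touches them. This is only my own diagnostic sketch (the `check_jar` helper is a name I made up); a jar is a zip file, so a zero-byte file or a missing "PK\x03\x04" local-file header would explain the "error reading zip file" above:

```shell
#!/bin/bash
# Sketch: flag jars that are empty or no longer start with the zip
# local-file-header magic bytes 50 4b 03 04 ("PK\x03\x04"). This catches
# truncation and gross corruption, though not damage deeper in the archive.
check_jar() {
    local jar="$1"
    if [ ! -s "$jar" ]; then
        echo "CORRUPT (empty or missing): $jar"
        return 1
    fi
    # First four bytes of a valid zip/jar, rendered as hex.
    if [ "$(head -c 4 "$jar" | od -An -tx1 | tr -d ' \n')" != "504b0304" ]; then
        echo "CORRUPT (bad zip header): $jar"
        return 1
    fi
    echo "OK: $jar"
}

# Example run against a slave's work directory (path from our setup):
# for j in /scratch/jenkins_slave_sjc-bld15-lnx/*.jar; do check_jar "$j"; done
```

If the jars check out fine while the build still fails, that would point more at the remoting channel or the master than at on-disk corruption.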

Your comment or insight is greatly appreciated.

Thank you.
-Indra

-- 
You received this message because you are subscribed to the Google Groups 
"Jenkins Users" group.