To Jenkins users: I am wondering what could cause a build on a slave to suddenly hit hudson.maven.MavenEmbedderException right after it finishes checking out the source code from the SVN repository.
01:59:14 Building remotely on <slave> in workspace <workspace>
01:59:14 Cleaning up /scratch/jenkins_slave_sjc-bld15-lnx/workspace/comp-wcs-PI_2_0/checkout
01:59:17 Updating https://<repo> at revision '2013-10-30T01:59:14.107 -0700'
01:59:18 U rfm/src/main/java/server/pojohelpers/common/ProvisioningHelperImpl.java
01:59:23 At revision 54349
01:59:24 ERROR: Failed to parse POMs
01:59:24 org.kohsuke.stapler.framework.io.IOException2: hudson.maven.MavenEmbedderException: error reading zip file
01:59:24 	at hudson.maven.MavenVersionCallable.call(MavenVersionCallable.java:70)
01:59:24 	at hudson.maven.MavenVersionCallable.call(MavenVersionCallable.java:41)
01:59:24 	at hudson.remoting.UserRequest.perform(UserRequest.java:118)
01:59:24 	at hudson.remoting.UserRequest.perform(UserRequest.java:48)
01:59:24 	at hudson.remoting.Request$2.run(Request.java:287)
01:59:24 	at hudson.remoting.InterceptingExecutorService$1.call(InterceptingExecutorService.java:72)
01:59:24 	at java.util.concurrent.FutureTask$Sync.innerRun(FutureTask.java:303)
01:59:24 	at java.util.concurrent.FutureTask.run(FutureTask.java:138)
01:59:24 	at java.util.concurrent.ThreadPoolExecutor$Worker.runTask(ThreadPoolExecutor.java:886)
01:59:24 	at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:908)
01:59:24 	at java.lang.Thread.run(Thread.java:619)
01:59:24 Caused by: hudson.maven.MavenEmbedderException: error reading zip file
01:59:24 	at hudson.maven.MavenEmbedderUtils.getMavenVersion(MavenEmbedderUtils.java:203)
01:59:24 	at hudson.maven.MavenVersionCallable.call(MavenVersionCallable.java:66)
01:59:24 	... 10 more
01:59:24 Caused by: java.util.zip.ZipException: error reading zip file
01:59:24 	at java.util.zip.ZipFile.read(Native Method)
01:59:24 	at java.util.zip.ZipFile.access$1200(ZipFile.java:29)
01:59:24 	at java.util.zip.ZipFile$ZipFileInputStream.read(ZipFile.java:447)
01:59:24 	at java.util.zip.ZipFile$1.fill(ZipFile.java:230)
01:59:24 	at java.util.zip.InflaterInputStream.read(InflaterInputStream.java:141)
01:59:24 	at java.io.FilterInputStream.read(FilterInputStream.java:116)
01:59:24 	at java.io.FilterInputStream.read(FilterInputStream.java:90)
01:59:24 	at java.util.Properties$LineReader.readLine(Properties.java:418)
01:59:24 	at java.util.Properties.load0(Properties.java:337)
01:59:24 	at java.util.Properties.load(Properties.java:325)
01:59:24 	at hudson.maven.MavenEmbedderUtils.getMavenVersion(MavenEmbedderUtils.java:200)
01:59:24 	... 11 more

We suddenly hit this issue on multiple Linux VM nodes/slaves across several Jenkins master instances, all on the same day. The Jenkins versions we run are 1.396 (2 instances), 1.455 (our currently adopted version), and 1.509.3 (1 instance). Each master has between 2 and 10 Linux slaves and at least 50-100 jobs. We use "Launch slave via execution of command on the Master", with a shell script executed via SSH, for example:

ssh <slave-hostname> /scratch/jenkins_slave_sjc-bld15-lnx/launch-slave-lnx

One Jenkins instance that hit this issue was under heavy load and using all of its memory, while others were mostly idle, i.e. not many builds were running. The common denominator is a large number of jobs running regularly on commit polling or a nightly schedule, and masters that are not restarted regularly. All masters run RHEL 5.6 with 4 cores and 16 GB RAM, on the default Winstone container that ships with Jenkins.
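Since the root cause in the trace is a java.util.zip.ZipException raised while MavenEmbedderUtils.getMavenVersion reads a properties file out of a jar, the jars cached on the slave are presumably damaged on disk. One quick way to confirm that on an affected slave might be something like the sketch below; note that the scan path and jar name patterns are assumptions based on our slave layout, not a standard Jenkins tool, and it relies on Info-ZIP's unzip being installed:

```shell
#!/bin/bash
# Diagnostic sketch: report zip integrity for cached agent/Maven jars
# under a slave work directory. Prints one line per jar:
#   "OK      <path>" or "CORRUPT <path>"
check_jars() {
  find "$1" \( -name 'maven*-agent.jar' -o -name 'slave.jar' \) -type f |
  while read -r jar; do
    # `unzip -t` walks the whole archive and verifies each entry's CRC,
    # so a truncated or partially overwritten jar is reported as an error.
    if unzip -tqq "$jar" >/dev/null 2>&1; then
      echo "OK      $jar"
    else
      echo "CORRUPT $jar"
    fi
  done
}

# Example (path is an assumption; adjust to the slave's actual work dir):
# check_jars /scratch/jenkins_slave_sjc-bld15-lnx
```

Running this on an affected slave right after a failure would show whether the jars on disk are actually damaged, or whether the read only fails when performed through the remoting channel.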
The ssh-slaves plugin versions installed on the masters:

On 1.396: ssh-slaves 0.14
On 1.455: ssh-slaves 0.21
On 1.509.3: ssh-slaves 0.25

The launch-slave-lnx script runs:

#!/bin/bash
workdir=`dirname "$0"`
cihost=`basename $workdir | cut -d_ -f3`.cisco.com
JAVA_HOME=/auto/cwtools/java/jdk1.6.0_34/LNX-64
PATH=$JAVA_HOME/bin:/usr/cisco/bin:/opt/groovy/bin:$PATH
export PATH
wget -O $workdir/slave.jar https://$cihost:9081/jnlpJars/slave.jar --no-check-certificate
java -Xms2048m -Xmx2048m -jar $workdir/slave.jar

The workaround is to disconnect and reconnect the slave. We also still see builds that sometimes make no progress even before checking out the code; disconnecting and reconnecting the slave fixes that as well.

My conclusions so far: this is not tied to a specific Jenkins version, since it occurred on different versions. It could be an ssh-slaves issue, or a memory leak in Jenkins that leaves the master unable to use the existing SSH process it has been using to reach the slave, or a network issue.

Is launching slaves over SSH with "Launch slave via execution of command on the Master" unreliable? Should we simply use "Launch slave agents on Unix machines via SSH" instead? We have not done so because we keep the JDKs on a network filer and point JAVA_HOME at them in the shell script.

I am also unsure why the maven*-agent.jar files already in the workspace, which were extracted from the last slave.jar downloaded from the master, become corrupted.

Your comments or insights are greatly appreciated. Thank you.

-Indra

--
You received this message because you are subscribed to the Google Groups "Jenkins Users" group.
To unsubscribe from this group and stop receiving emails from it, send an email to [email protected].
For more options, visit https://groups.google.com/groups/opt_out.
