This has to be discussed with infra.
I am not sure which distribution is used.
It's a private build of OpenJDK 8 (today CloudBees CI doesn't support
anything other than Java 8 - no comment).
So far, from the data I have reviewed, I don't think it's a
memory / GC issue (although such an issue could cause disconnections under high load).
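
To put numbers on this, here is a quick sketch (Python, with helper names of my own, nothing taken from Jenkins) that recomputes heap usage from the byte counts in the Jenkins system information quoted further down in this thread. It only restates the reported figures; it does not measure anything.

```python
# Byte counts copied from the Jenkins system information quoted in this thread.
JVMS = {
    "controller": {"max": 3221225472, "in_use": 2434633808, "free": 786591664},
    "linux agent": {"max": 8232370176, "in_use": 1761416416, "free": 6470953760},
}

GIB = 1024 ** 3  # Jenkins reports "GB" using 1024-based units


def heap_summary(stats):
    """Return (in_use_gib, max_gib, in_use_ratio) for one JVM."""
    return stats["in_use"] / GIB, stats["max"] / GIB, stats["in_use"] / stats["max"]


for name, stats in JVMS.items():
    used, total, ratio = heap_summary(stats)
    print(f"{name}: {used:.2f} GiB of {total:.2f} GiB in use ({ratio:.0%})")
```

With these figures the controller heap is roughly three quarters in use and the agent heap roughly one fifth, which by itself does not point to memory pressure.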
Here the stability issue occurs even when the controller is idle (the
controller cannot ping the agent, or vice versa), and it seems to affect
the Linux agents more than the Windows ones (which is a pity).
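
As an illustration of what the ping failures quoted further down show, here is a small sketch (Python; the regular expression and function name are my own, and the log excerpt is copied from the quoted controller log with wrapped lines rejoined). It extracts the agent name and the two epoch-millisecond timestamps from each TimeoutException and computes how long each ping waited before the channel was terminated.

```python
import re
from datetime import datetime, timezone

# Excerpt of the controller log quoted in this thread (wrapped lines rejoined).
LOG = """\
Ping failed. Terminating the channel maven4.
java.util.concurrent.TimeoutException: Ping started at 1626936861769 hasn't completed by 1626937101769
Ping failed. Terminating the channel maven3.
java.util.concurrent.TimeoutException: Ping started at 1626936861777 hasn't completed by 1626937101778
Ping failed. Terminating the channel maven5.
java.util.concurrent.TimeoutException: Ping started at 1626936861982 hasn't completed by 1626937101983
"""

# Hypothetical pattern written for this excerpt only, not a general Jenkins log parser.
PATTERN = re.compile(
    r"Terminating the channel (?P<agent>\S+)\.\s+"
    r"\S+TimeoutException: Ping started at (?P<start>\d+) "
    r"hasn't completed by (?P<end>\d+)"
)


def ping_timeouts(log):
    """Return (agent, start_utc, elapsed_seconds) for each failed ping."""
    rows = []
    for m in PATTERN.finditer(log):
        start_ms, end_ms = int(m["start"]), int(m["end"])
        start = datetime.fromtimestamp(start_ms / 1000, tz=timezone.utc)
        rows.append((m["agent"], start, (end_ms - start_ms) / 1000))
    return rows


for agent, start, elapsed in ping_timeouts(LOG):
    print(f"{agent}: ping started {start:%H:%M:%S} UTC, gave up after {elapsed:.0f} s")
```

All three pings started within about 200 ms of each other and each was abandoned after 240 s, so this looks like a fixed 4-minute timeout (I assume this is the configured ping timeout) hitting several agents at once rather than one slow agent.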



On Fri, Jul 23, 2021 at 12:06 AM Tibor Digana <tibordig...@apache.org>
wrote:

> Can you install AdoptOpenJDK for the Jenkins controller?
> It is available with the Eclipse OpenJ9 VM, which significantly decreases
> the memory consumption of the application because the metaspace goes to
> disk.
> You should save 40 - 75% out of 3 GB.
> I used G1, Shenandoah, ZGC and Eclipse OpenJ9, and OpenJ9 saved the most memory.
>
> On Thu, Jul 22, 2021 at 9:23 AM Arnaud Héritier <aherit...@gmail.com>
> wrote:
>
> > Yes, for the controller it depends on its size (number and types of
> > jobs), but here we seem to be fine with our 3 GB.
> >
> > * Java
> > - Version: 1.8.0_292
> > - Maximum memory: 3.00 GB (3221225472)
> > - Allocated memory: 3.00 GB (3221225472)
> > - Free memory: 750.15 MB (786591664)
> > - In-use memory: 2.27 GB (2434633808)
> > - GC strategy: G1
> > - Available CPUs: 2
> >
> > For the agents, I reduced the memory allocated to the agent process, but it
> > doesn't seem to help much (even if it is still a good thing to do).
> >
> > What is strange is that I sometimes see our agents disconnected even when
> > we have no activity on the Jenkins controller.
> >
> > Sadly, Jenkins is deployed on Apache Tomcat, so I cannot get access to
> > its logs.
> >
> > In general, the lost connection is detected by what we call the PingThread
> > (https://www.jenkins.io/doc/book/system-administration/monitoring/#ping-thread),
> > but not only by it.
> >
> > https://ci-maven.apache.org/log/all
> >
> > For example, a few minutes ago 3 agents got disconnected while
> > nothing was running:
> >
> > 2021-07-22 06:58:21.769+0000 [id=106291] INFO hudson.slaves.ChannelPinger$1#onDead: Ping failed. Terminating the channel maven4.
> > java.util.concurrent.TimeoutException: Ping started at 1626936861769 hasn't completed by 1626937101769
> >     at hudson.remoting.PingThread.ping(PingThread.java:134)
> >     at hudson.remoting.PingThread.run(PingThread.java:90)
> > 2021-07-22 06:58:21.778+0000 [id=106292] INFO hudson.slaves.ChannelPinger$1#onDead: Ping failed. Terminating the channel maven3.
> > java.util.concurrent.TimeoutException: Ping started at 1626936861777 hasn't completed by 1626937101778
> >     at hudson.remoting.PingThread.ping(PingThread.java:134)
> >     at hudson.remoting.PingThread.run(PingThread.java:90)
> > 2021-07-22 06:58:21.983+0000 [id=106295] INFO hudson.slaves.ChannelPinger$1#onDead: Ping failed. Terminating the channel maven5.
> > java.util.concurrent.TimeoutException: Ping started at 1626936861982 hasn't completed by 1626937101983
> >     at hudson.remoting.PingThread.ping(PingThread.java:134)
> >     at hudson.remoting.PingThread.run(PingThread.java:90)
> >
> > @Gavin McDonald <gmcdon...@apache.org> In terms of network, is the
> > environment we use today the same as the ci-builds.apache.org environment?
> >
> >
> > On Wed, Jul 21, 2021 at 11:48 PM Tibor Digana <tibordig...@apache.org>
> > wrote:
> >
> > > In my company, I also used 1 GB for the Xmx (Java heap) of the Jenkins JVM,
> > > and it was enough.
> > > Subprocesses like Maven need much more memory for themselves
> > > than the Jenkins JVM does.
> > > T
> > >
> > > On Wed, Jul 21, 2021 at 6:38 PM Arnaud Héritier <aherit...@gmail.com>
> > > wrote:
> > >
> > > > I am looking at our builds, trying to understand why our agents are
> > > > often disconnected during the builds.
> > > > In general, we get a stack trace like this:
> > > >
> > > > maven6 was marked offline: Connection was broken: java.io.IOException: Pipe closed after 0 cycles
> > > >         at org.apache.sshd.common.channel.ChannelPipedInputStream.read(ChannelPipedInputStream.java:118)
> > > >         at org.apache.sshd.common.channel.ChannelPipedInputStream.read(ChannelPipedInputStream.java:101)
> > > >         at hudson.remoting.FlightRecorderInputStream.read(FlightRecorderInputStream.java:92)
> > > >         at hudson.remoting.ChunkedInputStream.readHeader(ChunkedInputStream.java:73)
> > > >         at hudson.remoting.ChunkedInputStream.readUntilBreak(ChunkedInputStream.java:103)
> > > >         at hudson.remoting.ChunkedCommandTransport.readBlock(ChunkedCommandTransport.java:39)
> > > >         at hudson.remoting.AbstractSynchronousByteArrayCommandTransport.read(AbstractSynchronousByteArrayCommandTransport.java:34)
> > > >         at hudson.remoting.SynchronousCommandTransport$ReaderThread.run(SynchronousCommandTransport.java:63)
> > > >
> > > >
> > > >
> > > > As far as I can see, we are using 16 GB "hosts" for the Linux agents.
> > > >
> > > > Something very strange is that the Jenkins agent (the small component
> > > > that links the build host to the controller) is configured
> > > > with `-Xms8g -Xmx8g`, so we are reserving 50% of the server memory for it
> > > > (even more because of the non-heap memory).
> > > > In general it should require far less. 1 GB is already a
> > > > lot in my experience.
> > > > Because of this, the OS can see it as the biggest process on the host and
> > > > decide to kill it when the rest of the memory is used by the build.
> > > > I think we should decrease this value.
> > > > (I can do it, but I don't know how the ci.apache.org agents were
> > > > configured, and I would not like to add more issues if this setting
> > > > was already there in the past.)
> > > >
> > > > I don't think it is the root cause of our instabilities (at least not of
> > > > all of them), and there is something else I have to find, but it's a
> > > > cheap fix to try.
> > > > FYI our agents VMs are ~like this today:
> > > >
> > > > - Java
> > > > + Home: `/usr/local/asfpackages/java/oraclejdk-1.8.0-291/jre`
> > > > + Vendor: Oracle Corporation
> > > > + Version: 1.8.0_291
> > > > + Maximum memory: 7.67 GB (8232370176)
> > > > + Allocated memory: 7.67 GB (8232370176)
> > > > + Free memory: 6.03 GB (6470953760)
> > > > + In-use memory: 1.64 GB (1761416416)
> > > > + GC strategy: ParallelGC
> > > > + Available CPUs: 4
> > > >
> > > > 8 GB is reserved, but only about 1.6 GB is in use (the GC does nothing
> > > > because the free memory is high).
> > > >
> > > > I would be in favor of trying to launch them with -Xms128m
> > > > -Xmx1g -XX:+UseG1GC -XX:+UseStringDeduplication
> > > >
> > > > I think it's enough customization to start with.
> > > >
> > > > Cheers
> > > >
> > > > On Wed, Jul 21, 2021 at 1:28 PM Arnaud Héritier <aherit...@gmail.com>
> > > > wrote:
> > > >
> > > > > I am not sure about the setup.
> > > > > AFAICS we don't use any JDK installer (
> > > > > https://ci-maven.apache.org/configureTools/ ), so I suppose that the
> > > > > different JDKs are meant to be installed directly on the agents?
> > > > > I am not sure how it was done in the previous environment.
> > > > >
> > > > > On Sun, Jul 18, 2021 at 5:30 PM Tibor Digana <tibordig...@apache.org>
> > > > > wrote:
> > > > >
> > > > > >> The new CI system has the following issue:
> > > > > >>
> > > > > >> /home/jenkins/tools/java/latest1.7/bin/java: not found
> > > > > >>
> > > > > >> https://ci-maven.apache.org/job/Maven/job/maven-box/job/maven-surefire/job/master/104/execution/node/183/log/
> > > > > >>
> > > > > >> On Wed, Jun 30, 2021 at 8:03 PM Gavin McDonald <gmcdon...@apache.org>
> > > > > >> wrote:
> > > > >>
> > > > > >> > Hi Maven folks.
> > > > > >> >
> > > > > >> > Infra has decided to separate off the Maven build jobs from
> > > > > >> > ci-builds.apache.org over to its very own Jenkins Controller and
> > > > > >> > Agents.
> > > > > >> >
> > > > > >> > This means that Maven now has a dedicated Jenkins environment for
> > > > > >> > itself. It also means that no other project's build jobs can build on
> > > > > >> > the Maven nodes, and that Maven jobs will no longer be able to build
> > > > > >> > on ci-builds.
> > > > > >> >
> > > > > >> > Your new Controller is set up as https://ci-maven.apache.org and all
> > > > > >> > Maven Committers can log in via LDAP and create jobs.
> > > > > >> >
> > > > > >> > At the time of writing, there is one node/agent attached, but I am
> > > > > >> > building 4 more - all Ubuntu 20.04 and based in our Azure account.
> > > > > >> >
> > > > > >> > We can automagically move all your jobs over from ci-builds to
> > > > > >> > ci-maven - I just need someone to tell me to go ahead and do it.
> > > > > >> >
> > > > > >> > In the meantime, feel free to have a test. The remaining 4 agents
> > > > > >> > will be online by tomorrow. We will review after a month whether 5
> > > > > >> > nodes is enough.
> > > > > >> >
> > > > > >> > Other projects with their own dedicated controller have taken
> > > > > >> > advantage of this isolation by having some nodes/agents given to the
> > > > > >> > project as a 'targeted donation', so someone here may know of a
> > > > > >> > company willing to donate 5 - 10 or more nodes specifically for the
> > > > > >> > Maven Jenkins environment. Infra can afford to hand you over 5 right
> > > > > >> > now.
> > > > > >> >
> > > > > >> > Let me know if you have any questions, otherwise let me know when I
> > > > > >> > can make the transfer of your jobs.
> > > > > >> >
> > > > > >> > Thanks
> > > > >> >
> > > > >> > --
> > > > >> >
> > > > >> > *Gavin McDonald*
> > > > >> > Systems Administrator
> > > > >> > ASF Infrastructure Team
> > > > >> >
> > > > >>
> > > > >
> > > > >
> > > > > --
> > > > > Arnaud Héritier
> > > > > Twitter/Skype : aheritier
> > > > >
> > > >
> > > >
> > > > --
> > > > Arnaud Héritier
> > > > Twitter/Skype : aheritier
> > > >
> > >
> >
> >
> > --
> > Arnaud Héritier
> > Twitter/Skype : aheritier
> >
>


-- 
Arnaud Héritier
Twitter/Skype : aheritier
