Re: buildbot failure in [...]

Justin Mason 17 Dec 2004 21:57:43 -0000

-----BEGIN PGP SIGNED MESSAGE-----
Hash: SHA1


Sidney Markowitz writes:
> Malte S. Stretz wrote:
> > Does anybody know what exactly goes wrong?  Maybe it could work if we use 
> > port forwarding or stunnel or something to route the traffic to the dynamic 
> > clients over some server with a static IP?
> 
> Here's my last failed svn log. It was on the native machine, and I just 
> discovered that the two slaves on the VMware virtual machine have not 
> been responding for a couple of days, so it cannot be a matter of 
> simultaneous access on the same machine. I'm going to try to restart them.
> 
> Could the svn server be sensitive to too many clients hitting the same 
> repository at the same time? Perhaps it would help to introduce a delay 
> between triggering one slave and the next, or if that is not possible 
> adding a sleep of a random time on the slaves before the svn update.

I doubt that's it.  First off, as of the last svn checkin the svn step
failed the same way on all slaves; see

http://bugzilla.spamassassin.org:8010/trunk-red-hat-7.3/builds/89/svn/0
http://bugzilla.spamassassin.org:8010/reqd-modules-only-5.8.1/builds/76/svn/0

Both of those slaves run on the buildbot master machine as well, so
those failures were just because the SVN server was borked.

Secondly, I have 4 slaves on the buildbot machine that are (a) started
simultaneously and (b) hit the repo simultaneously.  And if you look at
15:31:38 on Thu Dec 16, you can see 7 slaves hitting svn simultaneously,
all of them passing.  So that's not it.

Basically we have:

    - buildbot master host, localhost, no NAT: 5 slaves, always pass
    - jm: 1 slave, static IP, no NAT: debian-stable, always passes
    - parker: 3 slaves, behind NAT: frequent failures
    - sidney: 3 slaves, NAT?: frequent failures

I think NAT is what causes the issue, so the keepalive idea is the
best bet...
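If the NAT boxes are expiring the mapping for a slave's idle connection to the master, one way to realize the keepalive idea is OS-level TCP keepalive on the slave's socket, so that probe packets cross the NAT even between builds. The sketch below assumes that this is what "keepalive" means here. An application-level ping over the existing connection would be the other option, and this is not buildbot's code. `make_keepalive_socket` and the idle/interval/count values are hypothetical.

```python
import socket


def make_keepalive_socket(idle: int = 60, interval: int = 30,
                          count: int = 5) -> socket.socket:
    """Return an unconnected TCP socket with keepalive probes enabled.

    idle/interval/count (seconds, seconds, probes) are hypothetical values
    meant to stay under a typical NAT idle timeout; they are not buildbot
    settings.
    """
    sock = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
    # Ask the kernel to send keepalive probes on an otherwise idle connection.
    sock.setsockopt(socket.SOL_SOCKET, socket.SO_KEEPALIVE, 1)
    # Per-connection tuning is platform-specific (all three exist on Linux);
    # skip any option this platform's socket module does not expose.
    for name, value in (("TCP_KEEPIDLE", idle),
                        ("TCP_KEEPINTVL", interval),
                        ("TCP_KEEPCNT", count)):
        opt = getattr(socket, name, None)
        if opt is not None:
            sock.setsockopt(socket.IPPROTO_TCP, opt, value)
    return sock
```

The slave would then call `sock.connect((host, port))` to reach the master, where `host` and `port` are placeholders. Because only the slave ever initiates the connection, enabling keepalive on the slave side is enough to keep its own NAT mapping warm.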

BTW bear in mind that the slaves are never connected *to*.  Instead, each
slave opens a TCP connection to the master at startup and receives
commands "pushed" to it over that connection.  If that TCP connection
dies, the slave disappears and retries the connection very slowly,
roughly once every 10 mins, with exponential backoff.
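The reconnect behaviour described above works like this: a single outbound connection, retried with a growing delay after it drops. Here is a minimal sketch, not buildbot's actual code. `backoff_delays`, `connect_with_backoff` and the defaults are hypothetical: a 1-second initial delay, doubling each time, capped at 600 seconds to loosely match the "once every 10 mins" figure.

```python
import time
from typing import Callable, Iterator, Optional, TypeVar

T = TypeVar("T")


def backoff_delays(initial: float = 1.0, factor: float = 2.0,
                   max_delay: float = 600.0) -> Iterator[float]:
    """Yield reconnect delays in seconds, growing by `factor` up to `max_delay`.

    All defaults are hypothetical, not buildbot's actual settings.
    """
    delay = min(initial, max_delay)
    while True:
        yield delay
        delay = min(delay * factor, max_delay)


def connect_with_backoff(connect: Callable[[], T],
                         sleep: Callable[[float], None] = time.sleep,
                         delays: Optional[Iterator[float]] = None) -> T:
    """Call `connect` until it succeeds, sleeping between failed attempts.

    `connect` is a hypothetical stand-in for "open the TCP connection to the
    master"; any OSError (refused, reset, unreachable) triggers a retry.
    If `delays` is finite, one last attempt is made and its error propagates.
    """
    if delays is None:
        delays = backoff_delays()
    for delay in delays:
        try:
            return connect()
        except OSError:
            sleep(delay)
    return connect()
```

Passing `sleep` in keeps the loop testable. A real slave would use the default `time.sleep` and a `connect` that opens the socket to the master. The slowly growing delays also show why a slave whose connection was silently dropped by NAT can stay missing from the master for a long time.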

- --j.

>   -- sidney
> 
> The log:
> 
> starting svn operation
> command '['svn', 'update', '--revision', '122631']' in dir 
> /b/home/buildbot/slaves/sidney-fedora3/trunk-sidney-fedora3/build 
> (timeout 1200 secs)
> svn: PROPFIND request failed on '/repos/asf/spamassassin/trunk'
> svn: PROPFIND of '/repos/asf/spamassassin/trunk': Could not read status 
> line: connection was closed by server. (http://svn.apache.org)
> update failed, clobbering and trying again
> command '['rm', '-rf', 
> '/b/home/buildbot/slaves/sidney-fedora3/trunk-sidney-fedora3/build']' in 
> dir /b/home/buildbot/slaves/sidney-fedora3/trunk-sidney-fedora3 (timeout 
> 1200 secs)
> now retrying VC operation
> command '['svn', 'checkout', '--revision', '122631', 
> 'http://svn.apache.org/repos/asf/spamassassin/trunk', 'build']' in dir 
> /b/home/buildbot/slaves/sidney-fedora3/trunk-sidney-fedora3 (timeout 
> 1200 secs)
> svn: PROPFIND request failed on '/repos/asf/spamassassin/trunk'
> svn: PROPFIND of '/repos/asf/spamassassin/trunk': Could not read status 
> line: connection was closed by server. (http://svn.apache.org)
> program finished with exit code 1
-----BEGIN PGP SIGNATURE-----
Version: GnuPG v1.2.4 (GNU/Linux)
Comment: Exmh CVS

iD8DBQFBw1XoMJF5cimLx9ARAmbsAJ0QFRYByCiQ4WY6K47wN/E7wxru0ACeOHNj
JTOK7lD2BWBdKwyF7DPs0sM=
=xqJz
-----END PGP SIGNATURE-----
