-----BEGIN PGP SIGNED MESSAGE-----
Hash: SHA1

Sidney Markowitz writes:
> Malte S. Stretz wrote:
> > Does anybody know what exactly goes wrong? Maybe it could work if we use
> > port forwarding or stunnel or something to route the traffic to the dynamic
> > clients over some server with a static IP?
>
> Here's my last svn failed log. It was on the native machine, and I just
> discovered that the two slaves on the VMWare virtual machine have been
> not responding for a couple of days, so it cannot be a matter of
> simultaneous access on the same machine. I'm going to try to restart them.
>
> Could the svn server be sensitive to too many clients hitting the same
> repository at the same time? Perhaps it would help to introduce a delay
> between triggering one slave and the next, or if that is not possible
> adding a sleep of a random time on the slaves before the svn update.

I doubt that's it.

First off, the svn failed logs were the same on all slaves as of the last
svn checkin. See:

  http://bugzilla.spamassassin.org:8010/trunk-red-hat-7.3/builds/89/svn/0
  http://bugzilla.spamassassin.org:8010/reqd-modules-only-5.8.1/builds/76/svn/0

Both of those are running on the buildbot master machine as well, so that
failure was just because the SVN server was borked.

Secondly, I have 4 slaves on the buildbot machine that are (a) started
simultaneously and (b) hitting the repo simultaneously. And if you look at
15:31:38 on Thu Dec 16, you can see 7 slaves hitting svn simultaneously,
and all passing. So that's not it.

Basically we have:

  - buildbot master host, localhost, no NAT: 5 slaves, always pass
  - jm: 1 slave, static IP, no NAT: debian-stable, always passes
  - parker: 3 slaves, behind NAT: frequent failures
  - sidney: 3 slaves, NAT?: frequent failures

I think it's the NAT that causes the issue, and therefore the keepalive idea
is the best bet...

BTW, bear in mind that the slaves are never connected *to*. Instead, they
operate by opening a TCP connection to the master at startup, and receiving
commands "pushed" to them over that connection.
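The keepalive idea rests on the point above: each slave opens its connection outbound, so a NAT box on the slave's side tracks that connection and may drop the mapping once it has sat idle too long. As a minimal sketch only (not buildbot's own code, which goes through Twisted), a Python slave could ask the kernel to send TCP keepalive probes on its socket. The helper name `enable_keepalive` and the 60/30/5 timing values are hypothetical, the NAT idle timeout they are meant to undercut is an assumption, and the per-probe options are platform-dependent, hence the `hasattr` checks. Turning on `SO_KEEPALIVE` alone may not be enough, because Linux by default waits 7200 seconds (`tcp_keepalive_time`) before the first probe, which can be longer than a consumer NAT box's idle timeout.

```python
import socket


def enable_keepalive(sock: socket.socket, idle: int = 60,
                     interval: int = 30, count: int = 5) -> None:
    """Ask the kernel to send TCP keepalive probes on an idle connection.

    Hypothetical helper: idle/interval/count are illustrative values,
    chosen on the assumption that the NAT drops idle mappings after a
    few minutes.
    """
    # Turn keepalive on; without the options below, Linux would wait
    # its default tcp_keepalive_time (7200 s) before the first probe.
    sock.setsockopt(socket.SOL_SOCKET, socket.SO_KEEPALIVE, 1)
    # Seconds of idleness before the first probe (not defined on every OS).
    if hasattr(socket, "TCP_KEEPIDLE"):
        sock.setsockopt(socket.IPPROTO_TCP, socket.TCP_KEEPIDLE, idle)
    # Seconds between unanswered probes.
    if hasattr(socket, "TCP_KEEPINTVL"):
        sock.setsockopt(socket.IPPROTO_TCP, socket.TCP_KEEPINTVL, interval)
    # Unanswered probes before the kernel declares the connection dead.
    if hasattr(socket, "TCP_KEEPCNT"):
        sock.setsockopt(socket.IPPROTO_TCP, socket.TCP_KEEPCNT, count)


if __name__ == "__main__":
    # Demo on an unconnected socket; a real slave would apply this to the
    # socket it opens to the master.
    with socket.socket(socket.AF_INET, socket.SOCK_STREAM) as s:
        enable_keepalive(s)
        print(s.getsockopt(socket.SOL_SOCKET, socket.SO_KEEPALIVE) != 0)
```

The probes travel over the same slave-to-master connection, so besides detecting a dead link they would also refresh the NAT mapping before it times out, provided the idle value is below the NAT's timeout.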
If that TCP conn dies, they disappear, and retry connections very slowly,
like once every 10 mins with exponential backoff.

- --j.

> --
> sidney
>
> The log:
>
> starting svn operation
> command '['svn', 'update', '--revision', '122631']' in dir
> /b/home/buildbot/slaves/sidney-fedora3/trunk-sidney-fedora3/build
> (timeout 1200 secs)
> svn: PROPFIND request failed on '/repos/asf/spamassassin/trunk'
> svn: PROPFIND of '/repos/asf/spamassassin/trunk': Could not read status
> line: connection was closed by server. (http://svn.apache.org)
> update failed, clobbering and trying again
> command '['rm', '-rf',
> '/b/home/buildbot/slaves/sidney-fedora3/trunk-sidney-fedora3/build']' in
> dir /b/home/buildbot/slaves/sidney-fedora3/trunk-sidney-fedora3 (timeout
> 1200 secs)
> now retrying VC operation
> command '['svn', 'checkout', '--revision', '122631',
> 'http://svn.apache.org/repos/asf/spamassassin/trunk', 'build']' in dir
> /b/home/buildbot/slaves/sidney-fedora3/trunk-sidney-fedora3 (timeout
> 1200 secs)
> svn: PROPFIND request failed on '/repos/asf/spamassassin/trunk'
> svn: PROPFIND of '/repos/asf/spamassassin/trunk': Could not read status
> line: connection was closed by server. (http://svn.apache.org)
> program finished with exit code 1
-----BEGIN PGP SIGNATURE-----
Version: GnuPG v1.2.4 (GNU/Linux)
Comment: Exmh CVS

iD8DBQFBw1XoMJF5cimLx9ARAmbsAJ0QFRYByCiQ4WY6K47wN/E7wxru0ACeOHNj
JTOK7lD2BWBdKwyF7DPs0sM=
=xqJz
-----END PGP SIGNATURE-----
