On 04/03/2015 12:47 PM, Hausmann Simon wrote:
> Hi,
>
> I believe what we are seeing is caused by instability in the network that
> connects the Jenkins service with the Jenkins slave machines. Occasionally
> network connectivity between the slaves and the master is lost, causing the
> running build as a whole to abort - all other still running builds are
> aborted and the results from builds that had already finished are discarded.
> In an attempt to recover, a whole new integration with builds for all
> configurations is started.
>
> We have observed that this scenario repeats itself several times, causing
> overall integration of many hours.

I think this is documented here:
http://code.qt.io/cgit/qt/qtqa.git/tree/scripts/jenkins/qt-jenkins-integrator.pl#n439

Once $MAX_ATTEMPTS is reached, somebody needs to manually restart the
integrator. CI admins should be notified with an email like
http://lists.qt-project.org/pipermail/ci-reports/2015-April/038140.html

> As part of the work on the new CI system, we have observed similar network
> connectivity related symptoms. We are treating them more gracefully by not
> discarding otherwise successful results. Nevertheless it is a major annoyance.
>
> Based on rumors and observation of symptoms it is a theory of Frederik and I
> that there is a firewall service centrally installed in this virtual network.
> It shows symptoms of connection tracking and - more importantly - signs of
> being able to handle only an insufficient amount of traffic or connections.
> Beyond that limit, connection attempts time out and existing connections
> become "spotty".
>
> I would like to get to the bottom of this at some point, because it severely
> affects the efficiency of the current ci system as well.
>
> Tony, do you happen to have any more details about this?
>
> I'll see about filing a ticket with IT next week unless we conclude anything
> different.
>
> Simon
>
> Original Message
> From: Thiago Macieira
> Sent: Friday, April 3, 2015 07:11
> To: [email protected]
> Subject: [Development] Why are qtbase integrations taking so long?
>
>
> qtbase integrations used to take around 3 hours as recently as two weeks ago.
>
> In the past week, I've caught several integrations lasting more than 6 hours.
> The one currently running is integrating a single commit and has been running
> for 6h30. I've seen one for 12 hours.
>
> Is this a timeout not caught by the coordinator?
>
> http://testresults.qt.io/ci/status/ says that it is in state
> "monitor-jenkins-build" and "build_attempt: 6". For attempt 5, the only
> stage not to be at SUCCESS was
> linux-g++_developer-build_qtnamespace_qtlibinfix_RHEL65_x64. The
> same for attempts 3 and 4.

I think that the integrator (coordinator) gives up after 8 retries/attempts,
so if qtbase takes around 3 hrs to run and it is run 8 times, you could
easily wait (worst case) for 24 hrs if no action is taken.

<http://code.qt.io/cgit/qt/qtqa.git/tree/scripts/jenkins/qt-jenkins-integrator.pl#n1056>

-- 
Sergio Ahumada
[email protected]
_______________________________________________
Development mailing list
[email protected]
http://lists.qt-project.org/mailman/listinfo/development
