IIUC the call to Jenkins.removeNode() should only be made while you hold the Queue's lock, as otherwise you can end up with a job scheduled to start on a node that has been removed.

Perhaps we should change the Jenkins.removeNode() code so that it ensures the Queue's lock has been obtained before it starts, but it's not always clear.

I may take another stab at analysing the leaky hack that is the Cloud API in Jenkins and see if there is anything else we can improve.

(Basically the root issue we found that led to some refactoring was builds being scheduled and then failing on a node that does not exist... all the paths we identified had nodes being removed while queue maintenance was in progress. So you should not remove nodes concurrently with the queue being maintained; instead you need to get the Queue's lock, confirm that the node is truly idle, and then remove it, all while holding the Queue's lock... and even that has issues, as a job can be scheduled but not yet started... so actually you mark the node as no longer accepting tasks, release the lock, sleep 100ms, re-acquire the lock, double check that it's idle, and only then do you remove the node... madness)
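For what it's worth, the sequence above can be sketched roughly as follows. This is a toy model and not the Jenkins API: NodeRemovalSketch, Node, assign(), start() and removeIfIdle() are all made-up names, and a plain ReentrantLock stands in for the Queue's lock. Re-enabling the node when the second check fails is my own assumption, not something described above.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.concurrent.locks.ReentrantLock;

// Toy model only: none of these types are the real Jenkins classes.
class NodeRemovalSketch {
    /** Hypothetical stand-in for a node; all fields are guarded by queueLock. */
    static final class Node {
        boolean acceptingTasks = true;
        int pending = 0; // jobs scheduled on the node but not yet started
        int busy = 0;    // jobs actually running
        boolean isIdle() { return busy == 0; } // pending jobs do not count as busy
    }

    private final ReentrantLock queueLock = new ReentrantLock(); // stands in for the Queue's lock
    private final Map<String, Node> nodes = new HashMap<>();

    void addNode(String name) {
        queueLock.lock();
        try { nodes.put(name, new Node()); } finally { queueLock.unlock(); }
    }

    boolean hasNode(String name) {
        queueLock.lock();
        try { return nodes.containsKey(name); } finally { queueLock.unlock(); }
    }

    /** Queue maintenance: schedule a job on the node if it still accepts tasks. */
    boolean assign(String name) {
        queueLock.lock();
        try {
            Node n = nodes.get(name);
            if (n == null || !n.acceptingTasks) return false;
            n.pending++;
            return true;
        } finally { queueLock.unlock(); }
    }

    /** A previously scheduled job actually starts running. */
    boolean start(String name) {
        queueLock.lock();
        try {
            Node n = nodes.get(name);
            if (n == null || n.pending == 0) return false;
            n.pending--;
            n.busy++;
            return true;
        } finally { queueLock.unlock(); }
    }

    /** Mark offline under the lock, wait, re-check under the lock, then remove. */
    boolean removeIfIdle(String name, long graceMillis) {
        Node n;
        queueLock.lock();
        try {
            n = nodes.get(name);
            if (n == null || !n.isIdle()) return false;
            n.acceptingTasks = false; // no new work from here on
        } finally { queueLock.unlock(); }

        // Grace period with the lock released, so scheduled-but-unstarted jobs can start.
        boolean interrupted = false;
        try {
            Thread.sleep(graceMillis);
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            interrupted = true;
        }

        queueLock.lock();
        try {
            if (nodes.get(name) != n) return false; // removed or replaced meanwhile
            if (interrupted || !n.isIdle()) {
                n.acceptingTasks = true; // assumption: put the node back in service
                return false;
            }
            nodes.remove(name);
            return true;
        } finally { queueLock.unlock(); }
    }
}
```

Calling removeIfIdle("some-node", 100) would mirror the 100ms sleep. Note that the sleep is only a heuristic: a job that is scheduled but still has not started when the grace period ends is lost along with the node, which is rather the point of the "madness" remark.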
