Hi folks,

My team has been working on an issue on and off since July 23rd.
I think we might have hit the jackpot in reproducing the issue that first affected us on July 23rd. Here's what happened:

- Once the copy of the Prod Jenkins Home finished, I started Jenkins in quiet mode <https://support.cloudbees.com/hc/en-us/articles/203737684-How-can-I-prevent-jenkins-from-starting-new-jobs-after-a-restart-?page=92> (I didn't want a scheduled prod deployment to run in stage by mistake). Jenkins started without issues.
- Then I disabled all the jobs (again, to prevent a job from running by mistake once I took Jenkins out of quiet mode).
- Then, since stage was running with production's config, the stage controller actually connected to the prod AWS account to create its agents there. Oops.
- Since having stage create its agents in the wrong AWS account is not ideal, I ran my Ansible configuration playbook against stage. Three restarts later, Jenkins hadn't crashed in any of them. Stage configuration was successful!
- From the UI, I disabled quiet mode, but I noticed the builds were not starting.
The log showed the following error:

    2021-09-07 20:19:11.628+0000 [id=29] SEVERE hudson.triggers.SafeTimerTask#run: Timer task hudson.model.Queue$MaintainTask@7a94f7bb failed
    java.lang.IllegalStateException: The class jenkins.security.QueueItemAuthenticatorConfiguration was not found, potentially not yet loaded
        at hudson.ExtensionList.getInstance(ExtensionList.java:166)
        at jenkins.security.QueueItemAuthenticatorConfiguration.get(QueueItemAuthenticatorConfiguration.java:61)
        at jenkins.security.QueueItemAuthenticatorConfiguration$ProviderImpl.getAuthenticators(QueueItemAuthenticatorConfiguration.java:70)
        at jenkins.security.QueueItemAuthenticatorProvider$IteratorImpl.hasNext(QueueItemAuthenticatorProvider.java:44)
        at hudson.model.Queue$Item.authenticate(Queue.java:2331)
        at hudson.model.Node.canTake(Node.java:401)
        at hudson.model.Queue.makeFlyWeightTaskBuildable(Queue.java:1736)
        at hudson.model.Queue.makeBuildable(Queue.java:1698)
        at hudson.model.Queue.maintain(Queue.java:1546)
        at hudson.model.Queue$MaintainTask.doRun(Queue.java:2902)
        at hudson.triggers.SafeTimerTask.run(SafeTimerTask.java:91)
        at jenkins.security.ImpersonatingScheduledExecutorService$1.run(ImpersonatingScheduledExecutorService.java:58)
        at java.base/java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:515)
        at java.base/java.util.concurrent.FutureTask.runAndReset(FutureTask.java:305)
        at java.base/java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.run(ScheduledThreadPoolExecutor.java:305)
        at java.base/java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1128)
        at java.base/java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:628)
        at java.base/java.lang.Thread.run(Thread.java:829)

- So I restarted Jenkins one more time (again with the same configuration my playbook had left in place after the previous restart, no changes), and suddenly startup failed with:

    java.lang.IllegalStateException: Expected 1 instance of jenkins.security.s2m.AdminWhitelistRule but got 0
        at hudson.ExtensionList.lookupSingleton(ExtensionList.java:451)
        at io.jenkins.plugins.casc.core.AdminWhitelistRuleConfigurator.instance(AdminWhitelistRuleConfigurator.java:59)
        at io.jenkins.plugins.casc.core.AdminWhitelistRuleConfigurator.instance(AdminWhitelistRuleConfigurator.java:42)
        at io.jenkins.plugins.casc.BaseConfigurator.check(BaseConfigurator.java:286)
        at io.jenkins.plugins.casc.BaseConfigurator.configure(BaseConfigurator.java:351)
        at io.jenkins.plugins.casc.BaseConfigurator.check(BaseConfigurator.java:287)
        at io.jenkins.plugins.casc.ConfigurationAsCode.lambda$checkWith$8(ConfigurationAsCode.java:777)
        at io.jenkins.plugins.casc.ConfigurationAsCode.invokeWith(ConfigurationAsCode.java:713)
        at io.jenkins.plugins.casc.ConfigurationAsCode.checkWith(ConfigurationAsCode.java:777)
        at io.jenkins.plugins.casc.ConfigurationAsCode.configureWith(ConfigurationAsCode.java:762)
        at io.jenkins.plugins.casc.ConfigurationAsCode.configureWith(ConfigurationAsCode.java:638)
        at io.jenkins.plugins.casc.ConfigurationAsCode.configure(ConfigurationAsCode.java:307)
        at io.jenkins.plugins.casc.ConfigurationAsCode.init(ConfigurationAsCode.java:299)

This issue has shown up before. Usually another restart fixes it, but I've now restarted Jenkins about 4 times and the error still appears. I'm hoping this will let us investigate a bit more what's going on.

I have the GC logs, Jenkins logs, thread dumps and an SOS report from stage. The latest PID is 2058587, so the most recent GC log is gc-2058587-2021-09-07_16-11-45.log. Some of these would need to be sanitized before I can share them, but let me know if any of them would be useful.

First and foremost, is there a fix for this? Secondly, is this a known bug?

Best Regards,
Doug Whitfield

--
You received this message because you are subscribed to the Google Groups "Jenkins Users" group.
To unsubscribe from this group and stop receiving emails from it, send an email to [email protected].
To view this discussion on the web visit https://groups.google.com/d/msgid/jenkinsci-users/a081f839-cd67-48ce-b4d4-a8ae73777019n%40googlegroups.com.
