Thanks, Gautham, for chasing this.

I think there are still some 119 builds in the build queue; you can see them
on the left at [1] (search for "Build Queue"). They are all stuck on "Waiting
for next available executor on Windows".

If you aborted all of them previously and they showed up again now, then
something is still messed up in the configuration and the pipeline is getting
triggered for existing PRs (not just new ones). If you didn't abort them
earlier, then maybe you need to abort all the ones in the queue to free up the
resources.
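
If it helps, queued items for one job can usually be cancelled in bulk from
the Jenkins script console (Manage Jenkins -> Script Console). A rough sketch
only, assuming admin access; the name filter below is a guess and would need
checking against the real queue:

```groovy
// Cancel every queued item whose task name mentions the Windows job.
// Run from the Jenkins script console; requires admin permissions.
// The 'hadoop-multibranch-windows' filter is an assumption.
import jenkins.model.Jenkins

def queue = Jenkins.get().queue
queue.items.findAll { it.task.fullDisplayName.contains('hadoop-multibranch-windows') }
           .each { item ->
               println "Cancelling: ${item.task.fullDisplayName}"
               queue.cancel(item)
           }
```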

One example of a build that (as of now) has been waiting on a resource for the
past 7 hours: [2]

Let me know if you are stuck; we can figure things out together :-)

-Ayush


[1]
https://ci-hadoop.apache.org/job/hadoop-multibranch-windows-10/view/change-requests/builds
[2]
https://ci-hadoop.apache.org/job/hadoop-multibranch-windows-10/view/change-requests/job/PR-6423/2/console
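
For anyone picking this up: the timeout and workspace-cleanup ideas discussed
further down the thread could look roughly like this in a declarative
pipeline. This is a sketch only, not the actual jenkinsfile-windows-10
contents; the stage name, label, and timeout value are illustrative:

```groovy
// Sketch: cap how long a stage may hold a Windows executor, and do
// workspace cleanup inside a node context. Names/values are assumptions.
pipeline {
    agent { label 'Windows' }
    stages {
        stage('Windows 10') {
            options {
                // Fail the stage instead of holding the executor forever.
                timeout(time: 5, unit: 'HOURS')
            }
            steps {
                bat 'echo build steps go here'
            }
        }
    }
    post {
        cleanup {
            script {
                // cleanWs() (ws-cleanup plugin) instead of deleteDir(); both
                // need a workspace (hudson.FilePath) context, hence the node
                // wrapper -- the MissingContextVariableException in the log
                // quoted below comes from running cleanup without one.
                node('Windows') {
                    cleanWs()
                }
            }
        }
    }
}
```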

On Sun, 28 Apr 2024 at 13:43, Gautham Banasandra <gaur...@apache.org> wrote:

> Hi folks,
>
> I apologize for the inconvenience caused. I've now applied the mitigation
> described in [3].
>
> Unfortunately, there are only 12 Windows nodes in the whole swarm of
> Jenkins build nodes, so this caused starvation of the Windows nodes for
> other projects.
>
> I had reached out to the infra team several months ago and requested them
> to add more
> Windows nodes, but it was turned down. I'm not sure if there's a way
> around this, other than
> getting more Windows nodes.
>
> Thanks,
> --Gautham
>
> On 2024/04/28 04:53:32 Ayush Saxena wrote:
> > Found this on dev@hadoop -> Moving to common-dev (the ML we use)
> >
> > I think there was some initiative to enable the Windows pre-commit build
> > for every PR, and that seems to have gone wild: either the number of PRs
> > raised is way more than the capacity the nodes can handle, or something
> > got misconfigured in the job itself such that the build is getting
> > triggered for all open PRs, not just new ones, which is leading to
> > starvation of resources.
> >
> > To the best of my knowledge,
> > @Gautham Banasandra <gaur...@apache.org> / @Iñigo Goiri <elgo...@gmail.com>
> > are chasing the initiative; can you folks help check?
> >
> > There are concerns raised by the Infra team here [1] on dev@hadoop
> >
> > Most probably something got messed up while configuring the
> > hadoop-multibranch-windows job; it shows some 613 PRs scheduled [2]. I
> > think it scheduled builds for all open ones. Something similar happened
> > long ago when we were doing migrations; pointers can be fetched from [3].
> >
> > [1] https://lists.apache.org/thread/7nsyd0vtpb87fhm0fpv8frh6dzk3b3tl
> > [2]
> > https://ci-hadoop.apache.org/job/hadoop-multibranch-windows-10/view/change-requests/builds
> > [3] https://lists.apache.org/thread/8pxf2yon3r9g61zgv9cf120qnhrs8q23
> >
> > -Ayush
> >
> >
> > On 2024/04/26 16:59:04 Wei-Chiu Chuang wrote:
> > > I'm not familiar with the Windows build, but you may have better luck
> > > reaching out to Apache Infra:
> > > https://infra.apache.org/contact.html
> > >
> > > mailing list, Jira, or even Slack
> > >
> > > On Fri, Apr 26, 2024 at 9:42 AM Cesar Hernandez <cesargu...@gmail.com>
> > > wrote:
> > >
> > > > Hello,
> > > > An option that can be implemented in the Hadoop pipeline [1] is to set
> > > > a timeout [2] on critical stages within the pipeline, for example in
> > > > the "Windows 10" stage.
> > > > As for the issue the CI build is logging [3] in the hadoop-multibranch
> > > > jobs reported by Chris, it seems the issue is around the Post (cleanup)
> > > > pipeline process. My two cents is to use cleanWs() instead of
> > > > deleteDir(), as documented in: https://plugins.jenkins.io/ws-cleanup/
> > > >
> > > > [1]
> > > > https://github.com/apache/hadoop/blob/trunk/dev-support/jenkinsfile-windows-10
> > > >
> > > > [2]
> > > > https://www.jenkins.io/doc/pipeline/steps/workflow-basic-steps/#timeout-enforce-time-limit
> > > >
> > > > [3] (from
> > > > https://ci-hadoop.apache.org/job/hadoop-multibranch-windows-10/job/PR-1137/1/console)
> > > >
> > > > Still waiting to schedule task
> > > > Waiting for next available executor on 'Windows'
> > > > (https://ci-hadoop.apache.org/label/Windows/)
> > > > [Pipeline] // node
> > > > [Pipeline] stage
> > > > [Pipeline] { (Declarative: Post Actions)
> > > > [Pipeline] script
> > > > [Pipeline] {
> > > > [Pipeline] deleteDir
> > > > [Pipeline] }
> > > > [Pipeline] // script
> > > > Error when executing cleanup post condition:
> > > > Also:   org.jenkinsci.plugins.workflow.actions.ErrorAction$ErrorId:
> > > > ca1b7f2f-ec16-4bde-ac51-85f964794e37
> > > > org.jenkinsci.plugins.workflow.steps.MissingContextVariableException:
> > > > Required context class hudson.FilePath is missing
> > > > Perhaps you forgot to surround the code with a step that provides
> > > > this, such as: node
> > > >         at org.jenkinsci.plugins.workflow.steps.StepDescriptor.checkContextAvailability(StepDescriptor.java:265)
> > > >         at org.jenkinsci.plugins.workflow.cps.DSL.invokeStep(DSL.java:300)
> > > >         at org.jenkinsci.plugins.workflow.cps.DSL.invokeMethod(DSL.java:196)
> > > >         at org.jenkinsci.plugins.workflow.cps.CpsScript.invokeMethod(CpsScript.java:124)
> > > >         at jdk.internal.reflect.GeneratedMethodAccessor1084.invoke(Unknown Source)
> > > >         at java.base/jdk.internal.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
> > > >         at java.base/java.lang.reflect.Method.invoke(Method.java:566)
> > > >         at org.codehaus.groovy.reflection.CachedMethod.invoke(CachedMethod.java:98)
> > > >         at groovy.lang.MetaMethod.doMethodInvoke(MetaMethod.java:325)
> > > >         at groovy.lang.MetaClassImpl.invokeMethod(MetaClassImpl.java:1225)
> > > >         at groovy.lang.MetaClassImpl.invokeMethod(MetaClassImpl.java:1034)
> > > >         at org.codehaus.groovy.runtime.callsite.PogoMetaClassSite.call(PogoMetaClassSite.java:41)
> > > >         at org.codehaus.groovy.runtime.callsite.CallSiteArray.defaultCall(CallSiteArray.java:47)
> > > >         at org.codehaus.groovy.runtime.callsite.AbstractCallSite.call(AbstractCallSite.java:116)
> > > >         at org.kohsuke.groovy.sandbox.impl.Checker$1.call(Checker.java:180)
> > > >         at org.kohsuke.groovy.sandbox.GroovyInterceptor.onMethodCall(GroovyInterceptor.java:23)
> > > >         at org.jenkinsci.plugins.scriptsecurity.sandbox.groovy.SandboxInterceptor.onMethodCall(SandboxInterceptor.java:163)
> > > >         at org.kohsuke.groovy.sandbox.impl.Checker$1.call(Checker.java:178)
> > > >         at org.kohsuke.groovy.sandbox.impl.Checker.checkedCall(Checker.java:182)
> > > >         at org.kohsuke.groovy.sandbox.impl.Checker.checkedCall(Checker.java:152)
> > > >         at org.kohsuke.groovy.sandbox.impl.Checker.checkedCall(Checker.java:152)
> > > >         at com.cloudbees.groovy.cps.sandbox.SandboxInvoker.methodCall(SandboxInvoker.java:17)
> > > >         at org.jenkinsci.plugins.workflow.cps.LoggingInvoker.methodCall(LoggingInvoker.java:105)
> > > >         at WorkflowScript.run(WorkflowScript:196)
> > > >         at ___cps.transform___(Native Method)
> > > >         at com.cloudbees.groovy.cps.impl.ContinuationGroup.methodCall(ContinuationGroup.java:90)
> > > >         at com.cloudbees.groovy.cps.impl.FunctionCallBlock$ContinuationImpl.dispatchOrArg(FunctionCallBlock.java:116)
> > > >         at com.cloudbees.groovy.cps.impl.FunctionCallBlock$ContinuationImpl.fixName(FunctionCallBlock.java:80)
> > > >         at jdk.internal.reflect.GeneratedMethodAccessor1046.invoke(Unknown Source)
> > > >         at java.base/jdk.internal.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
> > > >         at java.base/java.lang.reflect.Method.invoke(Method.java:566)
> > > >         at com.cloudbees.groovy.cps.impl.ContinuationPtr$ContinuationImpl.receive(ContinuationPtr.java:72)
> > > >         at com.cloudbees.groovy.cps.impl.ConstantBlock.eval(ConstantBlock.java:21)
> > > >         at com.cloudbees.groovy.cps.Next.step(Next.java:83)
> > > >         at com.cloudbees.groovy.cps.Continuable$1.call(Continuable.java:152)
> > > >         at com.cloudbees.groovy.cps.Continuable$1.call(Continuable.java:146)
> > > >         at org.codehaus.groovy.runtime.GroovyCategorySupport$ThreadCategoryInfo.use(GroovyCategorySupport.java:136)
> > > >         at org.codehaus.groovy.runtime.GroovyCategorySupport.use(GroovyCategorySupport.java:275)
> > > >         at com.cloudbees.groovy.cps.Continuable.run0(Continuable.java:146)
> > > >         at org.jenkinsci.plugins.workflow.cps.SandboxContinuable.access$001(SandboxContinuable.java:18)
> > > >         at org.jenkinsci.plugins.workflow.cps.SandboxContinuable.run0(SandboxContinuable.java:51)
> > > >         at org.jenkinsci.plugins.workflow.cps.CpsThread.runNextChunk(CpsThread.java:187)
> > > >         at org.jenkinsci.plugins.workflow.cps.CpsThreadGroup.run(CpsThreadGroup.java:423)
> > > >         at org.jenkinsci.plugins.workflow.cps.CpsThreadGroup$2.call(CpsThreadGroup.java:331)
> > > >         at org.jenkinsci.plugins.workflow.cps.CpsThreadGroup$2.call(CpsThreadGroup.java:295)
> > > >         at org.jenkinsci.plugins.workflow.cps.CpsVmExecutorService$2.call(CpsVmExecutorService.java:97)
> > > >         at java.base/java.util.concurrent.FutureTask.run(FutureTask.java:264)
> > > >         at hudson.remoting.SingleLaneExecutorService$1.run(SingleLaneExecutorService.java:139)
> > > >         at jenkins.util.ContextResettingExecutorService$1.run(ContextResettingExecutorService.java:28)
> > > >         at jenkins.security.ImpersonatingExecutorService$1.run(ImpersonatingExecutorService.java:68)
> > > >         at jenkins.util.ErrorLoggingExecutorService.lambda$wrap$0(ErrorLoggingExecutorService.java:51)
> > > >         at java.base/java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:515)
> > > >         at java.base/java.util.concurrent.FutureTask.run(FutureTask.java:264)
> > > >         at java.base/java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1128)
> > > >         at java.base/java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:628)
> > > >         at java.base/java.lang.Thread.run(Thread.java:829)
> > > > [Pipeline] }
> > > > [Pipeline] // stage
> > > > [Pipeline] End of Pipeline
> > > > Queue task was cancelled
> > > > org.jenkinsci.plugins.workflow.actions.ErrorAction$ErrorId:
> > > > dc84ec50-8661-44a1-a7c0-ba575feca31d
> > > >
> > > >
> > > > On Fri, 26 Apr 2024 at 7:56, Chris Thistlethwaite (<chr...@apache.org>)
> > > > wrote:
> > > >
> > > >
> > > > > Greetings all!
> > > > >
> > > > > It was brought to my attention this morning that all the shared
> > > > > Jenkins Windows nodes were leased out to ci-hadoop. Upon
> > > > > investigation, it looks like several builds have been stuck for the
> > > > > last 3+ days. The particular build in question is
> > > > > https://ci-hadoop.apache.org/job/hadoop-multibranch-windows-10/
> > > > >
> > > > > There are a ton of Windows builds in the queue as well, so even if I
> > > > > start killing these off, they are going to take over the nodes again
> > > > > and likely fail/stick at the same place.
> > > > >
> > > > > Can someone take a look at the build config? I'll have to force stop
> > > > > these builds.
> > > > >
> > > > > Please add me to any replies as I'm not subbed to this list.
> > > > >
> > > > > Thanks!
> > > > > -Chris T.
> > > > > #asfinfra
> > > > >
> > > >
> > > >
> > > > --
> > > > Sincerely,
> > > > César Hernández.
> > > >
> > >
> >
>
> ---------------------------------------------------------------------
> To unsubscribe, e-mail: common-dev-unsubscr...@hadoop.apache.org
> For additional commands, e-mail: common-dev-h...@hadoop.apache.org
>
>
