lupyuen commented on issue #17914:
URL: https://github.com/apache/nuttx/issues/17914#issuecomment-3850470159

   Good News: Over the past week we used __24 Full-Time GitHub Runners__, which 
is below the ASF limit of __25 Full-Time GitHub Runners__. [(Explained 
here)](https://lupyuen.org/articles/ci3)
   
   https://infra-reports.apache.org/#ghactions&hours=168&limit=15&group=name
   
   <img width="1521" height="1202" alt="Image" 
src="https://github.com/user-attachments/assets/25164183-f299-4ec8-abef-f2eafd69ca2c" />
   
   So we'll close this issue for now. As always: Everyone can monitor the Live 
Usage here: https://lupyuen.github.io/nuttx-metrics/github-fulltime-runners.png
   
   <img 
src="https://lupyuen.github.io/nuttx-metrics/github-fulltime-runners.png" />
   
   _Has this overuse happened before?_
   
   We see brief spikes in usage of GitHub Runners during NuttX Releases. But 
Jan-Feb 2026 was the busiest sustained peak...
   
   
https://docs.google.com/spreadsheets/d/13QOAzC84eUYcB7xmPT0lo5L-VXnvY5cRvA9rmqC8utM/edit?gid=0#gid=0
   
   <img width="1693" height="1037" alt="Image" 
src="https://github.com/user-attachments/assets/6a22d6af-b323-47b3-88ab-e5fe76200332" />
   
   [(Generated by 
history.sh)](https://github.com/lupyuen/nuttx-metrics/blob/main/history.sh)
   
   _Why has the GitHub Load jumped significantly since 9 Dec 2025?_
   
   Maybe something we changed in NuttX CI? Needs more monitoring and analysis 
(during the quieter Lunar New Year holidays)
   
   Lesson Learnt: __Continuous Operational Monitoring__ of NuttX CI is super 
important! Even after revamping NuttX CI into a Distributed Build + Test 
system. 
   
   _What happened when we were using too many GitHub Runners?_
   
   A few days ago, some of our CI Jobs were stuck forever with this message:
   
   https://github.com/apache/nuttx/actions/runs/21600990965/job/62279838438
   
   > _Job is waiting for a hosted runner to come online. <br> Job is about to 
start running on the hosted runner_
   
   We were hurting Other Apache Projects too, because GitHub Runners are pooled 
across All Apache Projects.
   
   _Suppose we have an idea for reducing the CI Load. How many GitHub Runners 
will it actually save?_
   
   Check the __"Total Run Time"__ in the GitHub Actions Log. A Typical CI Build 
will require __28 Hours__ of GitHub Runners...
   
   https://github.com/apache/nuttx/actions/runs/21365186482/usage
   
   <img width="1762" height="1046" alt="Screenshot 2026-01-27 at 8 13 46 AM" 
src="https://github.com/user-attachments/assets/a8279cd9-5b61-4aa4-9a24-8d6839165447" />
   
   Suppose we propose to optimise the Doc Build. A Doc Build requires __1.5 
minutes__ of GitHub Runners...
   
   https://github.com/apache/nuttx/actions/runs/21378476592/usage
   
   <img width="1523" height="948" alt="Screenshot 2026-01-27 at 8 09 59 AM" 
src="https://github.com/user-attachments/assets/bbce351d-e9aa-4295-ba10-4892f9518377" />
   
   So Doc Builds take up less than 0.1% of the GitHub Runners of a Typical CI 
Build. Which means that even an optimal Doc Build probably won't reduce our 
usage of GitHub Runners by much.
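
   The arithmetic behind that estimate can be sketched in a few lines of Python. This is only an illustration: the two figures are the ones quoted above from the "Total Run Time" pages, while the function and constant names are made up for this example and are not part of NuttX CI.

```python
def runner_share(job_minutes: float, build_hours: float) -> float:
    """Fraction of a build's total runner time consumed by one job.

    Hypothetical helper for illustration only.
    """
    return job_minutes / (build_hours * 60.0)


# Figures quoted above from the GitHub Actions "Total Run Time" pages
DOC_BUILD_MINUTES = 1.5      # Doc Build
TYPICAL_BUILD_HOURS = 28.0   # Typical CI Build

share = runner_share(DOC_BUILD_MINUTES, TYPICAL_BUILD_HOURS)
print(f"Doc Build share of a Typical CI Build: {share:.4%}")
```

   The same check works for any proposed optimisation: plug in the runner minutes of the job we hope to speed up, and the share is the upper bound on what we could possibly save, even if that job were eliminated entirely.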
   
   Also remember: Fixing NuttX CI is highly risky. It might break the frequent 
CI Builds and/or the NuttX Release Process. Someone needs to be on standby 
24 x 7 to watch over the CI Fix, in case it goes haywire and needs to be rolled 
back ASAP. 
   
   _How should we revamp NuttX CI?_
   
   I have no idea, though we have [plenty of data to guide 
us](https://lupyuen.org/articles/ci3). Please confirm later whether btashton / 
@simbit18 / @lupyuen are keen to take on the job. NuttX CI is super stressful 
and exhausting, and some of us might not wish to continue the job, e.g. due to 
health reasons. (I have hypertension)


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]
