For now, I created a PR to reduce the frequency by half: https://github.com/apache/spark/pull/55729
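For a rough sense of scale, here is a small Python sketch using the per-month figures from Tian's message quoted below. It assumes, hypothetically, that the halving applies to all three daily scheduled builds (master, 4.1, 4.0). It also assumes that every run costs about the same number of runner minutes. The `interval_days` parameter also covers Yicong's every-three-days and weekly options. The names are illustrative only.

```python
# Per-month runner minutes for the daily scheduled builds, as quoted in
# Tian's breakdown. Assumption: each build currently runs once a day.
DAILY_SCHEDULED = {
    "master": 300_000,
    "4.1": 200_000,
    "4.0": 200_000,
}


def monthly_minutes(daily_cost: float, interval_days: float) -> float:
    """Rescale a daily build's monthly cost to one run every `interval_days` days.

    Assumes cost is proportional to the number of runs (every run takes
    roughly the same number of runner minutes).
    """
    return daily_cost / interval_days


before = sum(DAILY_SCHEDULED.values())
after_halving = sum(monthly_minutes(c, 2) for c in DAILY_SCHEDULED.values())
print(f"scheduled builds: {before:,} -> {after_halving:,.0f} minutes/month")

# Yicong's options for the 4.0 branch: every three days, or weekly.
for days in (3, 7):
    cost = monthly_minutes(DAILY_SCHEDULED["4.0"], days)
    print(f"4.0 every {days} days: ~{cost:,.0f} minutes/month")
```

Under those assumptions, halving saves roughly 350k minutes a month, about a third of the ~1.08M monthly cap.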
On Thu, 7 May 2026 at 07:56, Yicong Huang <[email protected]> wrote:

> I think we need to 1) cut CI pressure and 2) look for more resources to
> run CIs at the same time.
>
> Cut CIs:
>
> - I think the biggest cut would be on the scheduled jobs first. For
> instance, change the 3.5 and 4.0 scheduled jobs from daily to once every
> three days, or even once per week.
> - Then, for branch 4.x or other more active release branches, could we do
> a daily post-merge CI instead of running after each commit?
> - Meanwhile, we can explore ways to run only the tests on the code paths
> actually affected, to avoid full runs.
> - And optimize the tests themselves so they run faster.
>
> Expand resources:
>
> - We can probably move some of the scheduled jobs out to another repo,
> as Apache Arrow did.
> - I wonder whether self-hosted runners would be acceptable to the
> community. This sounds like a longer-term solution if we were to
> introduce more checks in the future.
>
>
> Best regards,
> Yicong Huang
>
> On Wed, May 6, 2026 at 3:04 PM Hyukjin Kwon <[email protected]> wrote:
>
>> We should probably reduce the scheduled builds for the time being.
>>
>> As a reference, I worked on Apache Arrow, and they use an extra CI run by
>> a third party, e.g., see
>> - PR: https://github.com/apache/arrow/pull/48915
>> - You post a comment like
>> https://github.com/apache/arrow/pull/48915#issuecomment-3852062184
>> - It posts the CI link like
>> https://github.com/apache/arrow/pull/48915#issuecomment-3852079993
>> - The CI is defined at https://github.com/ursacomputing/crossbow
>>
>> I feel like this could be an alternative if any vendor is willing to
>> support it.
>>
>> On Thu, 7 May 2026 at 04:09, Tian Gao via dev <[email protected]>
>> wrote:
>>
>>> I did some quick calculations, and we can't afford the CI with our
>>> existing infra.
>>>
>>> Per ASF policy (https://infra.apache.org/github-actions-policy.html),
>>> our maximum is 250k runner minutes per week. That's about 1M per month,
>>> and last month we hit almost exactly that: 1,082,721 minutes.
>>>
>>> Our current CI consists of a few components (all numbers are per month):
>>> * each commit on the master branch: ~280k
>>> * 4.1 scheduled run: ~200k
>>> * 4.0 scheduled run: ~200k
>>> * 3.5 scheduled run: negligible, because we don't run many tests
>>> * master scheduled run: ~300k
>>>
>>> With the new release cadence, even if we only do a scheduled run on 4.x
>>> (which we shouldn't, because it's an active dev branch, but that's
>>> another story), we need an extra 200k. With a 6-month maintenance
>>> window, we will always have at least 3 actively maintained versions
>>> (including LTS) that require CI.
>>>
>>> If it's just 200k extra, maybe it's manageable. But I really believe we
>>> need tests for the 4.x branch: we should treat that branch more like
>>> master than, say, 4.2. Even if we don't do a pre-merge check on it, we
>>> should do a post-merge check for every commit. A daily check on an active
>>> dev branch sounds a bit too risky to me. That would be another 300k.
>>>
>>> This does not include the discussion about any pre-merge check for 4.x,
>>> which we should actually think about in the future.
>>>
>>> So the question is: how do we deal with that? The solutions I can think
>>> of are:
>>> * Get some self-hosted runners to increase our CI capacity, which is
>>> currently limited by ASF policy
>>> * Optimize our CIs and tests so they take less time to run
>>> * Reduce the coverage of our tests so we can at least test all branches
>>>
>>> Any idea is welcome.
>>>
>>> Tian
>>>
>>
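A back-of-the-envelope Python check of the numbers in Tian's message. It makes three assumptions. The weekly cap is converted using 52/12 weeks per average month. The "negligible" 3.5 run is treated as zero. "Another 300k" for 4.x post-merge checks is read as coming on top of the extra 200k for a 4.x scheduled run. The dictionary keys and the `report` helper are illustrative names only.

```python
# Figures from Tian's message; all usage values are runner minutes per month.
WEEKLY_CAP = 250_000          # ASF GitHub Actions limit, per week
WEEKS_PER_MONTH = 52 / 12     # ~4.33 weeks in an average month
MONTHLY_CAP = WEEKLY_CAP * WEEKS_PER_MONTH

current = {
    "master per-commit": 280_000,
    "4.1 scheduled": 200_000,
    "4.0 scheduled": 200_000,
    "3.5 scheduled": 0,       # "negligible"; treated as zero here
    "master scheduled": 300_000,
}

# Additions discussed for the 4.x branch (assumed to be cumulative).
proposed = {
    "4.x scheduled": 200_000,
    "4.x post-merge per-commit": 300_000,
}


def report(label: str, usage: dict[str, int]) -> int:
    """Print total usage against the monthly cap and return the total."""
    total = sum(usage.values())
    print(f"{label}: {total:,} / {MONTHLY_CAP:,.0f} ({total / MONTHLY_CAP:.0%})")
    return total


current_total = report("current", current)
projected_total = report("projected", {**current, **proposed})
```

The component estimates sum to about 980k, somewhat below the 1,082,721 minutes actually measured, so the breakdown is approximate. With both 4.x additions the projection lands at roughly 137% of the cap.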
