As a first step, we are merging this PR. We have approvals for https://github.com/apache/iceberg/pull/16945 CI: Use one Java version for PR checks
Let us know if you have any comments for this. On Wed, Jun 24, 2026 at 2:44 PM Ajantha Bhat <[email protected]> wrote: > Hi all, > I have now created multiple small PRs for the easy and big wins. > > - https://github.com/apache/iceberg/pull/16945 CI: Use one Java > version for PR checks > - https://github.com/apache/iceberg/pull/16946 CI: Select Spark PR > matrix by changed version > - https://github.com/apache/iceberg/pull/16947 CI: Select Flink PR > matrix by changed version > > Please take a look. I will have two or three more follow-up PRs after this > to handle the full-ci flag and other module incremental builds from the > original PR: https://github.com/apache/iceberg/pull/16566 > > - Ajantha > > On Fri, Jun 19, 2026 at 9:12 PM Manu Zhang <[email protected]> > wrote: > >> Ajantha, sorry I missed your earlier email. It will be great to split your >> PR and get the enhancements for Spark CI or Flink CI in first. >> >> Kevin, that's good news! >> >>> CI should generally run by default for relevant changes, with explicit >>> opt-outs where appropriate. >> >> Agreed. I believe there is still low-hanging fruit we can pick based on >> Ajantha's and others' PRs. >> >> Thanks, >> Manu >> >> On Fri, Jun 19, 2026 at 2:17 AM Kevin Liu <[email protected]> wrote: >> >>> Thanks everyone for all the contributions to reduce CI resource usage. >>> I've seen a number of improvements go in already. I just checked the >>> Apache dashboard, and it looks like we're still under the ceiling set by ASF >>> for both the 5-day and 7-day periods. >>> >>> There's definitely more room for improvement. But I think we should >>> prioritize correctness and coverage. I would also like to focus on >>> maintainability and avoid patterns that require ongoing manual >>> maintenance to opt changes into CI, since those can quietly reduce coverage >>> over time. CI should generally run by default for relevant changes, >>> with explicit opt-outs where appropriate.
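Kevin's "still under the ceiling" check, and Bob's later point that a 5-minute action run 400 times a day occupies more than one job slot, both come down to converting runner-minutes into full-time runner equivalents. Below is a minimal Python sketch of that arithmetic, using the policy limits and the May 26 usage figures quoted elsewhere in this thread; the helper and constant names are my own. Note that 25 runners over 7 days is 252,000 minutes (4,200 hours), so the policy's 250,000-minute figure appears to be a rounded value.

```python
MINUTES_PER_DAY = 24 * 60  # 1,440 minutes of wall-clock time per runner per day


def runner_equivalents(runner_minutes: float, days: int) -> float:
    """Express total runner-minutes over `days` as a number of full-time runners."""
    return runner_minutes / (days * MINUTES_PER_DAY)


# Limits from the ASF GitHub Actions policy quoted in this thread.
WEEKLY_RUNNER_LIMIT = 25    # averaged over a calendar week (7 days)
FIVE_DAY_RUNNER_LIMIT = 30  # averaged over any consecutive 5 days

weekly_cap_minutes = WEEKLY_RUNNER_LIMIT * 7 * MINUTES_PER_DAY      # 252,000
five_day_cap_minutes = FIVE_DAY_RUNNER_LIMIT * 5 * MINUTES_PER_DAY  # 216,000

# apache/iceberg usage Kevin reported from the Infra dashboard on May 26.
usage_7_days = 218_321
usage_5_days = 120_241

print(f"7-day: {runner_equivalents(usage_7_days, 7):.2f} / {WEEKLY_RUNNER_LIMIT} runners")
print(f"5-day: {runner_equivalents(usage_5_days, 5):.2f} / {FIVE_DAY_RUNNER_LIMIT} runners")

# Bob's example: a 5-minute workflow that runs 400 times in one day.
print(f"400 runs x 5 min: {runner_equivalents(400 * 5, 1):.2f} job slots for the day")
```

This prints roughly 21.66 of 25 runners for the 7-day window, 16.70 of 30 for the 5-day window, and 1.39 job slots for Bob's example, which is why a short but very frequent workflow can matter as much as a long one.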
>>> >>> Regarding the other repos, I pulled the github action usage data for >>> the past 7 days:
>>>
>>> Repository                        Workflow runs      Jobs  Runner minutes  % of total
>>> apache/iceberg                            3,574    14,909       177,594.8      77.45%
>>> apache/iceberg-cpp                        1,455     2,960        26,888.5      11.73%
>>> apache/iceberg-rust                       1,078     3,416        18,196.7       7.94%
>>> apache/iceberg-python                       594     1,445         3,387.4       1.48%
>>> apache/iceberg-go                           633     1,188         3,154.1       1.38%
>>> apache/terraform-provider-iceberg            42        48            71.0       0.03%
>>> *Total*                                 *7,376*  *23,966*     *229,292.5*   *100.00%*
>>>
>>> Looks like java repo is still the top contributor :) >>> >>> Best, >>> Kevin Liu >>> >>> On Thu, Jun 18, 2026 at 6:39 AM Ajantha Bhat <[email protected]> >>> wrote: >>> >>>> Hi Manu, all of these were handled in the parent PR I mentioned three >>>> weeks ago. >>>> Can we all please review this? >>>> https://github.com/apache/iceberg/pull/16566 >>>> >>>> I can split into smaller PRs if required. >>>> >>>> On Thu, Jun 18, 2026 at 1:59 PM Manu Zhang <[email protected]> >>>> wrote: >>>> >>>>> Hi all, >>>>> >>>>> Here's another quick win from scoping Spark CI to only changed Spark >>>>> versions [1]. We usually open a PR first against the latest Spark version >>>>> and then back-port it to previous versions after the merge. Running Spark >>>>> CI for all Spark versions in such cases wastes resources. >>>>> >>>>> If this approach is approved, I can also make a PR for Flink CI. >>>>> >>>>> >>>>> 1. https://github.com/apache/iceberg/pull/16800 >>>>> >>>>> Thanks, >>>>> Manu >>>>> >>>>> On Sat, Jun 13, 2026 at 8:34 AM Abnob Doss <[email protected]> >>>>> wrote: >>>>>> Hi, >>>>>> >>>>>> A potential small win from the subproject side: the iceberg-rust >>>>>> Python bindings CI had ended up building the Rust bindings twice per run, >>>>>> due to an accidental interaction between a few changes over time.
>>>>>> One-line >>>>>> fix: >>>>>> https://github.com/apache/iceberg-rust/pull/2636 >>>>>> >>>>>> Measured over the past 7 days, the duplicate build took a median of >>>>>> 8.4 min on Linux, 12.1 min on macOS, and 15.3 min on Windows, totaling >>>>>> about 2,400 runner-minutes across 207 job executions. After the fix the >>>>>> same step takes a few seconds. >>>>>> >>>>>> Thanks, >>>>>> Abanoub >>>>>> >>>>>> On Wednesday, June 3rd, 2026 at 9:49 AM, Bob Thomson < >>>>>> [email protected]> wrote: >>>>>> >>>>>> > I don't think we have data to that level of granularity; it's a >>>>>> case of looking at the Actions and their run time and frequency of >>>>>> execution in each of your repos, and focussing on the longest running and >>>>>> most frequent ones. That is, an Action run might only run for 5 minutes >>>>>> each time, but if it is running 400 times a day then that occupies more >>>>>> than one job slot of the total of 900 ASF has, for the duration of that >>>>>> day. >>>>>> > Experience so far suggests those actions that build Java are often >>>>>> the most time consuming. >>>>>> > >>>>>> > Thanks. >>>>>> > >>>>>> > Kind regards, >>>>>> > -Bob Thomson. >>>>>> > >>>>>> > On 2026/06/01 18:39:38 Yufei Gu wrote: >>>>>> > > Hi Bob, >>>>>> > > >>>>>> > > Thanks for the heads-up and for giving the Iceberg community time >>>>>> to work >>>>>> > > on this. >>>>>> > > >>>>>> > > One question: Is the concern based on the overall GitHub Actions >>>>>> > > consumption of the Iceberg projects (e.g., main repo, python repo, >>>>>> go repo, >>>>>> > > etc.), or only for the main Iceberg repository? Iceberg has >>>>>> multiple >>>>>> > > repositories, including the main repository as well as Python, >>>>>> Go, Rust, >>>>>> > > and C++ subprojects. Most of the discussion and optimization work >>>>>> in this >>>>>> > > thread focuses on the main repository, where the majority of CI >>>>>> usage >>>>>> > > occurs.
If the overall project usage is within acceptable limits, >>>>>> would it >>>>>> > > be possible to allow a higher quota for a single repo (the >>>>>> Iceberg main >>>>>> > > repository), given its broader compatibility and integration >>>>>> testing >>>>>> > > requirements? >>>>>> > > >>>>>> > > Yufei >>>>>> > > >>>>>> > > >>>>>> > > On Mon, Jun 1, 2026 at 11:00 AM Steve Loughran < >>>>>> [email protected]> wrote: >>>>>> > > >>>>>> > > > This is really good for draft builds. >>>>>> > > > >>>>>> > > > If I'm committing and pushing work up to a WiP PR, it is often >>>>>> because I >>>>>> > > > want *a* machine to do the testing; I don't care who it runs as. >>>>>> > > > >>>>>> > > > Forcing PRs to run as the submitter also hardens the OSS repo >>>>>> against >>>>>> > > > vulnerabilities in the Github Actions and other parts of the >>>>>> build process. >>>>>> > > > >>>>>> > > > On Mon, 1 Jun 2026 at 17:11, Prashant Singh < >>>>>> [email protected]> >>>>>> > > > wrote: >>>>>> > > > >>>>>> > > >> Hi all, >>>>>> > > >> >>>>>> > > >> Great progress on the matrix reduction, incremental builds, >>>>>> and draft PR >>>>>> > > >> skipping ideas. I'd like to propose a complementary approach >>>>>> that can >>>>>> > > >> work >>>>>> > > >> alongside all of those: running PR CI on contributor fork >>>>>> compute >>>>>> > > >> instead >>>>>> > > >> of the ASF shared pool. >>>>>> > > >> >>>>>> > > >> How it works: >>>>>> > > >> >>>>>> > > >> Workflows switch from pull_request to push triggers on >>>>>> non-main >>>>>> > > >> branches. Each workflow: >>>>>> > > >> >>>>>> > > >> 1. Checks out apache/iceberg main (security boundary — >>>>>> untrusted code >>>>>> > > >> can't modify the workflow itself) >>>>>> > > >> 2. Squash-merges the contributor's fork branch on top >>>>>> > > >> 3. 
Runs tests on that merged tree >>>>>> > > >> >>>>>> > > >> Because the push event fires on the fork, GitHub bills the >>>>>> CI minutes >>>>>> > > >> to the fork owner's account - not the ASF shared pool. This >>>>>> takes >>>>>> > > >> Iceberg's PR CI usage from the ASF runners to effectively >>>>>> zero, >>>>>> > > >> regardless of matrix size. >>>>>> > > >> >>>>>> > > >> Why this is complementary: >>>>>> > > >> >>>>>> > > >> The optimizations discussed so far all reduce how much CI >>>>>> runs. >>>>>> > > >> Fork-compute changes where >>>>>> > > >> it runs. They compose - a leaner matrix running on fork >>>>>> compute is >>>>>> > > >> strictly better than either approach alone. >>>>>> > > >> >>>>>> > > >> Inline PR status: >>>>>> > > >> >>>>>> > > >> A lightweight notify_test_workflow.yml (using >>>>>> pull_request_target + >>>>>> > > >> Checks API) is included to post fork CI results directly >>>>>> onto the >>>>>> > > >> upstream PR's checks tab - so reviewers see green/red status >>>>>> inline as >>>>>> > > >> they do today. >>>>>> > > >> >>>>>> > > >> *Prior art*: >>>>>> > > >> >>>>>> > > >> Apache Spark adopted this pattern in 2024 (SPARK-47041) and >>>>>> has been >>>>>> > > >> running it in production since. Their full Spark CI matrix >>>>>> runs entirely >>>>>> > > >> on contributor forks. >>>>>> > > >> >>>>>> > > >> PR: https://github.com/apache/iceberg/pull/15397: covers >>>>>> all 10 >>>>>> > > >> workflow files. I've verified all workflows pass on fork >>>>>> computation. >>>>>> > > >> >>>>>> > > >> This could be merged independently of the matrix/incremental >>>>>> > > >> optimizations and would immediately eliminate PR CI pressure >>>>>> on the >>>>>> > > >> ASF pool - well within the June 8 deadline. >>>>>> > > >> >>>>>> > > >> Thoughts? 
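The three steps Prashant lists map onto plain git commands. Below is a hedged Python sketch that only builds (and, in dry-run mode, prints) that sequence; the helper names, the example fork URL, and the `./gradlew check` task are placeholders of mine, not what PR #15397 actually runs.

```python
import subprocess
from typing import List, Tuple

UPSTREAM = "https://github.com/apache/iceberg.git"


def fork_ci_steps(fork_url: str, fork_branch: str,
                  workdir: str = "iceberg") -> List[Tuple[str, List[str]]]:
    """Return (cwd, argv) pairs for the checkout / squash-merge / test steps (hypothetical helper)."""
    return [
        # 1. Check out upstream main (full clone, so the merge below has a merge base).
        (".", ["git", "clone", "--branch", "main", UPSTREAM, workdir]),
        # 2. Fetch the contributor's branch and squash it on top of main.
        #    --squash stages the combined changes without creating a merge commit.
        (workdir, ["git", "fetch", fork_url, fork_branch]),
        (workdir, ["git", "merge", "--squash", "FETCH_HEAD"]),
        # 3. Run tests on the merged working tree (placeholder Gradle task).
        (workdir, ["./gradlew", "check"]),
    ]


def run_steps(steps: List[Tuple[str, List[str]]], dry_run: bool = True) -> None:
    """Print each step, or execute it and stop at the first failure when dry_run is False."""
    for cwd, argv in steps:
        if dry_run:
            print(f"[{cwd}] {' '.join(argv)}")
        else:
            subprocess.run(argv, cwd=cwd, check=True)


# Hypothetical fork and branch names, for illustration only.
run_steps(fork_ci_steps("https://github.com/example-user/iceberg.git", "my-feature"))
```

Squash-merging rather than testing the fork branch as-is means the tests see the same tree a reviewer would get after merging, and a conflicting branch fails at step 2 instead of producing misleading test results.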
>>>>>> > > >> >>>>>> > > >> Prashant Singh >>>>>> > > >> >>>>>> > > >> On Fri, May 29, 2026 at 8:47 PM Renjie Liu < >>>>>> [email protected]> >>>>>> > > >> wrote: >>>>>> > > >> >>>>>> > > >>> I like the idea of cutting supported jvm runs in each ci. JVM >>>>>> has great >>>>>> > > >>> backward compatibility, and we could run on one jvm (maybe jvm 17) >>>>>> and trigger a >>>>>> > > >>> nightly run for jvm 21. >>>>>> > > >>> >>>>>> > > >>> On Wed, May 27, 2026 at 3:17 AM Steve Loughran < >>>>>> [email protected]> >>>>>> > > >>> wrote: >>>>>> > > >>> >>>>>> > > >>>> >>>>>> > > >>>> Doing a scan of the aws-sdk bundle.jar is halfway to an >>>>>> audit of the >>>>>> > > >>>> maven repo, with spark the other half. >>>>>> > > >>>> >>>>>> > > >>>> It seems to me that only PRs which go near >>>>>> gradle/libs.versions.toml >>>>>> > > >>>> are going to change dependencies, and so introduce new CVEs. >>>>>> > > >>>> >>>>>> > > >>>> There's the separate issue "CVEs are eternal" and all >>>>>> existing >>>>>> > > >>>> dependencies are collections of undiscovered/unreported >>>>>> cves. That's >>>>>> > > >>>> dependabot's homework, generally. >>>>>> > > >>>> >>>>>> > > >>>> >>>>>> > > >>>> On Tue, 26 May 2026 at 19:49, Kevin Liu < >>>>>> [email protected]> wrote: >>>>>> > > >>>> >>>>>> > > >>>>> Thanks everyone for the great ideas.
>>>>>> > > >>>>> >>>>>> > > >>>>> Here's where we stand today with respect to ASF runner >>>>>> usage (taken >>>>>> > > >>>>> from the link [2] above): >>>>>> > > >>>>> GitHub Actions Build Time Used >>>>>> > > >>>>> - past 7 days total usage: 218,321 minutes >>>>>> > > >>>>> - past 5 days total usage: 120,241 minutes >>>>>> > > >>>>> >>>>>> > > >>>>> *This puts us below the hard ceiling for resource usage* as >>>>>> described >>>>>> > > >>>>> by https://infra.apache.org/github-actions-policy.html >>>>>> > > >>>>> >>>>>> > > >>>>> > The average number of minutes a project uses *per >>>>>> calendar week >>>>>> > > >>>>> MUST NOT exceed the equivalent of 25 full-time runners >>>>>> (250,000 minutes, or >>>>>> > > >>>>> 4,200 hours)*. >>>>>> > > >>>>> > The average number of minutes a project uses *in any >>>>>> consecutive >>>>>> > > >>>>> five-day period MUST NOT exceed the equivalent of 30 >>>>>> full-time runners >>>>>> > > >>>>> (216,000 minutes, or 3,600 hours)*. >>>>>> > > >>>>> >>>>>> > > >>>>> We should still make improvements wherever possible. >>>>>> > > >>>>> >>>>>> > > >>>>> I have a few PRs to reduce CI usage further. >>>>>> > > >>>>> - CI: Limit CVE scan runs to relevant changes #16513 >>>>>> > > >>>>> - Build: Simplify CI workflow path filters to avoid >>>>>> per-workflow >>>>>> > > >>>>> maintenance #16302 >>>>>> > > >>>>> >>>>>> > > >>>>> There are a couple of heuristics we can use >>>>>> > > >>>>> 1. Don't run CI if not needed. For example, `site/` dir >>>>>> changes >>>>>> > > >>>>> shouldn't trigger Spark/Flink/Java CI. This might be >>>>>> optimized already, but >>>>>> > > >>>>> we should double check just in case. >>>>>> > > >>>>> 2. If we must run CI, fail fast. For example, if there is a >>>>>> formatter >>>>>> > > >>>>> issue, fail all inflight CI tasks. >>>>>> > > >>>>> 3. Within a specific CI workflow, reduce the matrix >>>>>> wherever possible. 
>>>>>> > > >>>>> Do we really need to run all "Java versions" x "Scala >>>>>> versions" x "Spark >>>>>> > > >>>>> versions"? >>>>>> > > >>>>> 4. Improve individual CI tasks. Spark CI dominates 57% of >>>>>> all resource >>>>>> > > >>>>> usage. I have a tracking issue where I benchmarked where >>>>>> all that time is >>>>>> > > >>>>> spent. See https://github.com/apache/iceberg/issues/16397 >>>>>> > > >>>>> >>>>>> > > >>>>> Top CI tasks as % of resource use: >>>>>> > > >>>>> - Spark CI: 57.68% >>>>>> > > >>>>> - Flink CI: 13.60% >>>>>> > > >>>>> - Java CI: 7.02% >>>>>> > > >>>>> - CVE Scan: 3.13% >>>>>> > > >>>>> >>>>>> > > >>>>> Best, >>>>>> > > >>>>> Kevin Liu >>>>>> > > >>>>> >>>>>> > > >>>>> On Tue, May 26, 2026 at 5:35 AM Ajantha Bhat < >>>>>> [email protected]> >>>>>> > > >>>>> wrote: >>>>>> > > >>>>> >>>>>> > > >>>>>> Hi all, >>>>>> > > >>>>>> >>>>>> > > >>>>>> How about implementing the incremental PR builder? >>>>>> (similar to >>>>>> > > >>>>>> >>>>>> https://github.com/gitflow-incremental-builder/gitflow-incremental-builder >>>>>> > > >>>>>> ) >>>>>> > > >>>>>> >>>>>> > > >>>>>> I think one of the main causes of GitHub runner pressure >>>>>> in Iceberg >>>>>> > > >>>>>> is the breadth of our CI matrix. We support multiple >>>>>> languages (java, >>>>>> > > >>>>>> python, go, rust, cpp) and integrations, and for Java we >>>>>> test across >>>>>> > > >>>>>> multiple JVM versions, Spark versions, Flink versions, >>>>>> Kafka, Hive/MR, >>>>>> > > >>>>>> REST/OpenAPI, runtime bundles, and more. That coverage is >>>>>> valuable, but >>>>>> > > >>>>>> running most of it for every PR is expensive and increases >>>>>> both runner >>>>>> > > >>>>>> usage and CI wall time. >>>>>> > > >>>>>> >>>>>> > > >>>>>> I think the biggest win can be achieved by having an >>>>>> incremental PR >>>>>> > > >>>>>> build. 
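Kevin's question about running every "Java versions" x "Scala versions" x "Spark versions" combination is a question about the size of a Cartesian product. A small Python sketch shows how fast the job count grows and how much the one-JVM and changed-version ideas in this thread shrink it. The JVM, Scala, and Spark lists are hypothetical illustration values, not Iceberg's actual workflow matrix; the only external fact relied on is that Spark 4.x supports Scala 2.13 only.

```python
from itertools import product


def build_matrix(jvms, scala_versions, spark_versions, exclude=lambda combo: False):
    """Cartesian product of the axes minus excluded combos, like a CI matrix `exclude` list."""
    return [c for c in product(jvms, scala_versions, spark_versions) if not exclude(c)]


def spark4_requires_scala_213(combo):
    """Spark 4.x dropped Scala 2.12, so those pairings are not real jobs."""
    _jvm, scala, spark = combo
    return spark.startswith("4.") and scala != "2.13"


# Hypothetical axis values for illustration; the real workflow matrix may differ.
full = build_matrix(["17", "21"], ["2.12", "2.13"], ["3.5", "4.0", "4.1"],
                    spark4_requires_scala_213)
print(len(full))  # 8

# One JVM, and only the Spark version that a PR under spark/v4.1/ touches.
pr_only = build_matrix(["17"], ["2.12", "2.13"], ["4.1"], spark4_requires_scala_213)
print(len(pr_only))  # 1
```

Each axis multiplies the job count, so removing one JVM halves the matrix, and scoping to the changed Spark version removes most of what remains.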
>>>>>> > > >>>>>> We already have useful building blocks for it: Gradle >>>>>> build cache, >>>>>> > > >>>>>> path filters, and version-selective build properties like >>>>>> -DsparkVersions >>>>>> > > >>>>>> and -DflinkVersions. >>>>>> > > >>>>>> >>>>>> > > >>>>>> The idea is to keep full coverage on main, release >>>>>> branches, tags, >>>>>> > > >>>>>> and global build changes, but make PR CI depend on the >>>>>> files changed: >>>>>> > > >>>>>> >>>>>> > > >>>>>> - Spark-only changes run Spark CI, not Flink/Hive/Kafka. >>>>>> > > >>>>>> - spark/v4.1/** changes run only Spark 4.1, not every >>>>>> Spark >>>>>> > > >>>>>> version. >>>>>> > > >>>>>> - flink/v2.0/** changes run only Flink 2.0, not every >>>>>> Flink >>>>>> > > >>>>>> version. >>>>>> > > >>>>>> - API/Core/Data/File format changes run the owning Java >>>>>> checks >>>>>> > > >>>>>> plus selected downstream canaries, such as latest Spark >>>>>> and latest Flink, >>>>>> > > >>>>>> instead of the full engine matrix. >>>>>> > > >>>>>> - Runtime/bundle CVE checks run only for affected >>>>>> runtime >>>>>> > > >>>>>> artifacts. >>>>>> > > >>>>>> - A full-ci label or global Gradle/workflow changes can >>>>>> still >>>>>> > > >>>>>> force the full matrix. >>>>>> > > >>>>>> >>>>>> > > >>>>>> >>>>>> > > >>>>>> Another possible optimization is JVM coverage. Today many >>>>>> PR jobs run >>>>>> > > >>>>>> across both Java 17 and Java 21. We could consider running >>>>>> one primary JVM >>>>>> > > >>>>>> for PRs, and reserve the full JVM matrix for main, release >>>>>> branches, >>>>>> > > >>>>>> nightly/scheduled builds, or PRs labeled full-ci. That >>>>>> would further reduce >>>>>> > > >>>>>> runner usage and PR wall time, while still preserving >>>>>> broad compatibility >>>>>> > > >>>>>> coverage before changes become part of the main branch. 
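The file-based rules and the single-primary-JVM idea above can be sketched as a selection function. This is a hypothetical Python sketch, not the logic in PR #16566: the module prefixes, the list of "global" build files, the version lists, and the assumption that `-DsparkVersions` / `-DflinkVersions` take comma-separated values are all mine. It only models engine-version and JVM selection, not which Java modules' own tests run.

```python
import re

SPARK_DIR = re.compile(r"^spark/v(\d+\.\d+)/")
FLINK_DIR = re.compile(r"^flink/v(\d+\.\d+)/")
# Illustrative lists only; a real filter would enumerate every module and build file.
SHARED_PREFIXES = ("api/", "core/", "data/", "parquet/")
GLOBAL_PREFIXES = ("build.gradle", "settings.gradle", "gradle/", ".github/workflows/")


def select_pr_builds(changed_files, spark_versions, flink_versions,
                     labels=(), jvms=("17", "21")):
    """Choose JVMs and engine versions for a PR (hypothetical sketch).

    spark_versions and flink_versions are ordered oldest to newest;
    jvms[0] is treated as the primary JVM for PRs.
    """
    # A full-ci label or a global build/workflow change forces the full matrix.
    if "full-ci" in labels or any(f.startswith(GLOBAL_PREFIXES) for f in changed_files):
        return {"jvms": list(jvms), "spark": list(spark_versions), "flink": list(flink_versions)}

    spark, flink = set(), set()
    for f in changed_files:
        spark_match = SPARK_DIR.match(f)
        flink_match = FLINK_DIR.match(f)
        if spark_match:
            spark.add(spark_match.group(1))
        elif flink_match:
            flink.add(flink_match.group(1))
        elif f.startswith(SHARED_PREFIXES):
            # Shared modules: run only the latest engine versions as downstream canaries.
            spark.add(spark_versions[-1])
            flink.add(flink_versions[-1])

    return {
        "jvms": [jvms[0]],
        # Keep the configured order and ignore versions that are no longer built.
        "spark": [v for v in spark_versions if v in spark],
        "flink": [v for v in flink_versions if v in flink],
    }


def gradle_flags(selection):
    """Render the selection as -D properties (comma-separated values are an assumption)."""
    flags = []
    if selection["spark"]:
        flags.append("-DsparkVersions=" + ",".join(selection["spark"]))
    if selection["flink"]:
        flags.append("-DflinkVersions=" + ",".join(selection["flink"]))
    return flags


sel = select_pr_builds(["spark/v4.1/spark/src/main/java/Foo.java"],
                       ["3.5", "4.0", "4.1"], ["1.20", "2.0", "2.1"])
print(sel)                # {'jvms': ['17'], 'spark': ['4.1'], 'flink': []}
print(gradle_flags(sel))  # ['-DsparkVersions=4.1']
```

Files that match none of the prefixes (for example under `site/`) select no engine versions at all, which matches the "don't run CI if not needed" heuristic from earlier in the thread.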
>>>>>> > > >>>>>> >>>>>> > > >>>>>> A practical approach could be: >>>>>> > > >>>>>> >>>>>> > > >>>>>> PRs: incremental module/version selection, mostly one JVM, >>>>>> plus >>>>>> > > >>>>>> targeted canaries. >>>>>> > > >>>>>> main: full matrix across JVMs, Spark versions, Flink >>>>>> versions, and >>>>>> > > >>>>>> runtime checks. >>>>>> > > >>>>>> Manual override: full-ci label for risky or cross-cutting >>>>>> PRs. >>>>>> > > >>>>>> >>>>>> > > >>>>>> This should reduce queue time, lower GitHub runner >>>>>> consumption, and >>>>>> > > >>>>>> give contributors faster feedback without giving up full >>>>>> coverage where it >>>>>> > > >>>>>> matters most. >>>>>> > > >>>>>> >>>>>> > > >>>>>> I am working on a POC >>>>>> https://github.com/apache/iceberg/pull/16566 >>>>>> > > >>>>>> Suggestions are welcome. >>>>>> > > >>>>>> >>>>>> > > >>>>>> - Ajantha >>>>>> > > >>>>>> >>>>>> > > >>>>>> On Mon, May 25, 2026 at 7:35 PM Junwang Zhao < >>>>>> [email protected]> >>>>>> > > >>>>>> wrote: >>>>>> > > >>>>>> >>>>>> > > >>>>>>> Hi Manu, >>>>>> > > >>>>>>> >>>>>> > > >>>>>>> On Mon, May 25, 2026 at 9:33 PM Manu Zhang < >>>>>> [email protected]> >>>>>> > > >>>>>>> wrote: >>>>>> > > >>>>>>> > >>>>>> > > >>>>>>> > Hi Junwang, >>>>>> > > >>>>>>> > >>>>>> > > >>>>>>> > Not sure about others but I usually only change status >>>>>> to "Ready >>>>>> > > >>>>>>> for review" when CI has passed. >>>>>> > > >>>>>>> >>>>>> > > >>>>>>> Yeah, I agree there are trade-offs to disabling gh >>>>>> actions for draft >>>>>> > > >>>>>>> PRs. >>>>>> > > >>>>>>> >>>>>> > > >>>>>>> Reasons to Disable: >>>>>> > > >>>>>>> >>>>>> > > >>>>>>> - Cost savings: large teams and monorepos can burn >>>>>> through GitHub >>>>>> > > >>>>>>> Actions minutes quickly. Skipping CI for draft PRs avoids >>>>>> spending >>>>>> > > >>>>>>> resources on code that may not even compile yet. 
>>>>>> > > >>>>>>> - Reduced noise: draft PRs are often used for >>>>>> experimentation or >>>>>> > > >>>>>>> work-in-progress changes. Disabling CI avoids cluttering >>>>>> the PR >>>>>> > > >>>>>>> timeline with transient failures while the author is >>>>>> still iterating. >>>>>> > > >>>>>>> - Better resource utilization: orgs with limited >>>>>> self-hosted runners >>>>>> > > >>>>>>> may prefer to prioritize "Ready for Review" PRs so >>>>>> > > >>>>>>> production-relevant >>>>>> > > >>>>>>> changes get feedback and merge capacity sooner. >>>>>> > > >>>>>>> >>>>>> > > >>>>>>> Reasons to Keep: >>>>>> > > >>>>>>> >>>>>> > > >>>>>>> - Early error detection: developers can use draft PRs as >>>>>> a sandbox to >>>>>> > > >>>>>>> validate builds and tests before requesting review. >>>>>> > > >>>>>>> - Self-correction: failed checks on a draft PR allow >>>>>> authors to fix >>>>>> > > >>>>>>> lint or test issues before involving reviewers. >>>>>> > > >>>>>>> - Higher review confidence: by the time a PR is marked >>>>>> "Ready for >>>>>> > > >>>>>>> Review", CI has often already passed at least once, >>>>>> leading to a >>>>>> > > >>>>>>> smoother review process. >>>>>> > > >>>>>>> >>>>>> > > >>>>>>> For myself, when I create a draft PR, I'm usually sharing >>>>>> early >>>>>> > > >>>>>>> work-in-progress code with other developers and may not >>>>>> have tested >>>>>> > > >>>>>>> it >>>>>> > > >>>>>>> thoroughly locally yet, so I sometimes prefer to disable >>>>>> CI. That's >>>>>> > > >>>>>>> just my personal preference though. 
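Skipping CI for drafts, as Junwang describes, comes down to two checks: the workflow must be triggered when a PR is marked ready, and it must skip while the PR is still a draft. In a workflow this would be an `on.pull_request.types` list plus an `if:` on `github.event.pull_request.draft`; the Python sketch below models that decision against a webhook-style payload. The helper and constant names are hypothetical, and the exact condition used in iceberg-cpp PR #680 or iceberg PR #16561 may differ.

```python
# Activity types the workflow would subscribe to. GitHub's default set for
# `pull_request` is opened, synchronize and reopened, so `ready_for_review`
# must be listed explicitly or marking a draft as ready will not start CI.
PR_ACTIVITY_TYPES = ("opened", "synchronize", "reopened", "ready_for_review")


def should_run_ci(event_name: str, payload: dict) -> bool:
    """Mirror a workflow-level 'skip drafts' condition (hypothetical helper)."""
    if event_name != "pull_request":
        return True  # pushes to main, scheduled runs, etc. are unaffected
    if payload.get("action") not in PR_ACTIVITY_TYPES:
        return False
    # The pull_request payload carries a boolean `draft` field.
    return not payload["pull_request"].get("draft", False)


print(should_run_ci("pull_request", {"action": "synchronize", "pull_request": {"draft": True}}))
print(should_run_ci("pull_request", {"action": "ready_for_review", "pull_request": {"draft": False}}))
```

The first call returns False (a push to a draft PR is skipped) and the second returns True (clicking "Ready for review" starts the run).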
>>>>>> > > >>>>>>> >>>>>> > > >>>>>>> > >>>>>> > > >>>>>>> > Regards, >>>>>> > > >>>>>>> > Manu >>>>>> > > >>>>>>> > >>>>>> > > >>>>>>> > On Mon, May 25, 2026 at 3:21 PM Junwang Zhao < >>>>>> [email protected]> >>>>>> > > >>>>>>> wrote: >>>>>> > > >>>>>>> >> >>>>>> > > >>>>>>> >> On Mon, May 25, 2026 at 11:20 AM Junwang Zhao < >>>>>> [email protected]> >>>>>> > > >>>>>>> wrote: >>>>>> > > >>>>>>> >> > >>>>>> > > >>>>>>> >> > On Sun, May 24, 2026 at 12:13 PM Steven Wu < >>>>>> > > >>>>>>> [email protected]> wrote: >>>>>> > > >>>>>>> >> > > >>>>>> > > >>>>>>> >> > > Kevin's PR removing Spark 3.4 was merged a few >>>>>> days ago. >>>>>> > > >>>>>>> It should reduce the Spark CI cost by ~25%. >>>>>> > > >>>>>>> >> > > >>>>>> > > >>>>>>> >> > > Some heavy-hitter test classes in Spark tests >>>>>> (core and >>>>>> > > >>>>>>> extension) cause high load due to parameter combinations. >>>>>> I asked AI to >>>>>> > > >>>>>>> analyze the build log and recommend changes offering the >>>>>> best ROI. Details >>>>>> > > >>>>>>> are in this doc. >>>>>> > > >>>>>>> >> > > >>>>>> > > >>>>>>> >> > > I can look into dropping some combinations without >>>>>> > > >>>>>>> sacrificing essential coverage. E.g., we can probably >>>>>> drop the Hadoop >>>>>> > > >>>>>>> catalog usage in tests, as it isn't recommended for >>>>>> production use anyway. >>>>>> > > >>>>>>> >> > >>>>>> > > >>>>>>> >> > iceberg-cpp skips Actions for draft PRs [1] to >>>>>> reduce CI >>>>>> > > >>>>>>> resource >>>>>> > > >>>>>>> >> > usage a little bit. Perhaps we should apply the same >>>>>> approach >>>>>> > > >>>>>>> across >>>>>> > > >>>>>>> >> > all iceberg subprojects? >>>>>> > > >>>>>>> >> > >>>>>> > > >>>>>>> >> > [1] https://github.com/apache/iceberg-cpp/pull/680 >>>>>> > > >>>>>>> >> >>>>>> > > >>>>>>> >> I've created a PR to show that (see [1]). Since it's a >>>>>> draft, the >>>>>> > > >>>>>>> CI >>>>>> > > >>>>>>> >> won't run.
If I click the `Ready for review` button, >>>>>> the actions >>>>>> > > >>>>>>> will >>>>>> > > >>>>>>> >> be triggered. Let me know what you think about it. >>>>>> > > >>>>>>> >> >>>>>> > > >>>>>>> >> [1] https://github.com/apache/iceberg/pull/16561 >>>>>> > > >>>>>>> >> >>>>>> > > >>>>>>> >> > >>>>>> > > >>>>>>> >> > > >>>>>> > > >>>>>>> >> > > >>>>>> > > >>>>>>> >> > > >>>>>> > > >>>>>>> >> > > On Fri, May 22, 2026 at 8:22 AM Matt Butrovich < >>>>>> > > >>>>>>> [email protected]> wrote: >>>>>> > > >>>>>>> >> > >> >>>>>> > > >>>>>>> >> > >> Apache DataFusion similarly received this notice. >>>>>> For >>>>>> > > >>>>>>> visibility to the Iceberg community, we have tracking >>>>>> issues to try to >>>>>> > > >>>>>>> discuss solutions: >>>>>> > > >>>>>>> >> > >> >>>>>> > > >>>>>>> >> > >> https://github.com/apache/datafusion/issues/22455 >>>>>> > > >>>>>>> >> > >> >>>>>> https://github.com/apache/datafusion-comet/issues/4406 >>>>>> > > >>>>>>> >> > >> >>>>>> > > >>>>>>> >> > >> DataFusion Comet is consuming the vast majority of >>>>>> > > >>>>>>> DataFusion resources, and like the Iceberg project it's >>>>>> due to Spark tests >>>>>> > > >>>>>>> (and Iceberg's Spark tests). We are doing some analysis >>>>>> on what subsets >>>>>> > > >>>>>>> might be appropriate for our workflows, features, and >>>>>> goals, and will share >>>>>> > > >>>>>>> anything that we think might translate back to the >>>>>> Iceberg CI workflows. >>>>>> > > >>>>>>> >> > >> >>>>>> > > >>>>>>> >> > >> On Fri, May 22, 2026 at 7:43 AM Robert Thomson < >>>>>> > > >>>>>>> [email protected]> wrote: >>>>>> > > >>>>>>> >> > >>> >>>>>> > > >>>>>>> >> > >>> Hello, Iceberg PMC. >>>>>> > > >>>>>>> >> > >>> >>>>>> > > >>>>>>> >> > >>> In 2024, the ASF introduced the policy for >>>>>> GitHub Actions >>>>>> > > >>>>>>> usage >>>>>> > > >>>>>>> >> > >>> across the foundation[1]. 
The ASF Github shared >>>>>> pool of >>>>>> > > >>>>>>> >> > >>> Github-hosted runners has been at, or very close >>>>>> to the >>>>>> > > >>>>>>> limit of >>>>>> > > >>>>>>> >> > >>> 900 jobs most of the time in the past few weeks >>>>>> and this is >>>>>> > > >>>>>>> the >>>>>> > > >>>>>>> >> > >>> case again today. >>>>>> > > >>>>>>> >> > >>> >>>>>> > > >>>>>>> >> > >>> Your project has been identified as being among >>>>>> the top 5 >>>>>> > > >>>>>>> consumers of >>>>>> > > >>>>>>> >> > >>> build time over the past 7 days and we request >>>>>> that you >>>>>> > > >>>>>>> bring your >>>>>> > > >>>>>>> >> > >>> usage down by stream-lining long-running builds. >>>>>> Contact >>>>>> > > >>>>>>> Infra for >>>>>> > > >>>>>>> >> > >>> a consultation if you are unable to streamline >>>>>> your builds >>>>>> > > >>>>>>> further. >>>>>> > > >>>>>>> >> > >>> >>>>>> > > >>>>>>> >> > >>> You can use the infra reporting tool[2] to >>>>>> monitor your GHA >>>>>> > > >>>>>>> usage as you >>>>>> > > >>>>>>> >> > >>> work on stream-lining, as well as locate any >>>>>> bottlenecks in >>>>>> > > >>>>>>> the workflows. >>>>>> > > >>>>>>> >> > >>> >>>>>> > > >>>>>>> >> > >>> Infra will allow you two weeks time (till the >>>>>> 8th of June, >>>>>> > > >>>>>>> 2026) to >>>>>> > > >>>>>>> >> > >>> progress this, but should you still be above the >>>>>> limits by >>>>>> > > >>>>>>> then, >>>>>> > > >>>>>>> >> > >>> without a viable path forward, we will be >>>>>> limiting your GHA >>>>>> > > >>>>>>> usage. >>>>>> > > >>>>>>> >> > >>> >>>>>> > > >>>>>>> >> > >>> Kind regards, >>>>>> > > >>>>>>> >> > >>> Bob Thomson, on behalf of ASF Infrastructure. 
>>>>>> > > >>>>>>> >> > >>> >>>>>> > > >>>>>>> >> > >>> >>>>>> > > >>>>>>> >> > >>> [1] >>>>>> https://infra.apache.org/github-actions-policy.html >>>>>> > > >>>>>>> >> > >>> [2] >>>>>> > > >>>>>>> >>>>>> https://infra-reports.apache.org/#ghactions&project=iceberg&hours=24&limit=15&group=name >>>>>> > > >>>>>>> >> > >>> >>>>>> > > >>>>>>> >> > >>>>>> > > >>>>>>> >> > >>>>>> > > >>>>>>> >> > -- >>>>>> > > >>>>>>> >> > Regards >>>>>> > > >>>>>>> >> > Junwang Zhao >>>>>> > > >>>>>>> >> >>>>>> > > >>>>>>> >> >>>>>> > > >>>>>>> >> >>>>>> > > >>>>>>> >> -- >>>>>> > > >>>>>>> >> Regards >>>>>> > > >>>>>>> >> Junwang Zhao >>>>>> > > >>>>>>> >>>>>> > > >>>>>>> >>>>>> > > >>>>>>> >>>>>> > > >>>>>>> -- >>>>>> > > >>>>>>> Regards >>>>>> > > >>>>>>> Junwang Zhao >>>>>> > > >>>>>>> >>>>>> > > >>>>>> >>>>>> > > >>>>>> > >>>>>> >>>>>
