davidzollo commented on issue #10995: URL: https://github.com/apache/seatunnel/issues/10995#issuecomment-4601602869
This is a good proposal. One of the biggest problems we are currently facing is the CI issue.

Looking at the durations listed, the bottleneck is a small number of heavy modules (Elasticsearch, Hbase, Clickhouse), not the long tail. Splitting every module into its own job would actually hurt us:

- Runner contention: with a limited pool of concurrent runners, too many fine-grained jobs just queue up, so wall-clock time may not improve.
- Maintenance overhead: every new job means another entry to keep in sync across backend.yml and update_modules_check.py, which is exactly the kind of config-error risk this issue wants to reduce.
- It defeats the "standard" goal: we want a reusable rule, not an ever-growing list of special cases.

Proposed tiered approach instead:

- Tier A, standalone job: any module that consistently runs > 60 min (e.g. Elasticsearch). One job per module here.
- Tier B, grouped jobs: modules in the 30–60 min range (Hbase, Clickhouse, Mongodb, CDC MySQL, Http) bundled into 2–3 balanced groups, targeting < 90 min per group.
- Tier C, aggregated: everything else stays in the shared all-connectors-it style job.

On the open questions:

- Thresholds: +1 on > 60 min per module / > 2.5h per parent job as the split trigger; I'd add a "re-balance" trigger when any group drifts past ~90 min.
- Checklist: yes, a short PR-template checklist (copy template job → adjust module names → update both backend.yml and update_modules_check.py → verify no module is double-counted or dropped) would lower the barrier and let regular contributors do it safely, with a CI maintainer review.
- Responsibility: with a documented checklist + the existing update_modules_check.py guard, this becomes safe enough for contributors, reviewed by someone familiar with CI.
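To make the tier rule concrete, here is a minimal Python sketch of how the thresholds above could be applied mechanically. It is an illustration only, not code from update_modules_check.py: the function names, the greedy "longest module first" balancing strategy, and every duration in `SAMPLE_MINUTES` are assumptions made up for the example (the real numbers are the ones listed in the issue), and `small-module` is a hypothetical stand-in for the long tail.

```python
from typing import Dict, List, Tuple

STANDALONE_MIN = 60    # Tier A: module consistently runs > 60 min
GROUPED_MIN = 30       # Tier B: module runs in the 30-60 min range
GROUP_TARGET_MIN = 90  # re-balance when a group drifts past ~90 min

# Hypothetical durations in minutes, for illustration only.
SAMPLE_MINUTES: Dict[str, float] = {
    "Elasticsearch": 75,
    "Hbase": 55,
    "Clickhouse": 50,
    "Mongodb": 40,
    "CDC MySQL": 35,
    "Http": 30,
    "small-module": 5,  # hypothetical long-tail module
}


def classify(durations: Dict[str, float]) -> Dict[str, List[str]]:
    """Assign each module to tier A, B or C by its duration."""
    tiers: Dict[str, List[str]] = {"A": [], "B": [], "C": []}
    for module, minutes in sorted(durations.items()):
        if minutes > STANDALONE_MIN:
            tiers["A"].append(module)
        elif minutes >= GROUPED_MIN:
            tiers["B"].append(module)
        else:
            tiers["C"].append(module)
    return tiers


def balance(
    modules: List[str], durations: Dict[str, float], group_count: int
) -> List[Tuple[List[str], float]]:
    """Greedy balancing: place the longest remaining module into the
    group with the smallest running total."""
    groups: List[List[str]] = [[] for _ in range(group_count)]
    totals = [0.0] * group_count
    for module in sorted(modules, key=lambda m: (-durations[m], m)):
        idx = totals.index(min(totals))
        groups[idx].append(module)
        totals[idx] += durations[module]
    return list(zip(groups, totals))


def needs_rebalance(grouped: List[Tuple[List[str], float]]) -> bool:
    """True if any group's total exceeds the ~90 min target."""
    return any(total > GROUP_TARGET_MIN for _, total in grouped)


if __name__ == "__main__":
    tiers = classify(SAMPLE_MINUTES)
    print("Tier A (standalone):", tiers["A"])
    for members, total in balance(tiers["B"], SAMPLE_MINUTES, 3):
        print(f"Tier B group ({total:.0f} min):", members)
    print("Tier C (aggregated):", tiers["C"])
```

With these made-up numbers, three Tier B groups stay under the 90 min target while two groups would not, which is the kind of signal the "re-balance" trigger is meant to surface. The same functions could also back the "no module is double-counted or dropped" checklist item, since every module lands in exactly one tier.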
