davidzollo commented on issue #10995:
URL: https://github.com/apache/seatunnel/issues/10995#issuecomment-4601602869

   This is a good proposal. CI is one of the biggest problems we currently 
face.
   
   Looking at the durations listed, the bottleneck is a small number of heavy 
modules (Elasticsearch, Hbase, Clickhouse), not the long tail. Splitting every 
module into its own job would actually hurt us:
   
   Runner contention: with a limited pool of concurrent runners, too many 
fine-grained jobs just queue up, so wall-clock time may not improve.
   Maintenance overhead: every new job means another entry to keep in sync 
across backend.yml and update_modules_check.py — exactly the kind of 
config-error risk this issue wants to reduce.
   It defeats the "standard" goal: we want a reusable rule, not an ever-growing 
list of special cases.
   
   Proposed tiered approach instead:
   
   Tier A — standalone job: any module that consistently runs > 60 min (e.g. 
Elasticsearch). One job per module here.
   Tier B — grouped jobs: modules in the 30–60 min range (Hbase, Clickhouse, 
Mongodb, CDC MySQL, Http) bundled into 2–3 balanced groups, targeting < 90 min 
per group.
   Tier C — aggregated: everything else stays in the shared all-connectors-it 
style job.
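
   A minimal sketch of how the tiers could be computed, assuming per-module 
durations are available as a dict of minutes. The durations, the module keys, 
and the function names below are hypothetical placeholders, not measured values 
or existing code. Tier B is balanced with a simple greedy 
longest-first heuristic, which is one possible choice among several.

```python
from typing import Dict, List, Tuple

# Thresholds taken from the proposal (minutes).
STANDALONE_MIN = 60  # Tier A: modules that run longer than this
GROUPED_MIN = 30     # Tier B: modules in the 30-60 min range


def assign_tiers(durations: Dict[str, float]) -> Tuple[List[str], List[str], List[str]]:
    """Split modules into Tier A (standalone), B (grouped), C (aggregated)."""
    tier_a, tier_b, tier_c = [], [], []
    for module, minutes in sorted(durations.items()):
        if minutes > STANDALONE_MIN:
            tier_a.append(module)
        elif minutes >= GROUPED_MIN:
            tier_b.append(module)
        else:
            tier_c.append(module)
    return tier_a, tier_b, tier_c


def balance_groups(durations: Dict[str, float], modules: List[str],
                   group_count: int) -> Tuple[List[List[str]], List[float]]:
    """Greedy balancing: place the longest module into the lightest group."""
    groups: List[List[str]] = [[] for _ in range(group_count)]
    totals = [0.0] * group_count
    for module in sorted(modules, key=lambda m: durations[m], reverse=True):
        lightest = totals.index(min(totals))
        groups[lightest].append(module)
        totals[lightest] += durations[module]
    return groups, totals


# Hypothetical durations in minutes, for illustration only.
example_durations = {
    "elasticsearch": 75, "hbase": 55, "clickhouse": 50, "cdc-mysql": 45,
    "mongodb": 40, "http": 35, "jdbc": 20, "kafka": 15,
}
tier_a, tier_b, tier_c = assign_tiers(example_durations)
groups, totals = balance_groups(example_durations, tier_b, group_count=3)
```

   With these placeholder numbers, every Tier B group stays under the 90 min 
target. Real durations would need to come from recent CI runs.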
   
   On the open questions:
   
   Thresholds: +1 on > 60 min per module / > 2.5h per parent job as the split 
trigger; I'd add a "re-balance" trigger when any group drifts past ~90 min.
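
   The three triggers could be expressed as a small check like the following. 
This is only a sketch: the function name, the input dicts, and the message 
wording are assumptions for illustration, and the thresholds are the ones 
discussed in this thread.

```python
from typing import Dict, List

MODULE_SPLIT_MIN = 60       # > 60 min per module
PARENT_JOB_SPLIT_MIN = 150  # > 2.5h per parent job
GROUP_REBALANCE_MIN = 90    # re-balance when a group drifts past ~90 min


def ci_triggers(module_minutes: Dict[str, float],
                parent_job_minutes: Dict[str, float],
                group_minutes: Dict[str, float]) -> List[str]:
    """Return human-readable actions for every threshold that is exceeded."""
    actions = []
    for module, minutes in sorted(module_minutes.items()):
        if minutes > MODULE_SPLIT_MIN:
            actions.append(f"split module {module} ({minutes:.0f} min)")
    for job, minutes in sorted(parent_job_minutes.items()):
        if minutes > PARENT_JOB_SPLIT_MIN:
            actions.append(f"split parent job {job} ({minutes:.0f} min)")
    for group, minutes in sorted(group_minutes.items()):
        if minutes > GROUP_REBALANCE_MIN:
            actions.append(f"re-balance group {group} ({minutes:.0f} min)")
    return actions
```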
   
   Checklist: yes, a short PR-template checklist (copy template job → adjust 
module names → update both backend.yml and update_modules_check.py → verify no 
module is double-counted or dropped) would lower the barrier and let regular 
contributors do it safely, with a CI maintainer review.
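
   The last checklist step (no module double-counted or dropped) is easy to 
automate. The sketch below is not how update_modules_check.py works today; it 
assumes the full module list and a job-to-modules mapping have already been 
extracted from the workflow file, and all names in it are hypothetical.

```python
from collections import Counter
from typing import Dict, Iterable, List, Tuple


def check_module_coverage(all_modules: Iterable[str],
                          jobs: Dict[str, List[str]]) -> Tuple[List[str], List[str], List[str]]:
    """Return (dropped, duplicated, unknown) module names.

    dropped:    modules that no job runs
    duplicated: modules listed in more than one place
    unknown:    modules listed in a job but absent from the module list
    """
    expected = set(all_modules)
    counts = Counter(m for modules in jobs.values() for m in modules)
    dropped = sorted(expected - set(counts))
    duplicated = sorted(m for m, n in counts.items() if n > 1)
    unknown = sorted(set(counts) - expected)
    return dropped, duplicated, unknown
```

   A CI guard could fail the build whenever any of the three lists is 
non-empty.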
   
   Responsibility: with a documented checklist and the existing 
update_modules_check.py guard, contributors can make these changes safely, 
provided someone familiar with CI reviews them.

