DanielLeens commented on issue #10997:
URL: https://github.com/apache/seatunnel/issues/10997#issuecomment-4627269639

I rechecked the scenario you described, and this still looks worth tracking as a real engine-side completion problem, but not the same path as #10836.

The important difference is the one you already called out: #10836 fixes the master-failover case where completion bookkeeping is lost after leadership takeover. In your report, there was no master switch, so that fix alone does not explain a batch job staying in `RUNNING` after the bounded reader has already finished.

At this point, the most likely direction is the master-side finalization / bookkeeping path under sustained high-frequency submission pressure, rather than the bounded source itself. The mitigation suggested by @dybyte (`job-metrics-partition-count` much larger than `4`) is also worth testing, because your current cluster shape is heavily skewed toward many small concurrent jobs.

To narrow this down further, could you please add these three pieces of information:

1. whether the same workload still reproduces on the latest `dev` branch build
2. the master log around the exact time the job should transition from `RUNNING` to `FINISHED` (especially `JobMaster` / finalization / close-related lines)
3. whether increasing `job-metrics-partition-count` to a much larger value changes the reproduction rate

If you can share those, we can separate "capacity / contention side effect" from a deeper completion-state bug much more confidently.

--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]
