DanielLeens commented on issue #10997:
URL: https://github.com/apache/seatunnel/issues/10997#issuecomment-4627269639

   I rechecked the scenario you described, and this still looks worth tracking 
as a real engine-side completion problem, but not the same path as #10836.
   
   The important difference is the one you already called out: #10836 fixes the 
master-failover case where completion bookkeeping is lost after leadership 
takeover. In your report, there was no master switch, so that fix alone does 
not explain a batch job staying in `RUNNING` after the bounded reader has 
already finished.
   
   At this point, the most likely culprit is the master-side finalization / 
bookkeeping path under sustained high-frequency submission pressure, rather 
than the bounded source itself. The mitigation suggested by @dybyte 
(`job-metrics-partition-count` much larger than `4`) is also worth testing, 
because your current workload is heavily skewed toward many small 
concurrent jobs.
   
   To narrow this down further, could you please add these three pieces of 
information:
   1. whether the same workload still reproduces on the latest `dev` branch 
build
   2. the master log around the exact time the job should transition from 
`RUNNING` to `FINISHED` (especially `JobMaster` / finalization / close-related 
lines)
   3. whether increasing `job-metrics-partition-count` to a much larger value 
changes the reproduction rate
   
   If you can share those, we can separate "capacity / contention side effect" 
from a deeper completion-state bug much more confidently.
   

