GitHub user shaealh added a comment to the discussion: Airflow task failed but spark kube app is running
@rcrchawla Yup, although tasks don't run slower, reducing concurrency can increase the waiting time before tasks start if the queue is full. Your current capacity is 32 parallel slots (2 workers x 16). If we move to 10, capacity becomes 20 slots.

Given the current failures, I'd do that to test how it behaves with fewer heartbeat/API failures, then tune throughput. For example, I might try a lower number first, like 8, and monitor whether that fixes the issue. If it does, we've confirmed the root cause.

Next I'd monitor jobs and run durations for a couple of days. If the delays are high, that's when I'd play with the parameters to see what's ideal for this specific use case (e.g. adding a worker and gradually increasing the concurrency).

GitHub link: https://github.com/apache/airflow/discussions/63298#discussioncomment-16095553

----

This is an automatically sent email for [email protected]. To unsubscribe, please send an email to: [email protected]
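As a rough sketch of the change being suggested (assuming a CeleryExecutor deployment; the exact values are illustrative, not prescriptive):

```
# airflow.cfg — lower per-worker concurrency from 16 to 8,
# reducing total capacity from 2 x 16 = 32 slots to 2 x 8 = 16
[celery]
worker_concurrency = 8
```

The same setting can be applied via an environment variable instead of editing airflow.cfg, which is often easier in containerized deployments:

```
# Env var override (takes precedence over airflow.cfg)
export AIRFLOW__CELERY__WORKER_CONCURRENCY=8
```

Workers need a restart to pick up the new value. If the heartbeat/API failures stop at the lower setting, the root cause is confirmed, and concurrency (or the worker count) can then be raised gradually while watching run durations.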
