thinkharderdev opened a new issue, #585: URL: https://github.com/apache/arrow-ballista/issues/585
**Describe the bug**

When multiple schedulers run concurrently, the following sequence can occur:

1. Job A is submitted to scheduler A and is scheduled on all available task slots.
2. Job B is submitted to scheduler B, and there are no available task slots for scheduling.
3. All task updates from Job A go back to scheduler A. It cannot schedule any tasks for Job B (because that job is owned by scheduler B).
4. Because no task updates land on scheduler B, Job B will never be scheduled anywhere.

**To Reproduce**

Steps to reproduce the behavior:

1. Start a cluster with two schedulers.
2. Submit a job to scheduler 1 that consumes all available executor slots.
3. Before any task of job 1 completes, submit a job to scheduler 2.
4. Job 2 will never run.

**Expected behavior**

Job 2 should start running as soon as executor task slots become available.

**Additional context**

The fix here is simple. In the event loop, if a job is submitted and there are no task slots available, resubmit the job to the event loop (with a small delay to prevent excessive CPU consumption).
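A minimal sketch of the proposed resubmit-with-delay idea, using only the Rust standard library. Everything here is hypothetical and for illustration only: the `Event` and `Scheduler` types, the `try_reserve_slot` helper, and the shared `AtomicUsize` slot counter are assumptions standing in for Ballista's actual scheduler event loop and cluster state, which are not shown in this issue.

```rust
use std::collections::VecDeque;
use std::sync::atomic::{AtomicUsize, Ordering};
use std::sync::Arc;
use std::thread;
use std::time::Duration;

/// Hypothetical event type (not Ballista's real event enum).
enum Event {
    JobSubmitted { job_id: String },
}

/// Hypothetical scheduler owning its own event queue but sharing
/// the cluster-wide free-slot counter with other schedulers.
struct Scheduler {
    cluster_free_slots: Arc<AtomicUsize>,
    queue: VecDeque<Event>,
    retry_delay: Duration,
    scheduled: Vec<String>,
}

/// Atomically take one slot if any is free; returns false when none are.
fn try_reserve_slot(slots: &AtomicUsize) -> bool {
    slots
        .fetch_update(Ordering::SeqCst, Ordering::SeqCst, |n| n.checked_sub(1))
        .is_ok()
}

impl Scheduler {
    fn new(cluster_free_slots: Arc<AtomicUsize>, retry_delay: Duration) -> Self {
        Scheduler {
            cluster_free_slots,
            queue: VecDeque::new(),
            retry_delay,
            scheduled: Vec::new(),
        }
    }

    fn submit(&mut self, job_id: &str) {
        self.queue.push_back(Event::JobSubmitted {
            job_id: job_id.to_string(),
        });
    }

    /// Process one event. Returns false when the queue is empty.
    fn step(&mut self) -> bool {
        let event = match self.queue.pop_front() {
            Some(e) => e,
            None => return false,
        };
        match event {
            Event::JobSubmitted { job_id } => {
                if try_reserve_slot(&self.cluster_free_slots) {
                    self.scheduled.push(job_id);
                } else {
                    // No slots: wait briefly to avoid a busy loop, then
                    // put the job back on this scheduler's own queue.
                    // (An async loop would use a timer instead of sleeping.)
                    thread::sleep(self.retry_delay);
                    self.queue.push_back(Event::JobSubmitted { job_id });
                }
            }
        }
        true
    }
}
```

The point of the sketch is that scheduler B keeps re-examining the shared slot state on its own, so it no longer depends on task updates that only ever arrive at scheduler A.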
