notatallshaw-work commented on issue #25728:
URL: https://github.com/apache/airflow/issues/25728#issuecomment-1216672889
I'm a bit confused by this part of this loop (comment removed for clarity):
```python
for _ in range(min((open_slots, len(self.queued_tasks)))):
    key, (command, _, queue, ti) = sorted_queue.pop(0)
    if key in self.running:
        attempt = self.attempts[key]
        if attempt < QUEUEING_ATTEMPTS - 1:
            self.attempts[key] = attempt + 1
            self.log.info("task %s is still running", key)
            continue
```
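For reference, here is a self-contained toy model of the quoted loop. The attribute names mirror the snippet, but the class itself, the `QUEUEING_ATTEMPTS` value, the key-sorted queue, and the give-up/start branches are assumptions for illustration, not Airflow's actual code. Because `sorted_queue.pop(0)` visits each entry once per pass and `continue` leaves the entry in `queued_tasks`, each recheck of `self.running` only happens on a later call.

```python
from collections import defaultdict

QUEUEING_ATTEMPTS = 3  # hypothetical value chosen for this sketch


class MiniExecutor:
    """Toy model of the quoted loop; names mirror the snippet, the rest is assumed."""

    def __init__(self) -> None:
        self.queued_tasks: dict[str, str] = {}
        self.running: set[str] = set()
        self.attempts: defaultdict[str, int] = defaultdict(int)
        self.log: list[str] = []  # stands in for self.log.info(...)

    def trigger_tasks(self, open_slots: int) -> None:
        # Assumption: order by key instead of by priority.
        sorted_queue = sorted(self.queued_tasks.items())
        for _ in range(min(open_slots, len(self.queued_tasks))):
            key, command = sorted_queue.pop(0)
            if key in self.running:
                attempt = self.attempts[key]
                if attempt < QUEUEING_ATTEMPTS - 1:
                    self.attempts[key] = attempt + 1
                    self.log.append(f"task {key} is still running")
                    continue  # entry stays in queued_tasks for a later call
                # Assumed give-up branch (not part of the quoted snippet).
                self.log.append(f"giving up on {key}")
                del self.attempts[key]
                del self.queued_tasks[key]
            else:
                # Assumed "start the task" branch.
                del self.queued_tasks[key]
                self.running.add(key)
                self.log.append(f"started {key}")


ex = MiniExecutor()
ex.queued_tasks["t1"] = "cmd"
ex.running.add("t1")  # the executor still believes t1 is running
for _ in range(3):
    ex.trigger_tasks(open_slots=1)
print(ex.log)
# ['task t1 is still running', 'task t1 is still running', 'giving up on t1']
```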
There's no sleep and no external call to check the status, so if
`self.running` is being updated on another thread, the only real chance it has
to change is when `self.log` is called. If `self.running` is being updated by
an asynchronous loop somewhere, then it has to rely on the GIL giving that loop
a chance to run while this tight, non-asynchronous bit of code is running,
which seems unlikely?
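To make the asynchronous case concrete, here is a minimal standalone sketch (not Airflow code; `running`, `mark_finished` and `tight_loop` are hypothetical names). It shows that a coroutine scheduled on the same event loop cannot touch `running` until the synchronous loop returns and the caller awaits.

```python
import asyncio

running = {"task_a"}
observations: list[bool] = []


async def mark_finished() -> None:
    # Hypothetical stand-in for whatever would remove the key from `running`.
    running.discard("task_a")


def tight_loop(attempts: int) -> None:
    # Synchronous loop with no await or sleep: the event loop cannot run
    # mark_finished() until this function returns.
    for _ in range(attempts):
        observations.append("task_a" in running)


async def main() -> None:
    task = asyncio.create_task(mark_finished())  # scheduled, not yet run
    tight_loop(3)  # every check still sees "task_a" in running
    await task  # yields to the event loop, so mark_finished() finally runs
    observations.append("task_a" in running)


asyncio.run(main())
print(observations)  # [True, True, True, False]
```

A separate OS thread behaves differently: CPython can switch threads periodically even inside a tight loop (see `sys.getswitchinterval()`), so there an update is possible but not guaranteed to land between two checks.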
Reading the comments, the situation this is supposed to catch is when
"the task has been killed externally and not yet been marked as failed". Why
does it not check the status of the task instead? In our case the task's status
is "UP_FOR_RESCHEDULE", and it doesn't make sense to me that the executor would
be confused about whether a task in that status is running or not.
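A status-based check along the lines asked about above might look like the sketch below. It is purely hypothetical: `TaskState` is a local stand-in for a few task-instance state names, and `should_skip_requeue` is not an Airflow function; it only illustrates combining the executor's `running` set with the task's recorded state.

```python
from enum import Enum


class TaskState(str, Enum):
    # Hypothetical local stand-in for a few task-instance state names.
    QUEUED = "queued"
    RUNNING = "running"
    UP_FOR_RESCHEDULE = "up_for_reschedule"


def should_skip_requeue(key_in_running: bool, state: TaskState) -> bool:
    """Hypothetical rule: only hold back re-queueing when the executor's view
    and the task's recorded state both say the task is running."""
    return key_in_running and state is TaskState.RUNNING


# A task in UP_FOR_RESCHEDULE that the executor still lists as running
# would be requeued under this rule.
print(should_skip_requeue(True, TaskState.UP_FOR_RESCHEDULE))  # False
print(should_skip_requeue(True, TaskState.RUNNING))  # True
```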
@malthe @potiuk Sorry to ping you directly, but I would be happy to help
test and/or contribute if you could give any hints or clarity on my questions.