vemikhaylov commented on a change in pull request #14500:
URL: https://github.com/apache/airflow/pull/14500#discussion_r584191749
##########
File path: airflow/models/dag.py
##########
@@ -1227,6 +1230,8 @@ def clear(
tis = tis.filter(or_(TI.state == State.FAILED, TI.state == State.UPSTREAM_FAILED))
if only_running:
tis = tis.filter(TI.state == State.RUNNING)
+ if task_ids:
+ tis = tis.filter(TI.task_id.in_(task_ids))
Review comment:
Actually, the conditions are simply combined with a conjunction (`AND`):
```python
# tst_double_in_query.py
from sqlalchemy.orm import Session
from airflow.models import TaskInstance

session = Session()

# Start with a single IN filter on task_id.
query = session.query(TaskInstance.task_id).filter(TaskInstance.task_id.in_(["foo"]))
print("First query:")
print(str(query.statement))

# Chain a second IN filter; SQLAlchemy joins it to the first one with AND.
query = query.filter(TaskInstance.task_id.in_(["bar"]))
print("Second query:")
print(str(query.statement))
```
```
$ python tst_double_in_query.py
First query:
SELECT task_instance.task_id
FROM task_instance
WHERE task_instance.task_id IN (:task_id_1)
Second query:
SELECT task_instance.task_id
FROM task_instance
WHERE task_instance.task_id IN (:task_id_1) AND task_instance.task_id IN (:task_id_2)
```
So the second filter further narrows the result set whenever `task_ids` is provided.
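To illustrate the same semantics without a database, here is a plain-Python model (the rows and ID sets are made up for the example): applying two membership filters one after another keeps exactly the rows whose `task_id` is in the intersection of the two sets.

```python
# Hypothetical in-memory rows standing in for task_instance records.
rows = [{"task_id": t} for t in ["extract", "transform", "load", "report"]]

first_ids = {"extract", "transform", "load"}  # e.g. an earlier IN (...) filter
task_ids = {"load", "report"}                 # the new `task_ids` argument

# Chained filters: each step keeps only rows matching its predicate (AND semantics).
chained = [r for r in rows if r["task_id"] in first_ids]
chained = [r for r in chained if r["task_id"] in task_ids]

# A single filter on the intersection of the two ID sets.
single = [r for r in rows if r["task_id"] in (first_ids & task_ids)]

print([r["task_id"] for r in chained])  # ['load']
print(chained == single)                # True
```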
Of course, we could intersect the sets beforehand and apply the filter only once, which would make the generated SQL slightly more efficient. Would that be better? What do you think?
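A minimal sketch of what the pre-intersection could look like, assuming the earlier filter in `clear()` restricts `task_id` to the DAG's own task IDs. `combined_task_id_filter` is a hypothetical helper for illustration, not existing Airflow code:

```python
from typing import Iterable, List, Optional


def combined_task_id_filter(
    dag_task_ids: Iterable[str],
    requested_task_ids: Optional[Iterable[str]] = None,
) -> List[str]:
    """Hypothetical helper: compute the IDs for a single IN filter.

    Instead of chaining two IN filters (which the SQL combines with AND),
    intersect the two collections in Python first.
    """
    allowed = set(dag_task_ids)
    # Mirror the `if task_ids:` guard from the diff: only narrow when given.
    if requested_task_ids:
        allowed &= set(requested_task_ids)
    # Sort so the bind parameters appear in a deterministic order.
    return sorted(allowed)


ids = combined_task_id_filter(["extract", "load", "transform"], ["load", "report"])
print(ids)  # ['load']
```

The result would then go into a single `tis.filter(TI.task_id.in_(ids))` call. An empty intersection yields an empty list, which, like the chained filters, should match no rows.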