aneesh-joseph commented on pull request #9973: URL: https://github.com/apache/airflow/pull/9973#issuecomment-664976248
> > LGTM, but I am wondering how much performance difference there is after changing the `models` code regarding the sql alchemy queries you had to modify. > > The change on [airflow/models/dag.py](https://github.com/apache/airflow/pull/9973/files#diff-e5cbc8f771ec50ccb79ad8505f6f5697) is necessary, as it's moving the rendering decision from a hard-coded Postgres syntax `is_paused IS True` to a variable dialect syntax picked by SQLAlchemy (eg. `is_paused IS True` for Postgres, `is_paused = 1` for SQL Server). > > The changes on [airflow/models/dagcode.py](https://github.com/apache/airflow/pull/9973/files#diff-6ea4fda78947504767afdf3e397c135c) and [airflow/models/serialized_dag.py](https://github.com/apache/airflow/pull/9973/files#diff-40e3d1f11ed05554009ad08fa603a1e8) look like they could incur a performance penalty as the query optimizer is not aware that only the first result will be read. Does SQL Server not support the original exists query? it does support exists, but doesn't support EXISTS expression to be present in the columns clause of a SELECT. Failing airflow tests with mssql and exists checks - https://github.com/aneesh-joseph/airflow-tests/pull/15/checks?check_run_id=918457634 so I had to do the change in this PR or change it to something like: ``` from sqlalchemy.sql.expression import literal return = session.query(literal(True)).filter( session.query(cls) .filter(cls.dag_id == dag_id) .exists() ).scalar() ``` ---------------------------------------------------------------- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: [email protected]
