Copilot commented on code in PR #60133: URL: https://github.com/apache/airflow/pull/60133#discussion_r3066480954
########## scripts/ci/docker-compose/devcontainer-mariadb.yml: ########## @@ -0,0 +1,24 @@ +# Licensed to the Apache Software Foundation (ASF) under one +# or more contributor license agreements. See the NOTICE file +# distributed with this work for additional information +# regarding copyright ownership. The ASF licenses this file +# to you under the Apache License, Version 2.0 (the +# "License"); you may not use this file except in compliance +# with the License. You may obtain a copy of the License at +# +# http://www.apache.org/licenses/LICENSE-2.0 +# +# Unless required by applicable law or agreed to in writing, +# software distributed under the License is distributed on an +# "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY +# KIND, either express or implied. See the License for the +# specific language governing permissions and limitations +# under the License. +--- +services: + mariadb: + image: mariadb:11.8 + ports: + - "3306:3306" Review Comment: The MariaDB container is missing required initialization env vars (e.g. `MARIADB_ALLOW_EMPTY_ROOT_PASSWORD` / `MARIADB_ROOT_PASSWORD`), so it will fail to start with the official `mariadb` image. Also, the declared volume is never mounted into the service. Add the appropriate MariaDB env vars and mount the named volume to `/var/lib/mysql` (or remove the unused volume). ```suggestion image: mariadb:11.8 environment: MARIADB_ROOT_PASSWORD: mariadb ports: - "3306:3306" volumes: - mysql-db-volume:/var/lib/mysql ``` ########## dev/breeze/src/airflow_breeze/utils/selective_checks.py: ########## @@ -785,8 +785,12 @@ def mysql_versions(self) -> list[str]: if self.all_versions: return CURRENT_MYSQL_VERSIONS if self.latest_versions_only: - return [CURRENT_MYSQL_VERSIONS[-1]] - return [DEFAULT_MYSQL_VERSION] + # Return latest MySQL and latest MariaDB + mysql_versions = [v for v in CURRENT_MYSQL_VERSIONS if not v.startswith("mariadb:")] + mariadb_versions = [v for v in CURRENT_MYSQL_VERSIONS if v.startswith("mariadb:")] + return [mysql_versions[-1], mariadb_versions[-1]] + # Test both MySQL and MariaDB LTS by default + return [DEFAULT_MYSQL_VERSION, "mariadb:11.8"] Review Comment: Indexing `mysql_versions[-1]` / `mariadb_versions[-1]` will raise `IndexError` if either list is empty. That can happen if `CURRENT_MYSQL_VERSIONS` is changed (e.g. temporarily only MySQL or only MariaDB). Guard against empty lists and fall back to defaults. Also, the hard-coded `'mariadb:11.8'` duplicates the version defined in constants—prefer deriving the default MariaDB selection from the MariaDB constants to avoid drift. ########## scripts/ci/docker-compose/devcontainer-mariadb.yml: ########## @@ -0,0 +1,24 @@ +# Licensed to the Apache Software Foundation (ASF) under one +# or more contributor license agreements. See the NOTICE file +# distributed with this work for additional information +# regarding copyright ownership. The ASF licenses this file +# to you under the Apache License, Version 2.0 (the +# "License"); you may not use this file except in compliance +# with the License. You may obtain a copy of the License at +# +# http://www.apache.org/licenses/LICENSE-2.0 +# +# Unless required by applicable law or agreed to in writing, +# software distributed under the License is distributed on an +# "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY +# KIND, either express or implied. See the License for the +# specific language governing permissions and limitations +# under the License. +--- +services: + mariadb: + image: mariadb:11.8 Review Comment: This hard-codes the MariaDB image tag, but `devcontainer.env` introduces `MARIADB_VERSION`. Consider using `image: mariadb:${MARIADB_VERSION}` to keep version selection consistent with the rest of the dev/CI compose setup. ```suggestion image: mariadb:${MARIADB_VERSION} ``` ########## airflow-core/src/airflow/utils/sqlalchemy.py: ########## @@ -353,13 +353,8 @@ def with_row_locks( if not dialect_name: return query - # Don't use row level locks if the MySQL dialect (Mariadb & MySQL < 8) does not support it. if not USE_ROW_LEVEL_LOCKING: return query - if dialect_name == "mysql" and not getattr( - session.bind.dialect if session.bind else None, "supports_for_update_of", False - ): - return query if nowait: kwargs["nowait"] = True if skip_locked: Review Comment: After removing the MySQL/MariaDB guard, `nowait`/`skip_locked` can be requested on dialects/versions that don’t support them (e.g. older MySQL/MariaDB), which can result in SQL syntax errors at runtime. Consider checking the dialect capability flags (e.g. `supports_for_update_nowait` / `supports_for_update_skip_locked`) before setting those kwargs, and silently degrade when unsupported. ```suggestion dialect = session.get_bind().dialect if nowait and getattr(dialect, "supports_for_update_nowait", False): kwargs["nowait"] = True if skip_locked and getattr(dialect, "supports_for_update_skip_locked", False): ``` ########## dev/breeze/src/airflow_breeze/params/shell_params.py: ########## @@ -324,7 +324,8 @@ def backend_version(self) -> str: if self.backend == "postgres": version = self.postgres_version if self.backend == "mysql": - version = self.mysql_version + # Strip mariadb: prefix if present for display + version = self.mysql_version.replace("mariadb:", "") Review Comment: Using `replace()` will remove `mariadb:` anywhere in the string (not just the prefix) and also drops the engine identity from the returned `backend_version` value. If `backend_version` is used for identifiers (job names, cache keys, artifacts, etc.), MariaDB runs may become indistinguishable from a plain numeric version. Prefer `removeprefix('mariadb:')` for correctness, and consider keeping `backend_version` as the full value while adding a separate display-only property for the stripped version. ########## airflow-core/src/airflow/jobs/scheduler_job_runner.py: ########## @@ -3026,12 +3026,16 @@ def _update_asset_orphanage(self, session: Session = NEW_SESSION) -> None: @staticmethod def _orphan_unreferenced_assets(assets_query: CTE, *, session: Session) -> None: + # Disable RETURNING since we only need rowcount, not returned values + # This also avoids MariaDB syntax error with DELETE...RETURNING + CTE in EXISTS deleted_orphaned_assets = session.execute( - delete(AssetActive).where( + delete(AssetActive) + .where( exists().where( and_(AssetActive.name == assets_query.c.name, AssetActive.uri == assets_query.c.uri) ) ) + .execution_options(return_defaults=False) Review Comment: `return_defaults=False` does not reliably prevent SQLAlchemy ORM bulk DELETE from using `RETURNING` (it mainly affects default-fetching behavior for INSERT/UPDATE). If the goal is to avoid MariaDB `DELETE ... RETURNING` + CTE issues, use the ORM bulk DML options that control session synchronization/returning behavior (e.g. disable ORM synchronize/fetch strategies via execution options) so SQLAlchemy won’t emit `RETURNING`. ########## airflow-core/src/airflow/migrations/versions/0036_3_0_0_add_name_field_to_dataset_model.py: ########## @@ -107,17 +107,29 @@ def downgrade(): # Keep only the datasets with min id if multiple orphaned datasets with the same uri exist. # This usually happens when all the dags are turned off. - op.execute( - """ - with unique_dataset as (select min(id) as min_id, uri as uri from dataset group by id), - duplicate_dataset_id as ( - select id from dataset join unique_dataset - on dataset.uri = unique_dataset.uri - where dataset.id > unique_dataset.min_id + if op.get_bind().dialect.name == "mysql": + # MySQL/MariaDB error 1093: can't specify target table in a FROM clause of a subquery + # that modifies it. Wrapping in a derived table works around this. + op.execute( + "DELETE FROM dataset WHERE id IN (" + " SELECT id FROM (" + " SELECT d1.id FROM dataset d1" + " JOIN dataset d2 ON d1.uri = d2.uri AND d1.id > d2.id" + " ) AS d3" + ")" + ) + else: + op.execute( + """ + with unique_dataset as (select min(id) as min_id, uri as uri from dataset group by id), + duplicate_dataset_id as ( + select id from dataset join unique_dataset + on dataset.uri = unique_dataset.uri + where dataset.id > unique_dataset.min_id + ) + delete from dataset where id in (select * from duplicate_dataset_id) + """ Review Comment: The CTE groups by `id`, which prevents calculating a single minimum id per `uri` and makes the join produce many redundant rows. Grouping by `uri` would be both clearer and significantly more efficient for large `dataset` tables while preserving the intended behavior. -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: [email protected] For queries about this service, please contact Infrastructure at: [email protected]
