Copilot commented on code in PR #60133:
URL: https://github.com/apache/airflow/pull/60133#discussion_r3066480954


##########
scripts/ci/docker-compose/devcontainer-mariadb.yml:
##########
@@ -0,0 +1,24 @@
+# Licensed to the Apache Software Foundation (ASF) under one
+# or more contributor license agreements.  See the NOTICE file
+# distributed with this work for additional information
+# regarding copyright ownership.  The ASF licenses this file
+# to you under the Apache License, Version 2.0 (the
+# "License"); you may not use this file except in compliance
+# with the License.  You may obtain a copy of the License at
+#
+#   http://www.apache.org/licenses/LICENSE-2.0
+#
+# Unless required by applicable law or agreed to in writing,
+# software distributed under the License is distributed on an
+# "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
+# KIND, either express or implied.  See the License for the
+# specific language governing permissions and limitations
+# under the License.
+---
+services:
+  mariadb:
+    image: mariadb:11.8
+    ports:
+      - "3306:3306"

Review Comment:
   The MariaDB container is missing required initialization env vars (e.g. 
`MARIADB_ALLOW_EMPTY_ROOT_PASSWORD` / `MARIADB_ROOT_PASSWORD`), so it will fail 
to start with the official `mariadb` image. Also, the declared volume is never 
mounted into the service. Add the appropriate MariaDB env vars and mount the 
named volume to `/var/lib/mysql` (or remove the unused volume).
   ```suggestion
       image: mariadb:11.8
       environment:
         MARIADB_ROOT_PASSWORD: mariadb
       ports:
         - "3306:3306"
       volumes:
         - mysql-db-volume:/var/lib/mysql
   ```



##########
dev/breeze/src/airflow_breeze/utils/selective_checks.py:
##########
@@ -785,8 +785,12 @@ def mysql_versions(self) -> list[str]:
         if self.all_versions:
             return CURRENT_MYSQL_VERSIONS
         if self.latest_versions_only:
-            return [CURRENT_MYSQL_VERSIONS[-1]]
-        return [DEFAULT_MYSQL_VERSION]
+            # Return latest MySQL and latest MariaDB
+            mysql_versions = [v for v in CURRENT_MYSQL_VERSIONS if not 
v.startswith("mariadb:")]
+            mariadb_versions = [v for v in CURRENT_MYSQL_VERSIONS if 
v.startswith("mariadb:")]
+            return [mysql_versions[-1], mariadb_versions[-1]]
+        # Test both MySQL and MariaDB LTS by default
+        return [DEFAULT_MYSQL_VERSION, "mariadb:11.8"]

Review Comment:
   Indexing `mysql_versions[-1]` / `mariadb_versions[-1]` will raise 
`IndexError` if either list is empty. That can happen if 
`CURRENT_MYSQL_VERSIONS` is changed (e.g. temporarily only MySQL or only 
MariaDB). Guard against empty lists and fall back to defaults. Also, the 
hard-coded `'mariadb:11.8'` duplicates the version defined in constants—prefer 
deriving the default MariaDB selection from the MariaDB constants to avoid 
drift.



##########
scripts/ci/docker-compose/devcontainer-mariadb.yml:
##########
@@ -0,0 +1,24 @@
+# Licensed to the Apache Software Foundation (ASF) under one
+# or more contributor license agreements.  See the NOTICE file
+# distributed with this work for additional information
+# regarding copyright ownership.  The ASF licenses this file
+# to you under the Apache License, Version 2.0 (the
+# "License"); you may not use this file except in compliance
+# with the License.  You may obtain a copy of the License at
+#
+#   http://www.apache.org/licenses/LICENSE-2.0
+#
+# Unless required by applicable law or agreed to in writing,
+# software distributed under the License is distributed on an
+# "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
+# KIND, either express or implied.  See the License for the
+# specific language governing permissions and limitations
+# under the License.
+---
+services:
+  mariadb:
+    image: mariadb:11.8

Review Comment:
   This hard-codes the MariaDB image tag, but `devcontainer.env` introduces 
`MARIADB_VERSION`. Consider using `image: mariadb:${MARIADB_VERSION}` to keep 
version selection consistent with the rest of the dev/CI compose setup.
   ```suggestion
       image: mariadb:${MARIADB_VERSION}
   ```



##########
airflow-core/src/airflow/utils/sqlalchemy.py:
##########
@@ -353,13 +353,8 @@ def with_row_locks(
     if not dialect_name:
         return query
 
-    # Don't use row level locks if the MySQL dialect (Mariadb & MySQL < 8) 
does not support it.
     if not USE_ROW_LEVEL_LOCKING:
         return query
-    if dialect_name == "mysql" and not getattr(
-        session.bind.dialect if session.bind else None, 
"supports_for_update_of", False
-    ):
-        return query
     if nowait:
         kwargs["nowait"] = True
     if skip_locked:

Review Comment:
   After removing the MySQL/MariaDB guard, `nowait`/`skip_locked` can be 
requested on dialects/versions that don’t support them (e.g. older 
MySQL/MariaDB), which can result in SQL syntax errors at runtime. Consider 
checking the dialect capability flags (e.g. `supports_for_update_nowait` / 
`supports_for_update_skip_locked`) before setting those kwargs, and silently 
degrade when unsupported.
   ```suggestion
   
       dialect = session.get_bind().dialect
       if nowait and getattr(dialect, "supports_for_update_nowait", False):
           kwargs["nowait"] = True
       if skip_locked and getattr(dialect, "supports_for_update_skip_locked", 
False):
   ```



##########
dev/breeze/src/airflow_breeze/params/shell_params.py:
##########
@@ -324,7 +324,8 @@ def backend_version(self) -> str:
         if self.backend == "postgres":
             version = self.postgres_version
         if self.backend == "mysql":
-            version = self.mysql_version
+            # Strip mariadb: prefix if present for display
+            version = self.mysql_version.replace("mariadb:", "")

Review Comment:
   Using `replace()` will remove `mariadb:` anywhere in the string (not just 
the prefix) and also drops the engine identity from the returned 
`backend_version` value. If `backend_version` is used for identifiers (job 
names, cache keys, artifacts, etc.), MariaDB runs may become indistinguishable 
from a plain numeric version. Prefer `removeprefix('mariadb:')` for 
correctness, and consider keeping `backend_version` as the full value while 
adding a separate display-only property for the stripped version.



##########
airflow-core/src/airflow/jobs/scheduler_job_runner.py:
##########
@@ -3026,12 +3026,16 @@ def _update_asset_orphanage(self, session: Session = 
NEW_SESSION) -> None:
 
     @staticmethod
     def _orphan_unreferenced_assets(assets_query: CTE, *, session: Session) -> 
None:
+        # Disable RETURNING since we only need rowcount, not returned values
+        # This also avoids MariaDB syntax error with DELETE...RETURNING + CTE 
in EXISTS
         deleted_orphaned_assets = session.execute(
-            delete(AssetActive).where(
+            delete(AssetActive)
+            .where(
                 exists().where(
                     and_(AssetActive.name == assets_query.c.name, 
AssetActive.uri == assets_query.c.uri)
                 )
             )
+            .execution_options(return_defaults=False)

Review Comment:
   `return_defaults=False` does not reliably prevent SQLAlchemy ORM bulk DELETE 
from using `RETURNING` (it mainly affects default-fetching behavior for 
INSERT/UPDATE). If the goal is to avoid MariaDB `DELETE ... RETURNING` + CTE 
issues, use the ORM bulk DML options that control session 
synchronization/returning behavior (e.g. disable ORM synchronize/fetch 
strategies via execution options) so SQLAlchemy won’t emit `RETURNING`.



##########
airflow-core/src/airflow/migrations/versions/0036_3_0_0_add_name_field_to_dataset_model.py:
##########
@@ -107,17 +107,29 @@ def downgrade():
 
     # Keep only the datasets with min id if multiple orphaned datasets with 
the same uri exist.
     # This usually happens when all the dags are turned off.
-    op.execute(
-        """
-        with unique_dataset as (select min(id) as min_id, uri as uri from 
dataset group by id),
-        duplicate_dataset_id as (
-            select id from dataset join unique_dataset
-            on dataset.uri = unique_dataset.uri
-            where dataset.id > unique_dataset.min_id
+    if op.get_bind().dialect.name == "mysql":
+        # MySQL/MariaDB error 1093: can't specify target table in a FROM 
clause of a subquery
+        # that modifies it. Wrapping in a derived table works around this.
+        op.execute(
+            "DELETE FROM dataset WHERE id IN ("
+            "    SELECT id FROM ("
+            "        SELECT d1.id FROM dataset d1"
+            "        JOIN dataset d2 ON d1.uri = d2.uri AND d1.id > d2.id"
+            "    ) AS d3"
+            ")"
+        )
+    else:
+        op.execute(
+            """
+            with unique_dataset as (select min(id) as min_id, uri as uri from 
dataset group by id),
+            duplicate_dataset_id as (
+                select id from dataset join unique_dataset
+                on dataset.uri = unique_dataset.uri
+                where dataset.id > unique_dataset.min_id
+            )
+            delete from dataset where id in (select * from 
duplicate_dataset_id)
+            """

Review Comment:
   The CTE groups by `id`, which prevents calculating a single minimum id per 
`uri` and makes the join produce many redundant rows. Grouping by `uri` would 
be both clearer and significantly more efficient for large `dataset` tables 
while preserving the intended behavior.



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]

Reply via email to