tanvn opened a new issue, #28051:
URL: https://github.com/apache/airflow/issues/28051

   ### Apache Airflow version
   
   2.4.3
   
   ### What happened
   
   Tried executing the following command to clean old data in my service:
   
   ```
   airflow db clean --skip-archive --clean-before-timestamp '2022-10-31 
00:00:00+0000'  --tables 'xcom, log, dag_run, task_instance' --verbose --yes
   ```
   Then got the following error
   
   ```
   ...
   [2022-12-02T08:34:18.065+0000] {db_cleanup.py:138} DEBUG - ctas query:
   CREATE TABLE _airflow_deleted__dag_run__20221202083418 AS SELECT base.*
   FROM dag_run AS base LEFT OUTER JOIN (SELECT dag_id, max(dag_run.start_date) 
AS max_date_per_group
   FROM dag_run
   WHERE external_trigger = false GROUP BY dag_id) AS latest ON base.dag_id = 
latest.dag_id AND base.start_date = max_date_per_group
   WHERE base.start_date < :start_date_1 AND max_date_per_group IS NULL
   [2022-12-02T08:34:18.069+0000] {cli_action_loggers.py:83} DEBUG - Calling 
callbacks: []
   Traceback (most recent call last):
     File 
"/opt/app-root/lib64/python3.8/site-packages/mysql/connector/connection_cext.py",
 line 565, in cmd_query
       self._cmysql.query(
   _mysql_connector.MySQLInterfaceError: Statement violates GTID consistency: 
CREATE TABLE ... SELECT.The above exception was the direct cause of the 
following exception:Traceback (most recent call last):
     File 
"/opt/app-root/lib64/python3.8/site-packages/sqlalchemy/engine/base.py", line 
1802, in _execute_context
       self.dialect.do_execute(
     File 
"/opt/app-root/lib64/python3.8/site-packages/sqlalchemy/engine/default.py", 
line 719, in do_execute
       cursor.execute(statement, parameters)
     File 
"/opt/app-root/lib64/python3.8/site-packages/mysql/connector/cursor_cext.py", 
line 279, in execute
       result = self._cnx.cmd_query(
     File 
"/opt/app-root/lib64/python3.8/site-packages/mysql/connector/connection_cext.py",
 line 573, in cmd_query
       raise get_mysql_exception(
   mysql.connector.errors.DatabaseError: 1786 (HY000): Statement violates GTID 
consistency: CREATE TABLE ... SELECT.The above exception was the direct cause 
of the following exception:Traceback (most recent call last):
   ...
   ```
   
   our MySQL server has `enforce_gtid_consistency` parameter set to ON and it 
does not allow `CREATE TABLE ... SELECT`  statement.
   
   ### What you think should happen instead
   
   I expected that the command should work for MySQL with 
enforce_gtid_consistency enabled.
   
   ### How to reproduce
   
   As described above.
   
   ### Operating System
   
   K8S with base image is CentOS
   
   ### Versions of Apache Airflow Providers
   
   _No response_
   
   ### Deployment
   
   Official Apache Airflow Helm Chart
   
   ### Deployment details
   
   _No response_
   
   ### Anything else
   
   Can we avoid this kind of issue by breaking the statement into 2, like:
   ```
   CREATE TABLE _airflow_deleted__dag_run__20221202083418 LIKE dag_run; 
   INSERT _airflow_deleted__dag_run__20221202083418 SELECT * FROM  SELECT base.*
   FROM dag_run AS base LEFT OUTER JOIN (SELECT dag_id, max(dag_run.start_date) 
AS max_date_per_group
   FROM dag_run
   WHERE external_trigger = false GROUP BY dag_id) AS latest ON base.dag_id = 
latest.dag_id AND base.start_date = max_date_per_group
   WHERE base.start_date < :start_date_1 AND max_date_per_group IS NULL;
   
   ```
   Ref: https://stackoverflow.com/a/56068655/1526790
   
   ### Are you willing to submit PR?
   
   - [X] Yes I am willing to submit a PR!
   
   ### Code of Conduct
   
   - [X] I agree to follow this project's [Code of 
Conduct](https://github.com/apache/airflow/blob/main/CODE_OF_CONDUCT.md)
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]

Reply via email to