yanshil commented on PR #56663: URL: https://github.com/apache/airflow/pull/56663#issuecomment-3425246600
I am also trying to clean up run data for specific dag_ids, so I tried your PR by cherry-picking the changes into my deployment (Airflow 3.1.0).

According to the `airflow db clean` [documentation](https://airflow.apache.org/docs/apache-airflow/stable/cli-and-env-variables-ref.html), the optional tables include the following:

```
-t, --tables
    Table names to perform maintenance on (use comma-separated list).
    Options: ['_xcom_archive', 'asset_event', 'callback_request', 'celery_taskmeta', 'celery_tasksetmeta', 'dag', 'dag_run', 'dag_version', 'deadline', 'import_error', 'job', 'log', 'sla_miss', 'task_instance', 'task_instance_history', 'task_reschedule', 'trigger', 'xcom']
```

But in your PR, only these tables are treated as DAG-related and given a default `dag_id_column`:

`job,dag,dag_run,asset_event,log,sla_miss,task_instance,task_reschedule,xcom,_xcom_archive,deadline,dag_version`

This results in the following error when the CLI tries to clean up a table without a valid `dag_id_column`:

```
Traceback (most recent call last):
  File "/home/airflow/.local/bin/airflow", line 7, in <module>
    sys.exit(main())
             ^^^^^^
  File "/home/airflow/.local/lib/python3.11/site-packages/airflow/__main__.py", line 55, in main
    args.func(args)
  File "/home/airflow/.local/lib/python3.11/site-packages/airflow/cli/cli_config.py", line 49, in command
    return func(*args, **kwargs)
           ^^^^^^^^^^^^^^^^^^^^^
  File "/home/airflow/.local/lib/python3.11/site-packages/airflow/utils/cli.py", line 114, in wrapper
    return f(*args, **kwargs)
           ^^^^^^^^^^^^^^^^^^
  File "/home/airflow/.local/lib/python3.11/site-packages/airflow/utils/providers_configuration_loader.py", line 54, in wrapped_function
    return func(*args, **kwargs)
           ^^^^^^^^^^^^^^^^^^^^^
  File "/home/airflow/.local/lib/python3.11/site-packages/airflow/cli/commands/db_command.py", line 297, in cleanup_tables
    run_cleanup(
  File "/home/airflow/.local/lib/python3.11/site-packages/airflow/utils/session.py", line 100, in wrapper
    return func(*args, session=session, **kwargs)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/home/airflow/.local/lib/python3.11/site-packages/airflow/utils/db_cleanup.py", line 601, in run_cleanup
    _cleanup_table(
  File "/home/airflow/.local/lib/python3.11/site-packages/airflow/utils/db_cleanup.py", line 393, in _cleanup_table
    query = _build_query(
            ^^^^^^^^^^^^^
  File "/home/airflow/.local/lib/python3.11/site-packages/airflow/utils/db_cleanup.py", line 341, in _build_query
    raise ValueError("Must provide a dag_id_column along with dag_ids and exclude_dag_ids")
ValueError: Must provide a dag_id_column along with dag_ids and exclude_dag_ids
```

I feel that the check in the following logic probably shouldn't be mandatory, since you can't always know the `dag_id_column` for every table that might be included in `db clean` in future versions:

```
if dag_ids or exclude_dag_ids:
    if dag_id_column is None:
        raise ValueError("Must provide a dag_id_column along with dag_ids and exclude_dag_ids")

    base_table_dag_id_col = base_table.c[dag_id_column.name]

    if dag_ids:
        conditions.append(base_table_dag_id_col.in_(dag_ids))

    if exclude_dag_ids:
        conditions.append(base_table_dag_id_col.not_in(exclude_dag_ids))
```

---

To reproduce it, simply run `airflow db clean --clean-before-timestamp '2025-10-20 00:00:00+08:00' --yes --skip-archive --dag-ids {dag_id}`, or pass `-t {table}` with a table that has no default `dag_id_column`.
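
One possible reading of the suggestion in the comment above (making the `dag_id_column` check non-fatal) is to skip tables that have no `dag_id_column` whenever a DAG filter is requested, rather than raising. The sketch below is only an illustration in plain Python, not the PR's code or Airflow's API: `TableConfig` and `build_dag_id_conditions` are hypothetical names, and conditions are represented as simple tuples instead of SQLAlchemy expressions. Skipping (rather than cleaning the table unfiltered) is an assumption chosen so that rows unrelated to the requested DAGs are never deleted.

```python
import logging
from dataclasses import dataclass
from typing import List, Optional, Sequence, Tuple

logger = logging.getLogger(__name__)

# (column name, operator, values); a stand-in for a real SQL expression.
Condition = Tuple[str, str, Tuple[str, ...]]


@dataclass
class TableConfig:
    """Hypothetical, simplified table config; not Airflow's real class."""

    table_name: str
    dag_id_column: Optional[str] = None


def build_dag_id_conditions(
    config: TableConfig,
    dag_ids: Optional[Sequence[str]] = None,
    exclude_dag_ids: Optional[Sequence[str]] = None,
) -> Optional[List[Condition]]:
    """Return DAG filter conditions, or None if the table should be skipped."""
    if not (dag_ids or exclude_dag_ids):
        # No DAG filter requested: nothing to add, the table is cleaned as usual.
        return []

    if config.dag_id_column is None:
        # A DAG filter was requested but this table cannot be filtered by DAG.
        # Skip it with a warning instead of raising ValueError.
        logger.warning(
            "Skipping table %s: no dag_id_column to filter on", config.table_name
        )
        return None

    conditions: List[Condition] = []
    if dag_ids:
        conditions.append((config.dag_id_column, "IN", tuple(dag_ids)))
    if exclude_dag_ids:
        conditions.append((config.dag_id_column, "NOT IN", tuple(exclude_dag_ids)))
    return conditions
```

With this shape, a caller looping over the selected tables would `continue` when the function returns `None`, so `--dag-ids` could be combined with tables such as `celery_taskmeta` without aborting the whole run.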
