yanshil commented on PR #56663:
URL: https://github.com/apache/airflow/pull/56663#issuecomment-3425246600

   I am also trying to clean up data for some specific dag_ids, so I tried your PR by cherry-picking the changes into my deployment (Airflow 3.1.0).
   
   According to the `airflow db clean` docs ([link](https://airflow.apache.org/docs/apache-airflow/stable/cli-and-env-variables-ref.html)), the command accepts the following optional tables:
   
   ```
   -t, --tables
   Table names to perform maintenance on (use comma-separated list). Options: 
[‘_xcom_archive’, ‘asset_event’, ‘callback_request’, ‘celery_taskmeta’, 
‘celery_tasksetmeta’, ‘dag’, ‘dag_run’, ‘dag_version’, ‘deadline’, 
‘import_error’, ‘job’, ‘log’, ‘sla_miss’, ‘task_instance’, 
‘task_instance_history’, ‘task_reschedule’, ‘trigger’, ‘xcom’]
   ```
   
   But in your PR, only these tables are dag-related and specify a default `dag_id_column`: `job,dag,dag_run,asset_event,log,sla_miss,task_instance,task_reschedule,xcom,_xcom_archive,deadline,dag_version`.
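
   For context, here is a quick sketch (purely illustrative; the table names are taken from the doc excerpt and the PR's dag-related list above) of which tables would have no `dag_id_column` and therefore hit the error below when `--dag-ids` is passed:

   ```python
# Tables listed in the `airflow db clean` docs for `-t, --tables`.
all_tables = {
    "_xcom_archive", "asset_event", "callback_request", "celery_taskmeta",
    "celery_tasksetmeta", "dag", "dag_run", "dag_version", "deadline",
    "import_error", "job", "log", "sla_miss", "task_instance",
    "task_instance_history", "task_reschedule", "trigger", "xcom",
}

# Tables the PR treats as dag-related (i.e. with a default dag_id_column).
dag_related = {
    "job", "dag", "dag_run", "asset_event", "log", "sla_miss",
    "task_instance", "task_reschedule", "xcom", "_xcom_archive",
    "deadline", "dag_version",
}

# Any table in this difference would be cleaned without a dag_id_column.
tables_without_dag_id_column = sorted(all_tables - dag_related)
print(tables_without_dag_id_column)
# -> ['callback_request', 'celery_taskmeta', 'celery_tasksetmeta',
#     'import_error', 'task_instance_history', 'trigger']
   ```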
   
   This results in the following error when the CLI tries to clean up a table without a valid `dag_id_column`:
   
   ```
    Traceback (most recent call last):
      File "/home/airflow/.local/bin/airflow", line 7, in <module>
        sys.exit(main())
                 ^^^^^^
      File "/home/airflow/.local/lib/python3.11/site-packages/airflow/__main__.py", line 55, in main
        args.func(args)
      File "/home/airflow/.local/lib/python3.11/site-packages/airflow/cli/cli_config.py", line 49, in command
        return func(*args, **kwargs)
               ^^^^^^^^^^^^^^^^^^^^^
      File "/home/airflow/.local/lib/python3.11/site-packages/airflow/utils/cli.py", line 114, in wrapper
        return f(*args, **kwargs)
               ^^^^^^^^^^^^^^^^^^
      File "/home/airflow/.local/lib/python3.11/site-packages/airflow/utils/providers_configuration_loader.py", line 54, in wrapped_function
        return func(*args, **kwargs)
               ^^^^^^^^^^^^^^^^^^^^^
      File "/home/airflow/.local/lib/python3.11/site-packages/airflow/cli/commands/db_command.py", line 297, in cleanup_tables
        run_cleanup(
      File "/home/airflow/.local/lib/python3.11/site-packages/airflow/utils/session.py", line 100, in wrapper
        return func(*args, session=session, **kwargs)
               ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
      File "/home/airflow/.local/lib/python3.11/site-packages/airflow/utils/db_cleanup.py", line 601, in run_cleanup
        _cleanup_table(
      File "/home/airflow/.local/lib/python3.11/site-packages/airflow/utils/db_cleanup.py", line 393, in _cleanup_table
        query = _build_query(
                ^^^^^^^^^^^^^
      File "/home/airflow/.local/lib/python3.11/site-packages/airflow/utils/db_cleanup.py", line 341, in _build_query
        raise ValueError("Must provide a dag_id_column along with dag_ids and exclude_dag_ids")
    ValueError: Must provide a dag_id_column along with dag_ids and exclude_dag_ids
   ```
   
   I feel like this check:
   ```
   
       if dag_ids or exclude_dag_ids:
           if dag_id_column is None:
                raise ValueError("Must provide a dag_id_column along with dag_ids and exclude_dag_ids")
   
           base_table_dag_id_col = base_table.c[dag_id_column.name]
   
           if dag_ids:
               conditions.append(base_table_dag_id_col.in_(dag_ids))
           if exclude_dag_ids:
               conditions.append(base_table_dag_id_col.not_in(exclude_dag_ids))
   ```
   
   maybe shouldn't be mandatory, since you can't always know the `dag_id_column` for every table that might be included in db clean in future versions.
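
   As a hedged sketch of what I mean (the helper name and return shape are hypothetical, not Airflow's actual code): instead of raising, the dag_id filtering could be skipped with a warning for tables that have no dag_id column:

   ```python
import logging

logger = logging.getLogger(__name__)


def dag_id_filters(table_name, dag_id_column, dag_ids=None, exclude_dag_ids=None):
    """Build (column, op, values) filter tuples for a cleanup query.

    Hypothetical helper: mirrors the shape of the check above, but when the
    table has no dag_id column it logs a warning and applies no filter
    instead of raising ValueError.
    """
    filters = []
    if not (dag_ids or exclude_dag_ids):
        return filters
    if dag_id_column is None:
        # Previously: raise ValueError(...). Here: warn and skip the filter.
        logger.warning(
            "Table %s has no dag_id column; --dag-ids/--exclude-dag-ids will not filter it",
            table_name,
        )
        return filters
    if dag_ids:
        filters.append((dag_id_column, "in", tuple(dag_ids)))
    if exclude_dag_ids:
        filters.append((dag_id_column, "not_in", tuple(exclude_dag_ids)))
    return filters
   ```

   With this behavior, e.g. `dag_id_filters("celery_taskmeta", None, dag_ids=["my_dag"])` would return no filters (and log a warning) rather than aborting the whole cleanup run.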
   
   ---
   
   To reproduce it, simply run `airflow db clean --clean-before-timestamp '2025-10-20 00:00:00+08:00' --yes --skip-archive --dag-ids {dag_id}`, or specify a `-t {table}` that has no default `dag_id_column`.


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]
