hkc-8010 opened a new issue, #67525:
URL: https://github.com/apache/airflow/issues/67525

   ### Under which category would you file this issue?
   
   Airflow Core
   
   ### Apache Airflow version
   
   3.2.1
   
   ### What happened and how to reproduce it?
   
   We found a count mismatch in the Airflow 3 UI where the Dashboard "Dag 
Import Errors" badge can show a higher number than the import-errors modal and 
CLI.
   
   In the affected deployment, the home-page badge showed `4` import errors, 
but:
   
   - the import-errors modal showed only `2` files
   - `airflow dags list-import-errors` showed only `2` files
   - the metadata DB contained only `2` `import_error` rows
   
   This appears to happen when the file behind one `ParseImportError` row is 
associated with multiple DAGs in the `dag` table, so the `/api/v2/importErrors` 
query expands one import error into multiple joined rows. The endpoint later 
groups those rows back into one import-error object per file, but 
`total_entries` appears to be counted before that grouping step.
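
To make the suspected mechanism concrete, here is a minimal pure-Python sketch. It is not the route code: `joined_rows` is an illustrative stand-in for the joined result, shaped as `(import_error_id, filename, dag_id)`, and all variable names are assumptions.

```python
from itertools import groupby
from operator import itemgetter

# Illustrative stand-in for the joined result: (import_error_id, filename, dag_id).
joined_rows = [
    (596, "dags/test_smtp_local.py", "smtp_check_emailoperator"),
    (596, "dags/test_smtp_local.py", "smtp_send_smtplib"),
    (596, "dags/test_smtp_local.py", "test_smtp_local"),
    (697, "dags/dwh_garantias_extraction.py", "dwh_garantias_extraction"),
]

# Counting before grouping sees every joined row.
total_entries = len(joined_rows)

# Grouping afterwards collapses the rows to one entry per import error.
# groupby only merges adjacent rows, so the input is sorted by the same key.
by_error_id = itemgetter(0)
grouped = {
    error_id: [dag_id for _, _, dag_id in rows]
    for error_id, rows in groupby(sorted(joined_rows, key=by_error_id), key=by_error_id)
}

print(total_entries, len(grouped))  # prints: 4 2
```

If the endpoint behaves like this sketch, the count is taken over four rows while only two objects are returned.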
   
   The original customer report included a UI screenshot showing the count 
mismatch. Because the screenshot is private support data, I am not posting its 
URL here, but it can be attached manually if needed.
   
   Evidence gathered during verification:
   
   1. Internal verification on `2026-05-13T16:43:30Z` confirmed there were only 
`2` rows in `import_error`.
   2. Live verification on `2026-05-26` again showed only `2` import-error rows:
   
   ```text
   (596, 2026-05-26 04:42:26.495552+00:00, 'dags/test_smtp_local.py', 'main')
   (697, 2026-05-26 04:40:27.897028+00:00, 'dags/dwh_garantias_extraction.py', 
'main')
   ```
   
   3. Live `airflow dags list-import-errors` output on `2026-05-26` returned 
only these `2` files:
   
   ```text
   main | dags/dwh_garantias_extraction.py | TypeError: partial() got an 
unexpected keyword argument 'file_format'
   main | dags/test_smtp_local.py          | 
airflow.sdk.exceptions.AirflowRuntimeError: VARIABLE_NOT_FOUND: {'message': 
'Variable AIRFLOW_CONN_SMTP_CONN not found'}
   ```
   
   4. Live metadata query on `2026-05-26` showed that one import-error file 
maps to multiple DAGs:
   
   ```text
   ## dag_counts
   ('dags/dwh_garantias_extraction.py', 'main', 1, 'dwh_garantias_extraction')
   ('dags/test_smtp_local.py', 'main', 3, 'smtp_check_emailoperator, 
smtp_send_smtplib, test_smtp_local')
   
   ## import_error_rows
   (596, 'dags/test_smtp_local.py', 'main', 2026-05-26 04:42:26.495552+00:00)
   (697, 'dags/dwh_garantias_extraction.py', 'main', 2026-05-26 
04:40:27.897028+00:00)
   
   ## joined_rows
   (596, 'dags/test_smtp_local.py', 'main', 'smtp_check_emailoperator')
   (596, 'dags/test_smtp_local.py', 'main', 'smtp_send_smtplib')
   (596, 'dags/test_smtp_local.py', 'main', 'test_smtp_local')
   (697, 'dags/dwh_garantias_extraction.py', 'main', 'dwh_garantias_extraction')
   ```
   
   5. A direct aggregate over that join produced:
   
   ```text
   (4, 2)
   ```
   
   Where:
   
   - `4` = raw joined row count
   - `2` = distinct `import_error.id` count
   
   That matches the user-visible mismatch exactly.
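
The aggregate can be approximated locally with an in-memory SQLite database. This is a sketch under assumptions: the two tables are heavily simplified, the column names (`filename`, `relative_fileloc`, `bundle_name`) and the join condition are my guess at how the file-to-DAG association works, and the query is not the exact statement run against the deployment.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript(
    """
    -- Simplified, assumed schema; not the real Airflow DDL.
    CREATE TABLE import_error (id INTEGER PRIMARY KEY, filename TEXT, bundle_name TEXT);
    CREATE TABLE dag (dag_id TEXT PRIMARY KEY, relative_fileloc TEXT, bundle_name TEXT);
    """
)
conn.executemany(
    "INSERT INTO import_error VALUES (?, ?, ?)",
    [
        (596, "dags/test_smtp_local.py", "main"),
        (697, "dags/dwh_garantias_extraction.py", "main"),
    ],
)
conn.executemany(
    "INSERT INTO dag VALUES (?, ?, ?)",
    [
        ("smtp_check_emailoperator", "dags/test_smtp_local.py", "main"),
        ("smtp_send_smtplib", "dags/test_smtp_local.py", "main"),
        ("test_smtp_local", "dags/test_smtp_local.py", "main"),
        ("dwh_garantias_extraction", "dags/dwh_garantias_extraction.py", "main"),
    ],
)

# Raw joined row count next to the distinct import-error count.
raw_count, distinct_count = conn.execute(
    """
    SELECT COUNT(*), COUNT(DISTINCT ie.id)
    FROM import_error AS ie
    JOIN dag AS d
      ON d.relative_fileloc = ie.filename AND d.bundle_name = ie.bundle_name
    """
).fetchone()

print((raw_count, distinct_count))  # prints: (4, 2)
```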
   
   Relevant code paths:
   
   - `airflow/api_fastapi/core_api/routes/public/import_error.py`
     - builds the joined query around `select(ParseImportError, 
file_dags_cte.c.dag_id)`
     - groups the result later with `groupby(...)`
   - `airflow/api_fastapi/common/db/common.py`
     - `paginated_select()` computes `total_entries = 
get_query_count(statement, session=session)` before any route-local grouping
   - `airflow/ui/src/pages/Dashboard/Stats/DAGImportErrors.tsx`
     - renders the Dashboard badge from `data?.total_entries`
   
   Likely reproduction shape:
   
   1. Create or retain a file that appears once in `import_error`.
   2. Ensure that same file path is associated with multiple DAG IDs in `dag`.
   3. Call `/api/v2/importErrors` and observe that `total_entries` reflects raw 
joined rows rather than distinct import-error objects.
   4. Observe that the UI badge uses `total_entries`, while the modal list 
groups back down to fewer entries.
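
Steps 3 and 4 can be checked against a decoded response body with a small helper. This is a sketch: `find_count_mismatch` is a hypothetical name, the `import_errors` key and the per-item fields are my assumption about the response shape, and `sample_payload` is hand-built from the numbers in this report rather than captured from the deployment. The comparison is only meaningful when every import error fits on one page.

```python
from typing import Any, Optional


def find_count_mismatch(payload: dict[str, Any]) -> Optional[tuple[int, int]]:
    """Return (total_entries, returned_items) when they disagree, else None.

    Assumes the whole collection fits on a single page; with pagination the
    returned list is legitimately shorter than total_entries.
    """
    total = payload["total_entries"]
    returned = len(payload["import_errors"])
    return (total, returned) if total != returned else None


# Hand-built example mirroring the numbers in this report (not a real capture).
sample_payload = {
    "total_entries": 4,
    "import_errors": [
        {"import_error_id": 596, "filename": "dags/test_smtp_local.py", "bundle_name": "main"},
        {"import_error_id": 697, "filename": "dags/dwh_garantias_extraction.py", "bundle_name": "main"},
    ],
}

print(find_count_mismatch(sample_payload))  # prints: (4, 2)
```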
   
   ### What you think should happen instead?
   
   The Dashboard badge, the modal, the CLI, and the DB-backed count should all 
agree on the number of import-error files.
   
   In this case they should all show `2`.
   
   I suspect either of the first two changes below would resolve it, and the 
third would guard against a regression:
   
   1. Make `/api/v2/importErrors` count distinct `ParseImportError.id` values 
after authorization logic instead of counting raw joined rows.
   2. Restructure the route so pagination and counting happen on a deduplicated 
import-error subquery rather than on the raw join.
   3. Add a regression test where one `ParseImportError` file maps to multiple 
DAGs, but `total_entries` still matches the number of distinct import-error 
objects returned.
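
As a rough illustration of option 2, and of the shape a regression test could take, the sketch below counts and paginates over a deduplicated subquery in SQLite. It rests on the same assumptions as the rest of this report: a simplified schema, guessed column names, and a hypothetical `count_and_page` helper. It also omits the authorization filtering that the real route applies.

```python
import sqlite3

# Deduplicate first: one row per import error that matches at least one DAG.
DEDUPLICATED = """
    SELECT DISTINCT ie.id, ie.filename
    FROM import_error AS ie
    JOIN dag AS d
      ON d.relative_fileloc = ie.filename AND d.bundle_name = ie.bundle_name
"""


def count_and_page(conn: sqlite3.Connection, limit: int, offset: int):
    """Count and paginate over the deduplicated rows, not the raw join."""
    total = conn.execute(f"SELECT COUNT(*) FROM ({DEDUPLICATED})").fetchone()[0]
    page = conn.execute(
        f"SELECT id, filename FROM ({DEDUPLICATED}) ORDER BY id LIMIT ? OFFSET ?",
        (limit, offset),
    ).fetchall()
    return total, page


conn = sqlite3.connect(":memory:")
conn.executescript(
    """
    -- Simplified, assumed schema; not the real Airflow DDL.
    CREATE TABLE import_error (id INTEGER PRIMARY KEY, filename TEXT, bundle_name TEXT);
    CREATE TABLE dag (dag_id TEXT PRIMARY KEY, relative_fileloc TEXT, bundle_name TEXT);
    INSERT INTO import_error VALUES
        (596, 'dags/test_smtp_local.py', 'main'),
        (697, 'dags/dwh_garantias_extraction.py', 'main');
    INSERT INTO dag VALUES
        ('smtp_check_emailoperator', 'dags/test_smtp_local.py', 'main'),
        ('smtp_send_smtplib', 'dags/test_smtp_local.py', 'main'),
        ('test_smtp_local', 'dags/test_smtp_local.py', 'main'),
        ('dwh_garantias_extraction', 'dags/dwh_garantias_extraction.py', 'main');
    """
)

total, page = count_and_page(conn, limit=10, offset=0)
print(total, len(page))  # prints: 2 2
```

Because the limit and offset apply to deduplicated rows, a page of size N holds up to N import errors regardless of how many DAGs each file maps to.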
   
   ### Operating System
   
   Not Applicable - managed Astronomer deployment
   
   ### Deployment
   
   Astronomer
   
   ### Apache Airflow Provider(s)
   
   Not Applicable
   
   ### Versions of Apache Airflow Providers
   
   Not Applicable
   
   ### Official Helm Chart version
   
   Not Applicable
   
   ### Kubernetes Version
   
   Not Applicable
   
   ### Helm Chart configuration
   
   Not Applicable
   
   ### Docker Image customizations
   
   Unknown / not relevant for the API counting bug
   
   ### Anything else?
   
   I did not find an obvious existing Airflow issue or PR for this exact 
count-inflation behavior when searching for:
   
   - `import errors count modal home page`
   - `import error relative_fileloc bundle_name`
   - `importErrors total_entries DagModel relative_fileloc bundle_name`
   
   ### Are you willing to submit PR?
   
   - [x] Yes I am willing to submit a PR!
   
   ### Code of Conduct
   
   - [x] I agree to follow this project's Code of Conduct
   

