krithick-j opened a new issue, #49526:
URL: https://github.com/apache/airflow/issues/49526

   ### Apache Airflow version
   
   2.10.5
   
   ### If "Other Airflow 2 version" selected, which one?
   
   _No response_
   
   ### What happened?
   
   When triggering a DAG run through the Airflow REST API (e.g., via a script calling the `/dags/{dag_id}/dagRuns` endpoint), the task instances created in the database (`task_instance` table) are inserted with a state of `NULL` instead of a runnable state such as `scheduled` or `queued`.
   
   This prevents the Airflow scheduler from ever identifying these tasks as ready for execution (`No tasks to consider for execution.` appears in the scheduler logs), leaving the DAG run stuck indefinitely in the `queued` state in the UI.
   
   Analysis of the PostgreSQL logs confirms that the `INSERT INTO task_instance` statement generated by Airflow omits the `state` and `queued_dttm` columns from the list of columns being inserted.
   
   Version Info
   
   - Airflow version: 2.10.5
   - Installation: pip in a Python virtual environment; systemd services for the scheduler and webserver
   - Executor: LocalExecutor
   - Database: PostgreSQL 16, running on localhost
   - Python version: 3.11 (inferred from the virtualenv path; exact version not confirmed)
   - OS: Ubuntu 24.04.2 LTS
   
   ### What you think should happen instead?
   
   _No response_
   
   ### How to reproduce
   
    These steps assume a clean or representative Airflow 2.10.5 installation 
with PostgreSQL.
   
   Reproducible Steps
   
   1. Set up Airflow:
      - Install Apache Airflow 2.10.5 in a Python virtual environment with PostgreSQL as the database backend and LocalExecutor.
      - Ensure the database is initialized (`airflow db init`).
   2. Configure Airflow:
      - Update `airflow.cfg` (or environment variables) so that `sql_alchemy_conn` points to your PostgreSQL database.
      - Set `executor = LocalExecutor`.
      - Set the core logging level to DEBUG: `[core] logging_level = DEBUG`.
   3. Configure verbose PostgreSQL logging:
      - Edit `postgresql.conf` (`/etc/postgresql/16/main/postgresql.conf` or similar).
      - Set `log_connections = on` and `log_disconnections = on`.
      - Set `log_min_duration_statement = 0` (or `log_statement = 'all'`).
      - Save the file and reload or restart PostgreSQL (`sudo systemctl reload postgresql` or `sudo systemctl restart postgresql`).
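   For reference, the relevant `postgresql.conf` settings from the step above, gathered into one fragment:
   
   ```
   # Log every connection/disconnection and every statement with its duration,
   # so the INSERT INTO task_instance generated by Airflow is captured.
   log_connections = on
   log_disconnections = on
   log_min_duration_statement = 0   # log all statements regardless of duration
   # log_statement = 'all'          # alternative: log statements verbatim
   ```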
   4. Add a test DAG:
      - Place a simple DAG file designed for API triggering into your `dags_folder`. The DAG should have `schedule_interval=None` and define at least one task. (You can use your generic_etl_dag.py or a simplified version.)
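   A minimal DAG of this shape might look like the following sketch; the `dag_id` and task name are illustrative placeholders, not taken from the original report:
   
   ```python
   # minimal_api_trigger_dag.py -- place in your dags_folder.
   # A DAG with no schedule, intended only to be triggered via the REST API.
   from datetime import datetime
   
   from airflow import DAG
   from airflow.operators.python import PythonOperator
   
   
   def print_conf(**context):
       # Echo the conf passed in the dagRuns API request body.
       print("Received conf:", context["dag_run"].conf)
   
   
   with DAG(
       dag_id="api_trigger_test",
       start_date=datetime(2024, 1, 1),
       schedule_interval=None,  # only runs when triggered explicitly
       catchup=False,
   ) as dag:
       PythonOperator(task_id="print_conf", python_callable=print_conf)
   ```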
   5. Start Airflow components:
      - Start the scheduler: `airflow scheduler` (or via your systemd service: `sudo systemctl start airflow-scheduler`).
      - Start the webserver: `airflow webserver` (or via your systemd service: `sudo systemctl start airflow-webserver`).
   6. Trigger the DAG via the API:
      - Use a tool like `curl` or a Python script (like your main.py) to trigger the DAG through the REST API endpoint `/api/v1/dags/{dag_id}/dagRuns`. Pass any params your DAG expects in the request body.
      - Example `curl` command (requires an API user/password set up):
   
   ```bash
   curl -X POST 'http://localhost:8080/api/v1/dags/YOUR_DAG_ID/dagRuns' \
     -H 'Content-Type: application/json' \
     --user "your_api_user:your_api_password" \
     -d '{"conf": {"user_id": "test_user", "connector_id": "test_connector"}}'
   ```
   
      - Replace `YOUR_DAG_ID`, `your_api_user`, `your_api_password`, and the `conf` content as necessary for your DAG.
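   The same request can be issued from a Python script using only the standard library. This is a sketch: the host, DAG id, credentials, and `conf` values below are placeholders, not values from this report.
   
   ```python
   # trigger_dagrun.py -- trigger a DAG run via the stable REST API
   # (POST /api/v1/dags/{dag_id}/dagRuns with basic auth).
   import base64
   import json
   import urllib.request
   
   
   def build_dagrun_request(base_url, dag_id, conf, user, password):
       """Build a urllib Request for POST /api/v1/dags/{dag_id}/dagRuns."""
       url = f"{base_url}/api/v1/dags/{dag_id}/dagRuns"
       body = json.dumps({"conf": conf}).encode("utf-8")
       token = base64.b64encode(f"{user}:{password}".encode()).decode()
       return urllib.request.Request(
           url,
           data=body,
           method="POST",
           headers={
               "Content-Type": "application/json",
               "Authorization": f"Basic {token}",
           },
       )
   
   
   if __name__ == "__main__":
       # Placeholders -- substitute your own host, DAG id, and credentials.
       req = build_dagrun_request(
           "http://localhost:8080",
           "YOUR_DAG_ID",
           {"user_id": "test_user", "connector_id": "test_connector"},
           "your_api_user",
           "your_api_password",
       )
       with urllib.request.urlopen(req) as resp:
           print(resp.status, resp.read().decode())
   ```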
   7. Observe the behavior:
      - Immediately after triggering, check the Airflow UI: the new DAG run appears and stays in the `queued` state.
      - Watch the scheduler logs (`journalctl -u airflow-scheduler.service -f`) for `DEBUG - No tasks to consider for execution.` messages.
      - Watch the PostgreSQL logs (`sudo tail -f /path/to/postgresql.log`) for the `INSERT INTO task_instance` statement generated for the new run ID.
      - Use `psql` to query the `task_instance` table for the new run ID and check the `state` column: `SELECT task_id, state, run_id FROM task_instance WHERE run_id = 'your_new_run_id';`
   Expected outcome of reproduction (the bug):
   
   - The DAG run remains stuck in the `queued` state.
   - Scheduler logs show `No tasks to consider for execution.`
   - PostgreSQL logs show the `INSERT INTO task_instance` statement is missing the `state` and `queued_dttm` columns.
   - A direct database query shows the task instances have `state = NULL`.
   
   These steps should allow someone else to reproduce the issue on a similar setup.
   
   ### Operating System
   
   Distributor ID: Ubuntu
   Description:    Ubuntu 24.04.2 LTS
   Release:        24.04
   Codename:       noble
   
   ### Versions of Apache Airflow Providers
   
   apache-airflow-providers-amazon                9.4.0
   apache-airflow-providers-common-compat         1.5.1
   apache-airflow-providers-common-io             1.2.0
   apache-airflow-providers-common-sql            1.24.0
   apache-airflow-providers-fab                   1.5.3
   apache-airflow-providers-ftp                   3.7.0
   apache-airflow-providers-http                  4.8.0
   apache-airflow-providers-imap                  3.5.0
   apache-airflow-providers-postgres              5.10.0
   apache-airflow-providers-smtp                  2.0.1
   apache-airflow-providers-sqlite                3.7.0
   
   ### Deployment
   
   Official Apache Airflow Helm Chart
   
   ### Deployment details
   
   _No response_
   
   ### Anything else?
   
   _No response_
   
   ### Are you willing to submit PR?
   
   - [ ] Yes I am willing to submit a PR!
   
   ### Code of Conduct
   
   - [x] I agree to follow this project's [Code of 
Conduct](https://github.com/apache/airflow/blob/main/CODE_OF_CONDUCT.md)
   

