david-parkk opened a new pull request, #66874:
URL: https://github.com/apache/airflow/pull/66874

   ## Summary
   
   Fix `RuntimeError` when creating a backfill with `from_date` after `to_date` 
by adding explicit validation that raises `InvalidBackfillDateRange` early, 
before any DB operations are attempted.
   
   
   ## Problem
   When a backfill is requested with `from_date` after `to_date` (e.g. `from_date=2026-05-13`, `to_date=2026-05-12`), the following chain of failures occurs:
   
   **1. HTTP 500 instead of 400**
   `_validate_backfill_params()` had no check for this case. The invalid date range passed validation and reached `_get_info_list()`, which returned an empty list, so `_create_backfill()` raised a generic `RuntimeError("No runs to create for Dag ...")`. None of the API route's `except` blocks caught it, which resulted in a 500 response, so the UI's backfill creation popup does not show a meaningful error message.
   
   **2. Orphaned `Backfill` record blocks subsequent backfills**
   In `_create_backfill()`, the `Backfill` record is committed to the DB 
(`session.commit()`) before `_get_info_list()` is called. When `RuntimeError` 
is raised afterwards, the `Backfill` record already exists in the DB with 
`completed_at=None` and no associated `BackfillDagRun` records. Any subsequent 
backfill attempt for the same DAG immediately fails with 
`AlreadyRunningBackfill`.
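The ordering problem can be reproduced outside Airflow with a minimal sketch that assumes a toy two-table schema in an in-memory SQLite database. The table layout, `create_backfill_commit_first()`, and the local `AlreadyRunningBackfill` class are hypothetical stand-ins for Airflow's `Backfill`/`BackfillDagRun` models and exception. The point is only that a `commit()` before the "no runs" check makes the half-created record durable.

```python
import sqlite3


class AlreadyRunningBackfill(Exception):
    """Hypothetical stand-in for Airflow's exception of the same name."""


def setup_schema(conn: sqlite3.Connection) -> None:
    # Toy schema for illustration only: not Airflow's real tables.
    conn.execute("CREATE TABLE backfill (id INTEGER PRIMARY KEY, dag_id TEXT, completed_at TEXT)")
    conn.execute("CREATE TABLE backfill_dag_run (id INTEGER PRIMARY KEY, backfill_id INTEGER)")


def create_backfill_commit_first(conn: sqlite3.Connection, dag_id: str, run_dates: list[str]) -> int:
    """Mimics the pre-fix ordering: persist the Backfill row, then discover there is nothing to run."""
    (active,) = conn.execute(
        "SELECT COUNT(*) FROM backfill WHERE dag_id = ? AND completed_at IS NULL", (dag_id,)
    ).fetchone()
    if active:
        raise AlreadyRunningBackfill(f"Another backfill is running for Dag {dag_id}")
    cur = conn.execute("INSERT INTO backfill (dag_id, completed_at) VALUES (?, NULL)", (dag_id,))
    conn.commit()  # the row is now durable...
    if not run_dates:
        # ...so raising here leaves it behind with completed_at = NULL and no runs.
        raise RuntimeError(f"No runs to create for Dag {dag_id}")
    conn.executemany(
        "INSERT INTO backfill_dag_run (backfill_id) VALUES (?)",
        [(cur.lastrowid,) for _ in run_dates],
    )
    conn.commit()
    return cur.lastrowid
```

Calling it once with an empty `run_dates` raises `RuntimeError` but leaves one open `backfill` row. A second, otherwise valid call for the same `dag_id` then raises `AlreadyRunningBackfill`. Validating the input before the first write avoids creating the row at all.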
   
   The scheduler's `_mark_backfills_complete()` does eventually clean up such orphaned records (via the `created_at < initializing_cutoff` guard, i.e. after 2 minutes), but until then the DAG is effectively locked for backfilling.
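The timing window can be summarised as a predicate. This is a minimal sketch under stated assumptions: `eligible_for_completion()` and `INITIALIZING_GRACE` are hypothetical names, and the real `_mark_backfills_complete()` works on database queries rather than a single row's fields. It only illustrates why an orphaned record blocks new backfills for roughly two minutes after creation.

```python
from datetime import datetime, timedelta, timezone
from typing import Optional

# The 2-minute grace period described above (the constant name is hypothetical).
INITIALIZING_GRACE = timedelta(minutes=2)


def eligible_for_completion(
    created_at: datetime,
    completed_at: Optional[datetime],
    unfinished_runs: int,
    now: datetime,
) -> bool:
    """Simplified approximation of the scheduler's check, not Airflow's actual query."""
    if completed_at is not None:
        return False  # already closed
    initializing_cutoff = now - INITIALIZING_GRACE
    # A backfill younger than the cutoff may still be creating its runs, so it is left alone.
    return unfinished_runs == 0 and created_at < initializing_cutoff


now = datetime(2026, 5, 13, 12, 0, tzinfo=timezone.utc)
print(eligible_for_completion(now - timedelta(minutes=1), None, 0, now))  # False: still "initializing"
print(eligible_for_completion(now - timedelta(minutes=3), None, 0, now))  # True: orphan can be closed
```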
   
   ## Before
   ```
   airflow backfill create --dag-id test_backfill_validation --from-date 2026-05-13 --to-date 2026-05-1
   ```
   No backfill data is created, yet a "backfill in progress" message is still shown.
   <img width="2048" height="900" alt="image" src="https://github.com/user-attachments/assets/6ba60e64-b708-47eb-9ec4-b8d6a8342a25" />
   
   
   ```
     File "/opt/airflow/airflow-core/src/airflow/utils/providers_configuration_loader.py", line 54, in wrapped_function
       return func(*args, **kwargs)
     File "/opt/airflow/airflow-core/src/airflow/cli/commands/backfill_command.py", line 99, in create_backfill
       _create_backfill(
     File "/opt/airflow/airflow-core/src/airflow/models/backfill.py", line 655, in _create_backfill
       raise RuntimeError(f"No runs to create for Dag {dag_id}")
   RuntimeError: No runs to create for Dag test_backfill_validation
   ```
   
   ## Changes
   **`models/backfill.py`**
   - Added an `InvalidBackfillDateRange` exception class to distinguish date-range errors from the existing `InvalidBackfillDate` (which covers future-date requests)
   - Added a `from_date > to_date` check at the top of `_validate_backfill_params()`, before any DB access, consistent with the "fail fast on bad input" principle
   - Grouped the date-related validations (`from_date > to_date` and the future-date check) together, followed by DAG structure checks (`depends_on_past`) and config validation. This ordering follows the general convention of validating raw inputs before inspecting DAG internals
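A minimal, self-contained sketch of the validation order described above. Everything here is a stand-in: the exception classes are redefined locally (their real base classes in Airflow are not shown in this PR), `FakeDag` reduces the DAG to a single `depends_on_past` flag, the future-date condition is simplified, the `InvalidBackfillDirection` message is made up, and `dag_run_conf` validation is omitted. Only the `InvalidBackfillDateRange` message format is taken from the PR's output.

```python
from dataclasses import dataclass
from datetime import datetime, timezone
from typing import Optional


class InvalidBackfillDate(Exception):
    """Local stand-in: backfill requested for future dates."""


class InvalidBackfillDateRange(Exception):
    """Local stand-in: from_date is after to_date."""


class InvalidBackfillDirection(Exception):
    """Local stand-in: reverse backfill on a DAG with depends_on_past tasks."""


@dataclass
class FakeDag:
    """Hypothetical minimal DAG: only the attribute this sketch inspects."""

    dag_id: str
    depends_on_past: bool = False


def validate_backfill_params(
    dag: FakeDag,
    reverse: bool,
    from_date: datetime,
    to_date: datetime,
    now: Optional[datetime] = None,
) -> None:
    if now is None:
        now = datetime.now(timezone.utc)
    # 1. Raw date inputs first: cheap checks, no DB access.
    if from_date > to_date:
        raise InvalidBackfillDateRange(
            f"from_date ({from_date.isoformat()}) must not be after to_date ({to_date.isoformat()})."
        )
    # Simplified future-date check: with an ordered range, this means the whole range is in the future.
    if from_date > now:
        raise InvalidBackfillDate("Backfill cannot be executed for future dates.")
    # 2. DAG structure checks (hypothetical message).
    if reverse and dag.depends_on_past:
        raise InvalidBackfillDirection("Cannot backfill in reverse when tasks depend on past runs.")
    # 3. dag_run_conf validation would follow here (omitted in this sketch).
```

Because the range check runs first, an inverted range that also lies in the future is reported as a range error rather than a future-date error.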
   
   **`routes/public/backfills.py`**
   - Imported `InvalidBackfillDateRange` and added it to the `except` blocks of both `create_backfill` and `create_backfill_dry_run`, so it is converted to a 400 `RequestValidationError`
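The route change follows a common pattern: a set of "client error" exceptions is mapped to a 400, and anything else propagates as a 500. The sketch below is framework-agnostic and hypothetical (`BACKFILL_CLIENT_ERRORS`, `call_endpoint()`, and the `(status, body)` return shape are illustrative, not FastAPI's or Airflow's API). It shows why an exception missing from the handled set, like the old `RuntimeError`, still ends up as a 500.

```python
from typing import Callable, Tuple


class InvalidBackfillDate(Exception):
    """Local stand-in for the existing exception."""


class InvalidBackfillDateRange(Exception):
    """Local stand-in for the new exception."""


# One shared tuple is a possible way to keep the create_backfill and
# create_backfill_dry_run handlers from drifting apart.
BACKFILL_CLIENT_ERRORS = (InvalidBackfillDate, InvalidBackfillDateRange)


def call_endpoint(handler: Callable[[], object]) -> Tuple[int, object]:
    try:
        return 200, handler()
    except BACKFILL_CLIENT_ERRORS as exc:
        return 400, str(exc)  # bad input: the caller can fix the request
    except Exception as exc:
        # Anything unanticipated falls through to a generic server error.
        return 500, f"Internal Server Error: {type(exc).__name__}"
```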
   
   ## After
   ```
   airflow backfill create --dag-id test_backfill_validation --from-date 2026-05-13 --to-date 2026-05-1
   ```
   No backfill data is created, and no "backfill in progress" message is shown.
   <img width="2048" height="721" alt="image" src="https://github.com/user-attachments/assets/37995236-4029-417f-96e3-253fa3b4bfd6" />
   
   ```
       return func(*args, **kwargs)
     File "/opt/airflow/airflow-core/src/airflow/cli/commands/backfill_command.py", line 99, in create_backfill
       _create_backfill(
     File "/opt/airflow/airflow-core/src/airflow/models/backfill.py", line 642, in _create_backfill
       _validate_backfill_params(dag, reverse, from_date, to_date, reprocess_behavior, dag_run_conf)
     File "/opt/airflow/airflow-core/src/airflow/models/backfill.py", line 279, in _validate_backfill_params
       raise InvalidBackfillDateRange(
   airflow.models.backfill.InvalidBackfillDateRange: from_date (2026-05-13T00:00:00+00:00) must not be after to_date (2026-05-01T00:00:00+00:00).
   ```
   ## Discussion
   **Exception message datetime format**
   
   The error message uses `datetime.isoformat()`:
   ```
   from_date (2026-05-13T00:00:00+00:00) must not be after to_date (2021-01-01T00:00:00+00:00).
   ```
   
   This format follows other parts of the codebase (e.g. `timetables/base.py`, `utils/log/file_task_handler.py`), but I'm not certain it is the right convention for exception messages specifically: the existing `InvalidBackfillDate` does not include date values at all (`"Backfill cannot be executed for future dates."`). I'd appreciate guidance on whether to keep the values for debuggability or simplify to a static message.
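For comparison, the two options under discussion can be put side by side. This sketch assumes a hypothetical `range_error_message()` helper with an `include_values` switch; the static wording is an illustrative alternative, not text from the codebase. Note that `isoformat()` emits the `+00:00` offset only for timezone-aware datetimes; a naive value prints without it.

```python
from datetime import datetime, timezone


def range_error_message(from_date: datetime, to_date: datetime, include_values: bool = True) -> str:
    """Hypothetical helper contrasting a value-bearing message with a static one."""
    if include_values:
        return f"from_date ({from_date.isoformat()}) must not be after to_date ({to_date.isoformat()})."
    return "from_date must not be after to_date."


aware = datetime(2026, 5, 13, tzinfo=timezone.utc)
naive = datetime(2026, 5, 13)
print(aware.isoformat())  # 2026-05-13T00:00:00+00:00
print(naive.isoformat())  # 2026-05-13T00:00:00
```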
   
   <!-- SPDX-License-Identifier: Apache-2.0
         https://www.apache.org/licenses/LICENSE-2.0 -->
   
   
   ---
   
   ##### Was generative AI tooling used to co-author this PR?
   
   
   - [x] Yes (please specify the tool below)
   - Claude
   ---
   
   I'm happy to make any adjustments based on your feedback. Thank you to the 
maintainers for taking the time to review this contribution!
   
   
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]
