hkc-8010 opened a new issue, #64891:
URL: https://github.com/apache/airflow/issues/64891

   ## Summary
   
   `DagFileProcessorManager._add_callback_to_queue()` only initializes a 
versioned bundle when `request.bundle_version` is truthy.
   
   That means DAG callback requests created with `bundle_version=None` can 
still be queued, but `DagFileInfo` is constructed from `bundle.path` before the 
bundle has been initialized or refreshed.
   
   ```python
   bundle = DagBundlesManager().get_bundle(
       name=request.bundle_name, version=request.bundle_version
   )
   ...
   if bundle.supports_versioning and request.bundle_version:
       bundle.initialize()
   ...
   file_info = DagFileInfo(
       rel_path=Path(request.filepath),
       bundle_path=bundle.path,
       bundle_name=request.bundle_name,
       bundle_version=request.bundle_version,
   )
   ```
   
   This is especially visible when DAG-level callbacks are produced from a 
`DagRun` with `bundle_version=None`, for example when 
`disable_bundle_versioning=True` is set.
   
   ## Why this looks like an Airflow bug
   
   `DagRun.produce_dag_callback()` copies `self.bundle_version` directly into 
`DagCallbackRequest.bundle_version`.
   
   So when bundle versioning is disabled, callback requests are expected to 
have `bundle_version=None`.
   
   In that case, callback queueing currently assumes it is still safe to use 
`bundle.path` without initialization. That is not a safe assumption for 
versioned bundle implementations whose usable path is only known after 
`initialize()` / `refresh()`.
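
To illustrate that assumption, here is a minimal sketch of a bundle whose `path` is only a placeholder until `initialize()` runs. `LazyBundle` and the checkout directory are hypothetical stand-ins, not Airflow classes or real paths; the `/dev/null` placeholder mirrors the value reported under "Observed behavior".

```python
from pathlib import Path


class LazyBundle:
    """Hypothetical stand-in for a versioned bundle; not an Airflow class."""

    supports_versioning = True

    def __init__(self, checkout_dir: str) -> None:
        self._checkout_dir = Path(checkout_dir)
        self._initialized = False

    def initialize(self) -> None:
        # A real backend would fetch or check out files here;
        # this sketch only flips a flag.
        self._initialized = True

    @property
    def path(self) -> Path:
        # Placeholder until initialize() has run.
        return self._checkout_dir if self._initialized else Path("/dev/null")


bundle = LazyBundle("/tmp/bundles/example")  # assumed location, illustration only
path_before = bundle.path  # read before initialization: placeholder
bundle.initialize()
path_after = bundle.path  # read after initialization: real checkout directory
```

Any code that reads `path` before `initialize()` gets the placeholder, which is the situation callback queueing ends up in when it skips initialization.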
   
   Even if some concrete failures are bundle-backend-specific, the callback 
queueing logic itself appears incomplete: it does not define how callback-only 
processing should resolve a versioned bundle when `bundle_version` is 
intentionally absent.
   
   ## Observed behavior
   
   In a real Airflow 3.1.8 / Astro Hosted investigation, this produced 
callback-only processing like:
   
   ```text
   Detected callback-only processing for DagFileInfo(..., bundle_path=PosixPath('/dev/null'), bundle_version=None)
   ```
   
   The underlying bundle backend there exposed `/dev/null` before 
initialization, but the more general problem is that callback queueing skipped 
bundle initialization entirely because `request.bundle_version` was falsy.
   
   ## Expected behavior
   
   If a callback request targets a versioned bundle but `bundle_version` is 
`None`, callback queueing should still resolve a usable bundle path before 
constructing `DagFileInfo`.
   
   Possible expected behaviors:
   - initialize the bundle whenever `bundle.supports_versioning`, even when 
`request.bundle_version is None`
   - or explicitly define a fallback path for callback-only processing that 
resolves the current effective bundle version/context
   - or reject the callback request explicitly instead of silently proceeding 
with an unresolved `bundle.path`
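
A rough sketch of the first option, next to the condition quoted in the Summary. `StubRequest`, `StubBundle`, and both helper functions are hypothetical stand-ins written for this comparison; they are not the Airflow implementation, and the real fix may need to look different.

```python
from dataclasses import dataclass
from pathlib import Path
from typing import Optional


@dataclass
class StubRequest:
    """Hypothetical stand-in for a DAG callback request."""

    filepath: str
    bundle_name: str
    bundle_version: Optional[str] = None


class StubBundle:
    """Hypothetical versioned bundle with a placeholder path until initialized."""

    supports_versioning = True

    def __init__(self, root: str) -> None:
        self._root = Path(root)
        self._initialized = False

    def initialize(self) -> None:
        self._initialized = True

    @property
    def path(self) -> Path:
        return self._root if self._initialized else Path("/dev/null")


def path_with_current_check(bundle: StubBundle, request: StubRequest) -> Path:
    # Mirrors the quoted condition: a None version skips initialization.
    if bundle.supports_versioning and request.bundle_version:
        bundle.initialize()
    return bundle.path


def path_with_proposed_check(bundle: StubBundle, request: StubRequest) -> Path:
    # First option: decide on the bundle's capability alone.
    if bundle.supports_versioning:
        bundle.initialize()
    return bundle.path


request = StubRequest(filepath="dags/example.py", bundle_name="example")
current = path_with_current_check(StubBundle("/tmp/bundles/example"), request)
proposed = path_with_proposed_check(StubBundle("/tmp/bundles/example"), request)
```

With `bundle_version=None`, the current check returns the placeholder path, while the proposed check returns the initialized directory.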
   
   ## Actual behavior
   
   Callback queueing can proceed with an unresolved bundle path when all of the 
following are true:
   - the bundle supports versioning
   - the callback request has `bundle_version=None`
   - the bundle implementation requires `initialize()` / `refresh()` before 
`path` is usable
   
   ## Relevant code paths
   
   - `airflow.models.dagrun.DagRun.produce_dag_callback()`
   - `airflow.dag_processing.manager.DagFileProcessorManager._add_callback_to_queue()`
   
   ## Version
   
   Observed on Airflow `3.1.8`.
   
   ## Additional context
   
   I am also filing the backend-specific half of this in Astronomer's bundle 
backend repo, because one concrete manifestation used a backend whose 
unresolved `path` was `/dev/null`. But this issue is about the upstream 
callback queueing contract when `bundle_version=None` is intentional.
   

