hkc-8010 opened a new issue, #64891:
URL: https://github.com/apache/airflow/issues/64891
## Summary
`DagFileProcessorManager._add_callback_to_queue()` only initializes a
versioned bundle when `request.bundle_version` is truthy.
That means DAG callback requests created with `bundle_version=None` can
still be queued, but `DagFileInfo` is constructed from `bundle.path` before the
bundle has been initialized or refreshed.
```python
bundle = DagBundlesManager().get_bundle(
    name=request.bundle_name, version=request.bundle_version
)
...
if bundle.supports_versioning and request.bundle_version:
    bundle.initialize()
...
file_info = DagFileInfo(
    rel_path=Path(request.filepath),
    bundle_path=bundle.path,
    bundle_name=request.bundle_name,
    bundle_version=request.bundle_version,
)
```
This is especially visible when DAG-level callbacks are produced from a
`DagRun` with `bundle_version=None`, for example when
`disable_bundle_versioning=True` is set.
## Why this looks like an Airflow bug
`DagRun.produce_dag_callback()` copies `self.bundle_version` directly into
`DagCallbackRequest.bundle_version`.
So when bundle versioning is disabled, callback requests are expected to
have `bundle_version=None`.
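The propagation can be illustrated with a reduced sketch. `StubDagRun` and `StubCallbackRequest` are hypothetical stand-ins, not the real Airflow models, and the file and bundle names are placeholders; the only point is that the version is passed through unchanged, so `None` stays `None`.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class StubCallbackRequest:
    """Hypothetical, reduced stand-in for DagCallbackRequest."""

    filepath: str
    bundle_name: str
    bundle_version: Optional[str]


@dataclass
class StubDagRun:
    """Hypothetical, reduced stand-in for DagRun."""

    dag_file: str
    bundle_name: str
    bundle_version: Optional[str] = None  # None when bundle versioning is disabled

    def produce_dag_callback(self) -> StubCallbackRequest:
        # The version is copied through as-is; nothing substitutes a default.
        return StubCallbackRequest(
            filepath=self.dag_file,
            bundle_name=self.bundle_name,
            bundle_version=self.bundle_version,
        )


request = StubDagRun("dags/example.py", "example-bundle").produce_dag_callback()
print(request.bundle_version is None)
```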
In that case, callback queueing currently assumes it is still safe to use
`bundle.path` without initialization. That is not a safe assumption for
versioned bundle implementations whose usable path is only known after
`initialize()` / `refresh()`.
Even if some concrete failures are bundle-backend-specific, the callback
queueing logic itself appears incomplete: it does not define how callback-only
processing should resolve a versioned bundle when `bundle_version` is
intentionally absent.
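A minimal reproduction sketch of the gap, using a hypothetical `StubVersionedBundle` rather than any real bundle backend. The `/tmp/bundles` location and the `"latest"` fallback are illustrative assumptions; only the gating condition mirrors the excerpt above.

```python
from pathlib import Path


class StubVersionedBundle:
    """Hypothetical stand-in for a versioned bundle; not an Airflow class."""

    supports_versioning = True

    def __init__(self, version=None):
        self.version = version
        self._resolved = None  # the usable path is unknown until initialize()

    @property
    def path(self):
        # Placeholder until initialized, similar to the /dev/null in the report.
        return self._resolved if self._resolved is not None else Path("/dev/null")

    def initialize(self):
        # Assumed layout, for illustration only.
        self._resolved = Path("/tmp/bundles") / (self.version or "latest")


def path_seen_by_callback_queueing(bundle, bundle_version):
    # Same gate as the excerpt: initialization is skipped for a falsy version.
    if bundle.supports_versioning and bundle_version:
        bundle.initialize()
    return bundle.path


unversioned = path_seen_by_callback_queueing(StubVersionedBundle(), None)
versioned = path_seen_by_callback_queueing(StubVersionedBundle("abc123"), "abc123")
print(unversioned)
print(versioned)
```

With this stub, the request that carries a version gets an initialized path, while the `bundle_version=None` request is handed the placeholder.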
## Observed behavior
In a real Airflow 3.1.8 / Astro Hosted investigation, this produced
callback-only processing log output like:
```text
Detected callback-only processing for DagFileInfo(..., bundle_path=PosixPath('/dev/null'), bundle_version=None)
```
The underlying bundle backend there exposed `/dev/null` before
initialization, but the more general problem is that callback queueing skipped
bundle initialization entirely because `request.bundle_version` was falsy.
## Expected behavior
If a callback request targets a versioned bundle but `bundle_version` is
`None`, callback queueing should still resolve a usable bundle path before
constructing `DagFileInfo`.
Possible expected behaviors:
- initialize the bundle whenever `bundle.supports_versioning`, even when
`request.bundle_version is None`
- or explicitly define a fallback path for callback-only processing that
resolves the current effective bundle version/context
- or reject the callback request explicitly instead of silently proceeding
with an unresolved `bundle.path`
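The first and third options could look roughly like the sketch below. This is not a proposed patch: `resolve_callback_bundle_path`, `UnresolvedBundlePathError`, the `reject_unversioned` flag, and `DemoBundle` are all hypothetical names introduced here for illustration. The second option is not sketched because it depends on how the "current effective version" would be defined.

```python
from pathlib import Path


class UnresolvedBundlePathError(RuntimeError):
    """Hypothetical error type for the 'reject explicitly' option."""


class DemoBundle:
    """Hypothetical versioned bundle whose path is a placeholder until initialized."""

    supports_versioning = True

    def __init__(self):
        self.path = Path("/dev/null")

    def initialize(self):
        # Assumed location, for illustration only.
        self.path = Path("/tmp/bundles/current")


def resolve_callback_bundle_path(bundle, bundle_version, *, reject_unversioned=False):
    """Return a usable bundle path for a callback request, or raise."""
    if bundle.supports_versioning:
        if bundle_version is None and reject_unversioned:
            # Third option: fail loudly instead of queueing with an unresolved path.
            raise UnresolvedBundlePathError(
                "callback request for a versioned bundle has bundle_version=None"
            )
        # First option: initialize whether or not a version was supplied.
        bundle.initialize()
    return bundle.path


resolved = resolve_callback_bundle_path(DemoBundle(), None)
print(resolved)
```

Either branch removes the silent case: the caller gets an initialized path or an explicit error, never the uninitialized placeholder.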
## Actual behavior
Callback queueing can proceed with an unresolved bundle path when all of the
following are true:
- the bundle supports versioning
- the callback request has `bundle_version=None`
- the bundle implementation requires `initialize()` / `refresh()` before
`path` is usable
## Relevant code paths
- `airflow.models.dagrun.DagRun.produce_dag_callback()`
- `airflow.dag_processing.manager.DagFileProcessorManager._add_callback_to_queue()`
## Version
Observed on Airflow `3.1.8`.
## Additional context
I am also filing the backend-specific half of this in Astronomer's bundle
backend repo, because one concrete manifestation used a backend whose
unresolved `path` was `/dev/null`. But this issue is about the upstream
callback queueing contract when `bundle_version=None` is intentional.