ramitkataria commented on code in PR #66608:
URL: https://github.com/apache/airflow/pull/66608#discussion_r3424874000
##########
task-sdk/src/airflow/sdk/execution_time/callback_supervisor.py:
##########
@@ -234,16 +257,39 @@ def _target():
_log.debug(
"Added bundle path to sys.path",
bundle_name=bundle_info.name, path=bundle_path
)
+ # DAG processor loads bundle files with a mangled module name
+ # (unusual_prefix_{hash}_{stem}) to avoid collisions. The callback path
+ # was serialized using that mangled name. Register the module under that
+ # name so import_string can find it in the subprocess.
+ if callback_path and callback_path.startswith(UNUSUAL_MODULE_PREFIX):
+ _register_unusual_prefix_module(callback_path, bundle.path, _log)
except Exception:
_log.warning(
"Failed to initialize DAG bundle for callback",
bundle_name=bundle_info.name,
exc_info=True,
)
+ # When DagRun identifiers are provided, fetch the DagRun via SUPERVISOR_COMMS
+ # and build a context dict to pass to the callback function.
+ effective_kwargs = dict(callback_kwargs)
+ if dag_id and run_id:
+ context = _fetch_and_build_context(task_runner.SUPERVISOR_COMMS, dag_id, run_id, _log)
+ if context is None:
+ _log.error(
+ "Cannot execute callback without context — failing to surface the error rather than running degraded",
+ dag_id=dag_id,
+ run_id=run_id,
+ )
+ sys.exit(1)
+ if deadline_id or deadline_time:
+ context["deadline"] = {"id": deadline_id, "deadline_time": deadline_time}
+ effective_kwargs["context"] = context
+ effective_kwargs = _render_callback_kwargs(effective_kwargs, context, _log)
Review Comment:
A couple of "keep the two paths in lockstep" thoughts, now that the async
path renders Jinja too (nice, that closes the gap I was going to flag):
- The new rendering helper landed in the shared lib
(`airflow._shared.template_rendering.render_callback_kwargs`) and the async path
uses it, but the executor path still calls its own local
`_render_callback_kwargs` here rather than the shared one. Worth pointing both at
the shared helper so they can't drift.
- Same theme for the deadline dict: it's still assembled separately on
each side (executor sets `context["deadline"]` directly, triggerer threads it
through `dag_run_data["_deadline"]`) instead of in `build_context_from_dag_run`.
Moving it into the shared helper would make both paths produce identical
context by construction.
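To make the second point concrete, here is a minimal sketch of the shape I have in mind. The signature, the keyword-only deadline parameters, and the `dag_run` key are assumptions for illustration, not the helper's actual API:

```python
from __future__ import annotations

from typing import Any


def build_context_from_dag_run(
    dag_run_data: dict[str, Any],
    *,
    deadline_id: str | None = None,
    deadline_time: str | None = None,
) -> dict[str, Any]:
    """Illustrative stand-in; the real shared helper's signature may differ."""
    context: dict[str, Any] = {"dag_run": dag_run_data}
    # Same condition the executor path uses in this hunk, now applied in one place.
    if deadline_id or deadline_time:
        context["deadline"] = {"id": deadline_id, "deadline_time": deadline_time}
    return context


# Both callers pass the same arguments, so they get the same context back.
dag_run = {"dag_id": "example_dag", "run_id": "manual__1"}
deadline = {"deadline_id": "d-1", "deadline_time": "2030-01-01T00:00:00+00:00"}
executor_context = build_context_from_dag_run(dag_run, **deadline)
triggerer_context = build_context_from_dag_run(dag_run, **deadline)
assert executor_context == triggerer_context
```

With something like this, neither caller would need to touch `context["deadline"]` or `dag_run_data["_deadline"]` itself.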
##########
task-sdk/src/airflow/sdk/execution_time/callback_supervisor.py:
##########
@@ -209,14 +218,36 @@ def _target():
_log.debug(
"Added bundle path to sys.path",
bundle_name=bundle_info.name, path=bundle_path
)
+ # DAG processor loads bundle files with a mangled module name
+ # (unusual_prefix_{hash}_{stem}) to avoid collisions. The callback path
+ # was serialized using that mangled name. Register the module under that
+ # name so import_string can find it in the subprocess.
+ if callback_path and callback_path.startswith("unusual_prefix_"):
+ _register_unusual_prefix_module(callback_path, bundle.path, _log)
except Exception:
_log.warning(
"Failed to initialize DAG bundle for callback",
bundle_name=bundle_info.name,
exc_info=True,
)
- success, error_msg = execute_callback(callback_path, callback_kwargs, _log)
+ # When DagRun identifiers are provided, fetch the DagRun via SUPERVISOR_COMMS
+ # and build a context dict to pass to the callback function.
+ effective_kwargs = dict(callback_kwargs)
+ if dag_id and run_id:
+ context = _fetch_and_build_context(task_runner.SUPERVISOR_COMMS, dag_id, run_id, _log)
+ if context is None:
+ _log.error(
+ "Cannot execute callback without context — failing so it can be retried",
+ dag_id=dag_id,
+ run_id=run_id,
+ )
+ sys.exit(1)
Review Comment:
[placeholder]
##########
task-sdk/src/airflow/sdk/execution_time/callback_supervisor.py:
##########
@@ -234,16 +257,39 @@ def _target():
_log.debug(
"Added bundle path to sys.path",
bundle_name=bundle_info.name, path=bundle_path
)
+ # DAG processor loads bundle files with a mangled module name
+ # (unusual_prefix_{hash}_{stem}) to avoid collisions. The callback path
+ # was serialized using that mangled name. Register the module under that
+ # name so import_string can find it in the subprocess.
+ if callback_path and callback_path.startswith(UNUSUAL_MODULE_PREFIX):
+ _register_unusual_prefix_module(callback_path, bundle.path, _log)
except Exception:
_log.warning(
"Failed to initialize DAG bundle for callback",
bundle_name=bundle_info.name,
exc_info=True,
)
+ # When DagRun identifiers are provided, fetch the DagRun via SUPERVISOR_COMMS
+ # and build a context dict to pass to the callback function.
+ effective_kwargs = dict(callback_kwargs)
+ if dag_id and run_id:
+ context = _fetch_and_build_context(task_runner.SUPERVISOR_COMMS, dag_id, run_id, _log)
+ if context is None:
+ _log.error(
+ "Cannot execute callback without context — failing to surface the error rather than running degraded",
+ dag_id=dag_id,
+ run_id=run_id,
+ )
+ sys.exit(1)
Review Comment:
Re-raising the point from
https://github.com/apache/airflow/pull/66608#discussion_r3366549044. That
thread got marked resolved, but as far as I can tell the code is unchanged and
the concern still stands, so I'm flagging it here on the current line.
The reasoning for fetching at runtime makes sense to me, and I agree we
shouldn't go back to serializing context into the DB blob. What I'm still
unsure about isn't runtime-vs-queue-time fetching; it's the asymmetry between
the two paths. At HEAD, on a missing/unfetchable DagRun the triggerer path
returns None and the trigger gets re-evaluated on the next loop (so it
effectively retries), while the executor path here calls `sys.exit(1)` and ends
in a terminal `CallbackState.FAILED` with no requeue. Since a goal of this PR is
to make the two paths behave the same, that difference feels worth closing. If
"retry on the next loop" is acceptable on the triggerer side, maybe it's
acceptable on the executor side too?
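Roughly what I mean, as a sketch only: `CallbackOutcome`, `run_with_context`, and the `fetch_context` argument are hypothetical names, and how the executor side would actually requeue a retryable result is left open here.

```python
from __future__ import annotations

import enum
from typing import Any, Callable


class CallbackOutcome(enum.Enum):
    """Hypothetical result type; not an existing Airflow enum."""

    SUCCESS = "success"
    FAILED = "failed"
    RETRY = "retry"


def run_with_context(
    fetch_context: Callable[[], dict[str, Any] | None],
    callback: Callable[..., Any],
    callback_kwargs: dict[str, Any],
) -> CallbackOutcome:
    context = fetch_context()
    if context is None:
        # Missing/unfetchable DagRun: report a retryable outcome
        # instead of exiting the process with a terminal failure.
        return CallbackOutcome.RETRY
    try:
        callback(**{**callback_kwargs, "context": context})
    except Exception:
        # A callback that raises is still a real failure.
        return CallbackOutcome.FAILED
    return CallbackOutcome.SUCCESS
```

The caller would then map `RETRY` to whatever requeue mechanism fits the executor path, keeping `FAILED` for callbacks that actually ran and raised.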
##########
airflow-core/docs/howto/deadline-alerts.rst:
##########
Review Comment:
Let's also make sure to update or remove this section to reflect the updated
functionality.