shahar1 opened a new pull request, #67701:
URL: https://github.com/apache/airflow/pull/67701
Cache the result of `inspect.signature(BaseOperator.__init__)` as a class-level
constant in `OperatorSerialization` instead of recomputing it for every task
during every DAG serialization.
## Root cause
`_serialize_node()` (called once per task) contained:
```python
forbidden_fields = set(signature(BaseOperator.__init__).parameters.keys())
forbidden_fields.difference_update({"email"})
```
`BaseOperator.__init__` is a regular Python function whose signature never
changes in a running process. Python's `inspect.signature()` is **not** cached
for regular functions: each call walks `__code__`, `__annotations__`, and the
defaults from scratch. The old code therefore recomputed the same 57-param set
once per task, per serialization.
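
The lack of caching can be seen in a minimal sketch. `FakeOperator` is a hypothetical stand-in with a handful of parameters, not the Airflow class, and the timings it prints are only indicative of the pattern, not of the numbers reported in this PR:

```python
import inspect
import timeit


class FakeOperator:
    """Hypothetical stand-in for BaseOperator (assumption, not Airflow code)."""

    def __init__(self, task_id, owner="airflow", email=None, retries=0, **kwargs):
        self.task_id = task_id


# Two calls return equal but distinct Signature objects: nothing is memoized.
first = inspect.signature(FakeOperator.__init__)
second = inspect.signature(FakeOperator.__init__)
same_object = first is second
equal_value = first == second


def recompute():
    """Mirror the old per-task pattern: rebuild the set on every call."""
    names = set(inspect.signature(FakeOperator.__init__).parameters.keys())
    names.difference_update({"email"})
    return names


# Computed once, then reused.
CACHED = frozenset(inspect.signature(FakeOperator.__init__).parameters) - {"email"}


def reuse():
    """Mirror the new pattern: return the precomputed constant."""
    return CACHED


if __name__ == "__main__":
    n = 2_000
    print("recompute:", timeit.timeit(recompute, number=n))
    print("reuse:    ", timeit.timeit(reuse, number=n))
```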
## Fix
Promote `forbidden_fields` to a class constant evaluated once at class-load time:
```python
_FORBIDDEN_TEMPLATE_FIELDS: ClassVar[frozenset[str]] = (
    frozenset(signature(BaseOperator.__init__).parameters) - {"email"}
)
```
Replace the two-line computation in `_serialize_node` with
`cls._FORBIDDEN_TEMPLATE_FIELDS`.
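
As a self-contained sketch of the same pattern, the snippet below uses hypothetical stand-ins (`FakeBaseOperator`, `FakeSerializer`, `check_template_fields`); none of these names come from Airflow, and the way the set is consumed here (reporting overlapping names) is an illustrative assumption rather than the actual body of `_serialize_node`:

```python
from inspect import signature
from typing import ClassVar


class FakeBaseOperator:
    """Hypothetical stand-in for BaseOperator (assumption)."""

    def __init__(self, task_id, email=None, retries=0):
        self.task_id = task_id


class FakeSerializer:
    """Hypothetical stand-in for OperatorSerialization (assumption)."""

    # Evaluated once, when the class body runs at import time.
    _FORBIDDEN_TEMPLATE_FIELDS: ClassVar[frozenset[str]] = (
        frozenset(signature(FakeBaseOperator.__init__).parameters) - {"email"}
    )

    @classmethod
    def check_template_fields(cls, template_fields):
        """Return the template fields that collide with constructor parameters."""
        return sorted(set(template_fields) & cls._FORBIDDEN_TEMPLATE_FIELDS)


collisions = FakeSerializer.check_template_fields(["bash_command", "retries", "email"])
```

Because the constant is a `frozenset`, it cannot be mutated by accident from one call to the next, which the old `set` plus `difference_update` pair did not guarantee.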
## Safety
`BaseOperator.__init__` is a plain Python method: not a `@classmethod`, not
dynamically generated, and not subject to metaclass reconstruction. Its
signature is fixed at import time and is identical regardless of operator
subclass (the code always inspects `BaseOperator.__init__`, not
`type(op).__init__`). Caching it is equivalent to defining any other
class-level constant.
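
The subclass-independence argument can be illustrated with a small sketch. Both classes are hypothetical stand-ins, not Airflow operators:

```python
from inspect import signature


class FakeBaseOperator:
    """Hypothetical stand-in for BaseOperator (assumption)."""

    def __init__(self, task_id, email=None, retries=0):
        self.task_id = task_id


class FakeBashOperator(FakeBaseOperator):
    """Hypothetical subclass with its own constructor signature."""

    def __init__(self, bash_command, **kwargs):
        super().__init__(**kwargs)
        self.bash_command = bash_command


op = FakeBashOperator(bash_command="echo hi", task_id="t1")

# Inspecting the base class by name gives the same answer for every operator.
base_params = frozenset(signature(FakeBaseOperator.__init__).parameters)

# Inspecting type(op).__init__ would vary per subclass; the serializer
# described above does not do this.
subclass_params = frozenset(signature(type(op).__init__).parameters)
```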
## Benchmark results
Benchmark script:
https://gist.github.com/shahar1/d6a19bcc7405a8ca23841a6356d5c4e4
Run with: `uv run --project airflow-core python dev/bench_signature_cache.py`
### End-to-end `DagSerialization.to_dict()` (warm venv, parallel baseline run)
| Scenario | Before (min) | After (min) | Δ |
|---|---|---|---|
| 10 tasks × 1 outlet | ~2.4 ms | ~1.7 ms | −29 % |
| 100 tasks × 1 outlet | ~16.7 ms | ~11.3 ms | −32 % |
| 500 tasks × 5 outlets | ~104.7 ms | ~77.1 ms | −26 % |
| 1000 tasks × 5 outlets | ~208.5 ms | ~147.7 ms | −29 % |
| 200 tasks × 20 outlets | ~89.3 ms | ~60.9 ms | −32 % |
### cProfile (200 tasks × 20 outlets, 5 iterations)
Before: `inspect.signature` accounted for **22% of total `serialize_dag()` wall-time**
(0.218 s of 1.347 s total).
After: `inspect.signature` is **completely absent from the top 15** profiled
functions. Total function call count drops by ~660 k calls per 5-iteration
session (one `signature()` call eliminated per task per serialization).
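
For readers who want to check the call count in their own environment, the following is a minimal sketch of counting `inspect.signature` calls with `cProfile`. It is not the linked benchmark script: `FakeOperator` and `serialize_tasks` are hypothetical stand-ins that only imitate the old per-task recomputation.

```python
import cProfile
import inspect
import pstats


class FakeOperator:
    """Hypothetical stand-in for BaseOperator (assumption)."""

    def __init__(self, task_id, email=None, retries=0):
        self.task_id = task_id


def serialize_tasks(n_tasks):
    """Imitate the old code path: one signature() call per task."""
    for _ in range(n_tasks):
        set(inspect.signature(FakeOperator.__init__).parameters)


profiler = cProfile.Profile()
profiler.enable()
serialize_tasks(200)
profiler.disable()

stats = pstats.Stats(profiler)

# stats.stats maps (filename, lineno, funcname) to
# (primitive calls, total calls, tottime, cumtime, callers).
signature_calls = sum(
    total_calls
    for (filename, _lineno, funcname), (_pc, total_calls, _tt, _ct, _callers)
    in stats.stats.items()
    if funcname == "signature" and "inspect" in filename
)

if __name__ == "__main__":
    print("inspect.signature calls:", signature_calls)
    stats.sort_stats("cumulative").print_stats(15)
```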
---
##### Was generative AI tooling used to co-author this PR?
- [X] Yes — Claude Code (Sonnet 4.6)
Generated-by: Claude Code (Sonnet 4.6) following [the
guidelines](https://github.com/apache/airflow/blob/main/contributing-docs/05_pull_requests.rst#gen-ai-assisted-contributions)
---
Drafted-by: Claude Code (Sonnet 4.6); reviewed by @shahar1 before posting