shahar1 opened a new pull request, #67701:
URL: https://github.com/apache/airflow/pull/67701

   Cache the result of `inspect.signature(BaseOperator.__init__)` as a class-level
   constant in `OperatorSerialization` instead of recomputing it on every task
   during every DAG serialization.
   
   ## Root cause
   
   `_serialize_node()` (called once per task) contained:
   
   ```python
   forbidden_fields = set(signature(BaseOperator.__init__).parameters.keys())
   forbidden_fields.difference_update({"email"})
   ```
   
   `BaseOperator.__init__` is a regular Python function whose signature never
   changes in a running process. Python's `inspect.signature()` is **not** cached
   for regular functions: each call rebuilds the `Signature` object from
   `__code__`, `__annotations__`, and the defaults from scratch. The old code
   therefore recomputed the same 57-parameter set once per task, per
   serialization.
   
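The per-call cost can be seen in isolation with a minimal sketch. It uses a hypothetical `fake_init` function as a stand-in for `BaseOperator.__init__` (it is not Airflow code) and relies only on the standard `inspect` module. Two consecutive `signature()` calls return equal but separately built `Signature` objects, which is the work the per-task code was repeating:

```python
import inspect


def fake_init(self, task_id, owner="airflow", email=None, retries=0):
    """Hypothetical stand-in for BaseOperator.__init__ (not Airflow code)."""


def forbidden_fields_uncached():
    # Mirrors the per-task pattern: rebuild the set on every call.
    fields = set(inspect.signature(fake_init).parameters.keys())
    fields.difference_update({"email"})
    return fields


first = inspect.signature(fake_init)
second = inspect.signature(fake_init)

# Same content, but two separately constructed objects.
same_content = first == second
same_object = first is second
```

Here `same_content` is true while `same_object` is false, so nothing is reused between calls.
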
   ## Fix
   
   Promote `forbidden_fields` to a class constant that is evaluated once, when the
   class body executes at import time:
   
   ```python
   _FORBIDDEN_TEMPLATE_FIELDS: ClassVar[frozenset[str]] = (
       frozenset(signature(BaseOperator.__init__).parameters) - {"email"}
   )
   ```
   
   Replace the two-line computation in `_serialize_node` with
   `cls._FORBIDDEN_TEMPLATE_FIELDS`.
   
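As a self-contained illustration of the pattern (not the actual Airflow classes), the sketch below uses hypothetical `FakeBaseOperator` and `FakeOperatorSerialization` stand-ins. The constant is built when the class body executes, and `_serialize_node` only reads it:

```python
from inspect import signature
from typing import ClassVar


class FakeBaseOperator:
    """Hypothetical stand-in for BaseOperator (not Airflow code)."""

    def __init__(self, task_id, owner="airflow", email=None, retries=0):
        self.task_id = task_id


class FakeOperatorSerialization:
    """Hypothetical stand-in for OperatorSerialization (not Airflow code)."""

    # Evaluated once, when this class body executes.
    _FORBIDDEN_TEMPLATE_FIELDS: ClassVar[frozenset[str]] = (
        frozenset(signature(FakeBaseOperator.__init__).parameters) - {"email"}
    )

    @classmethod
    def _serialize_node(cls, op):
        # Per-task work is now an attribute lookup; no signature() call here.
        forbidden_fields = cls._FORBIDDEN_TEMPLATE_FIELDS
        return {"task_id": op.task_id, "forbidden": sorted(forbidden_fields)}


node = FakeOperatorSerialization._serialize_node(FakeBaseOperator("t1"))
```

Using a `frozenset` rather than a `set` also guards against a caller mutating the shared constant by accident.
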
   ## Safety
   
   `BaseOperator.__init__` is a plain Python method: not `@classmethod`, not
   dynamically generated, no metaclass reconstruction. Its signature is fixed at
   import time and is identical regardless of operator subclass (the code always
   inspects `BaseOperator.__init__`, not `type(op).__init__`). Caching is
   equivalent to defining any other class-level constant.
   
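The subclass-independence argument can be checked with a small sketch. `FakeBaseOperator` and `FakeBashOperator` are hypothetical stand-ins, not Airflow classes; the point is only that inspecting the base class's `__init__` yields the same names no matter which subclass instance is being serialized:

```python
from inspect import signature


class FakeBaseOperator:
    """Hypothetical stand-in for BaseOperator (not Airflow code)."""

    def __init__(self, task_id, email=None):
        self.task_id = task_id


class FakeBashOperator(FakeBaseOperator):
    """Hypothetical subclass with a different __init__ signature."""

    def __init__(self, bash_command, **kwargs):
        super().__init__(**kwargs)
        self.bash_command = bash_command


# Computed once, before any operator instance exists.
cached = frozenset(signature(FakeBaseOperator.__init__).parameters) - {"email"}

op = FakeBashOperator(bash_command="echo hi", task_id="t1")

# Recomputing against the base class gives the same result as the cached value...
recomputed = frozenset(signature(FakeBaseOperator.__init__).parameters) - {"email"}
# ...whereas the subclass's own __init__ has a different parameter list.
subclass_params = frozenset(signature(type(op).__init__).parameters)
```
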
   ## Benchmark results
   
   Benchmark script: https://gist.github.com/shahar1/d6a19bcc7405a8ca23841a6356d5c4e4  
   Run with: `uv run --project airflow-core python dev/bench_signature_cache.py`
   
   ### End-to-end `DagSerialization.to_dict()` (warm venv, parallel baseline run)
   
   | Scenario | Before (min) | After (min) | Δ |
   |---|---|---|---|
   | 10 tasks × 1 outlet | ~2.4 ms | ~1.7 ms | −29 % |
   | 100 tasks × 1 outlet | ~16.7 ms | ~11.3 ms | −32 % |
   | 500 tasks × 5 outlets | ~104.7 ms | ~77.1 ms | −26 % |
   | 1000 tasks × 5 outlets | ~208.5 ms | ~147.7 ms | −29 % |
   | 200 tasks × 20 outlets | ~89.3 ms | ~60.9 ms | −32 % |
   
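The Δ column is the relative change of the After value against the Before value. As a quick arithmetic check, the following sketch recomputes it from the rounded figures in the table above (the `rows` mapping and the `relative_change` helper are illustrative names, not part of the benchmark script):

```python
# (before_ms, after_ms) pairs copied from the table above.
rows = {
    "10 tasks x 1 outlet": (2.4, 1.7),
    "100 tasks x 1 outlet": (16.7, 11.3),
    "500 tasks x 5 outlets": (104.7, 77.1),
    "1000 tasks x 5 outlets": (208.5, 147.7),
    "200 tasks x 20 outlets": (89.3, 60.9),
}


def relative_change(before, after):
    """Return the percentage change from before to after."""
    return (after - before) / before * 100


deltas = {name: round(relative_change(b, a)) for name, (b, a) in rows.items()}
```

The rounded results match the Δ column row by row.
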
   ### cProfile (200 tasks × 20 outlets, 5 iterations)
   
   Before: `inspect.signature` accounted for
   **22% of total `serialize_dag()` wall-time** (0.218 s of 1.347 s total).
   
   After: `inspect.signature` is **completely absent from the top 15** profiled
   functions. Total function call count drops by ~660 k calls per 5-iteration
   session (one `signature()` call eliminated per task per serialization).
   
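For readers who want to reproduce this kind of measurement on a toy case, here is a hedged sketch using the standard `cProfile` and `pstats` modules. `fake_init`, `serialize_uncached`, and `top_functions` are hypothetical names, and this is not the profiling harness used for the numbers above:

```python
import cProfile
import inspect
import io
import pstats


def fake_init(self, task_id, owner="airflow", email=None, retries=0):
    """Hypothetical stand-in for BaseOperator.__init__ (not Airflow code)."""


def serialize_uncached(n_tasks):
    # One signature() call per simulated task, as in the pre-fix code path.
    for _ in range(n_tasks):
        _ = set(inspect.signature(fake_init).parameters) - {"email"}


def top_functions(func, *args, limit=15):
    """Profile func(*args) and return the top `limit` entries by cumulative time."""
    profiler = cProfile.Profile()
    profiler.runcall(func, *args)
    buffer = io.StringIO()
    stats = pstats.Stats(profiler, stream=buffer)
    stats.sort_stats("cumulative").print_stats(limit)
    return buffer.getvalue()


report = top_functions(serialize_uncached, 200)
```
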
   ---
   
   ##### Was generative AI tooling used to co-author this PR?
   
   - [X] Yes — Claude Code (Sonnet 4.6)
   
   Generated-by: Claude Code (Sonnet 4.6) following [the guidelines](https://github.com/apache/airflow/blob/main/contributing-docs/05_pull_requests.rst#gen-ai-assisted-contributions)
   
   ---
   
   Drafted-by: Claude Code (Sonnet 4.6); reviewed by @shahar1 before posting

