vatsrahul1001 opened a new pull request, #67527:
URL: https://github.com/apache/airflow/pull/67527

   ## Summary
   
   Triggerer crashes on the next deadline async callback when OpenTelemetry 
metrics are enabled:
   
   ```
   File ".../airflow/jobs/triggerer_job_runner.py", line 659, in handle_events
       Trigger.submit_event(...)
   File ".../airflow/models/callback.py", line 234, in handle_event
       Stats.incr(**self.get_metric_info(status, self.output))
   File ".../airflow/_shared/observability/metrics/otel_logger.py", line 211, 
in incr
       counter.add(count, attributes=tags)
   File ".../opentelemetry/sdk/metrics/.../view_instrument_match.py", line 105
       aggr_key = frozenset(attributes.items())
   TypeError: unhashable type: 'dict'
   ```
   
   ## Root cause
   
   `Callback.get_metric_info` builds the metric tags dict directly from the 
callback's `result` and `self.data` (which includes `kwargs`). Both are 
frequently dicts — for deadline async callbacks the `result` is the user 
callback's return value, and `kwargs` is the captured callback kwargs.
   
   When the metrics backend is OpenTelemetry, the SDK builds the aggregation 
key as `frozenset(attributes.items())`, which raises `TypeError: unhashable 
type: 'dict'` if any value is unhashable (dict, list, set). Result: triggerer 
crash, stalled triggers.
   
   The bug is metrics-backend-dependent: **statsd** accepts non-primitive tag 
values without complaint, so OSS users running the default statsd never see it. 
**OpenTelemetry** backends hit it consistently — surfaced by Astronomer Astro 
Cloud, but affects any OSS deployment that enables `[metrics] otel_*`.
   
   Reproduces against 3.2.1 and main — not a 3.2.x regression.
   
   ## Fix
   
   Sanitize tag values to primitives before returning from `get_metric_info`. 
Keep `str | int | float | bool | None` as-is; JSON-stringify anything else 
using `default=str` so values like `datetime` fall back cleanly instead of 
raising.
   
   ## Test plan
   
   - [X] Adds `test_get_metric_info_dict_values_are_stringified` — exercises 
the deadline-async-callback shape (dict `result` + dict `kwargs`), asserts 
every tag value is primitive, and verifies `frozenset(tags.items())` doesn't 
raise.
   - [X] Updates the existing `test_get_metric_info` assertion: `kwargs` is now 
JSON-stringified rather than left as a dict.
   - [ ] Manual repro on an OTel-enabled deployment — runs the deadline test 
DAG and confirms triggerer no longer crashes.
   
   ## Reported by
   
   Astronomer Runtime team while testing the 3.2.2rc2-based Runtime image on 
Astro Cloud stage env. Pre-existing in Runtime 3.2-4 (3.2.1-based).
   
   ---
   
   ##### Was generative AI tooling used to co-author this PR?
   
   - [X] Yes — Claude Code (Opus 4.7)
   
   Generated-by: Claude Code (Opus 4.7) following [the 
guidelines](https://github.com/apache/airflow/blob/main/contributing-docs/05_pull_requests.rst#gen-ai-assisted-contributions)


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]

Reply via email to