ersalil opened a new pull request, #68284:
URL: https://github.com/apache/airflow/pull/68284
## Summary
`SafeOtelLogger.gauge()` and `timer()` were missing the
`name_is_otel_safe()` guard that `incr()`, `decr()`, and `timing()` all have. A
DAG filename containing a space (e.g. `PBI_SKU_Performance copy.py`) caused the
DagProcessor to crash on every loop iteration when OTel metrics were enabled.
## Why
The DagProcessor emits a gauge metric for every known DAG file on every
parsing loop:
```python
Stats.gauge(f"dag_processing.last_run.seconds_ago.{file_name}", seconds_ago)
```
`gauge()` only validated against the allow/block list
(`metrics_validator.test()`), which does not check OTel naming rules. A
filename with a space passed that check and reached `meter.create_gauge()`
inside the OTel SDK, which raised a plain Exception that propagated uncaught
and crashed the process:
```
Exception: Expected ASCII string of maximum length 63 characters but got
airflow.dag_processing.last_run.seconds_ago.PBI_SKU_Performance copy
```
Note: the OTel SDK error message says "63 characters" but this is a stale
message — the actual SDK validation regex allows up to 255 characters (matching
OTEL_NAME_MAX_LENGTH). The real rejection reason is the space character, which
is not a valid OTel instrument name character. See
[open-telemetry/opentelemetry-python#3442](https://github.com/open-telemetry/opentelemetry-python/pull/3442).
The same gap existed in `timer()`: it had no validation at all before being
passed to `_OtelTimer`, which would then call `record_histogram_value()` with
the invalid name at `.stop()` time.
## Fix
Added `metrics_validator.test(stat) and name_is_otel_safe(self.prefix,
stat)` to `gauge()` (both the `stat` and `back_compat_name` paths) and
`timer()`, making all five recording methods consistent:
| Method | Guard before | Guard after |
|--------|-------------|-------------|
| `incr()` | ✅ both checks | ✅ unchanged |
| `decr()` | ✅ both checks | ✅ unchanged |
| `timing()` | ✅ both checks | ✅ unchanged |
| `gauge()` | ❌ validator only | ✅ both checks |
| `timer()` | ❌ none | ✅ both checks |
## Testing
- `test_gauge_with_invalid_stat_names_skipped_without_raising` — covers the
exact reported scenario (space in filename) and non-ASCII, both reproducing the
DagProcessor crash path.
- `test_timer_with_invalid_stat_name_does_not_record` — covers non-ASCII and
space for the timer path.
closes: #68282
##### Was generative AI tooling used to co-author this PR?
- [X] Yes — Claude Code (Sonnet 4.6)
This PR was prepared with Gen-AI assistance (Claude). I reviewed all
generated code.
--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
To unsubscribe, e-mail: [email protected]
For queries about this service, please contact Infrastructure at:
[email protected]