MichaelRBlack opened a new pull request, #63959:
URL: https://github.com/apache/airflow/pull/63959

   ## Backport of #44346 to v2-11-stable
   
   This is a backport of #44346 (merged to `main` Nov 2024) to the v2-11 
maintenance branch. The fix was never cherry-picked to v2.
   
   ### Problem
   
   `OTLPMetricExporter` and `OTLPSpanExporter` in `otel_logger.py` and 
`otel_tracer.py` have a hardcoded `headers={"Content-Type": 
"application/json"}` parameter. These exporters serialize data as **protobuf** 
and automatically set `Content-Type: application/x-protobuf`. The hardcoded 
override tells the OpenTelemetry Collector to decode the payload as JSON, but 
the bytes are protobuf — causing **100% export failure**:
   
   ```
   Failed to export metrics batch code: 500,
   reason: {"code": 13, "message": "failed to marshal error message"}
   ```
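
The mismatch can be illustrated without a Collector. The following is a minimal sketch, not the Collector's actual code: `decode_body` is a hypothetical receiver that picks a decoder from the `Content-Type` header, and `PROTOBUF_BODY` is a hand-built stand-in for a serialized OTLP request. It only shows that protobuf bytes cannot be parsed as JSON.

```python
import json

# Hand-built protobuf wire bytes: field 1, wire type 2 (length-delimited),
# payload b"abc". This stands in for a real serialized OTLP request; it is
# not a valid OTLP message.
PROTOBUF_BODY = bytes([0x0A, 0x03]) + b"abc"


def decode_body(body: bytes, content_type: str) -> str:
    """Hypothetical receiver: choose a decoder from the Content-Type header."""
    if content_type == "application/json":
        try:
            json.loads(body)
        except ValueError as exc:  # JSONDecodeError and UnicodeDecodeError
            return f"decode failed: {type(exc).__name__}"
        return "decoded as JSON"
    if content_type == "application/x-protobuf":
        # A real receiver would parse the protobuf message here.
        return "handed to protobuf decoder"
    return "unsupported content type"


print(decode_body(PROTOBUF_BODY, "application/x-protobuf"))
print(decode_body(PROTOBUF_BODY, "application/json"))
```

With the correct header the body reaches the protobuf path; with the hardcoded JSON header the same bytes are rejected before any metric is read.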
   
   This means OTEL metrics and traces are **completely broken** for every 
Airflow 2.x user sending to a standard OTEL Collector.
   
   As a secondary issue, the hardcoded `headers` parameter also prevents users 
from configuring custom headers via the standard `OTEL_EXPORTER_OTLP_HEADERS` 
environment variable (e.g., for authentication with hosted backends like 
Grafana Cloud or Logfire).
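
For context, `OTEL_EXPORTER_OTLP_HEADERS` carries a comma-separated list of `key=value` pairs. The sketch below is a simplified illustration of that format, assuming URL-encoded values; `parse_otlp_headers` is a hypothetical helper and not the SDK's own parser, and the example header values are made up.

```python
from urllib.parse import unquote


def parse_otlp_headers(raw: str) -> dict[str, str]:
    """Simplified parser for a comma-separated list of key=value header pairs."""
    headers: dict[str, str] = {}
    for pair in raw.split(","):
        if "=" not in pair:
            continue  # skip empty or malformed entries
        key, value = pair.split("=", 1)
        headers[key.strip()] = unquote(value.strip())
    return headers


# Hypothetical value, as it might be set in the Airflow environment.
example = "Authorization=Basic%20dXNlcjp0b2tlbg==,X-Scope-OrgID=tenant-1"
print(parse_otlp_headers(example))
```

Headers supplied this way only matter if the exporter is constructed without an explicit `headers` argument, which is what this change restores.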
   
   ### Fix
   
   Remove the `headers={"Content-Type": "application/json"}` parameter from 
both `OTLPMetricExporter` and `OTLPSpanExporter`, allowing the SDK to use its 
correct default (`application/x-protobuf`).
   
   ### Testing
   
   Verified on a production Airflow 2.11.1 cluster sending to an OpenTelemetry 
Collector → Mimir pipeline:
   
   - **Before fix**: every 30s export batch fails with HTTP 500 `"failed to 
marshal error message"`
   - **After fix**: zero export errors, metrics immediately visible in Mimir
   
   ```python
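   # Reproducer: run from an Airflow pod.
   # `endpoint` (the Collector's OTLP/HTTP metrics URL) and `metrics_data`
   # (a metrics batch to export) are assumed to be defined beforehand.
   from opentelemetry.exporter.otlp.proto.http.metric_exporter import OTLPMetricExporter
   
   # Default headers: the SDK sets Content-Type: application/x-protobuf.
   good = OTLPMetricExporter(endpoint=endpoint)
   # Same override that Airflow currently hardcodes.
   bad = OTLPMetricExporter(
       endpoint=endpoint, headers={"Content-Type": "application/json"}
   )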
   
   good.export(metrics_data)  # SUCCESS
   bad.export(metrics_data)   # FAILURE — 500 "failed to marshal error message"
   ```
   
   ### Justification for v2 backport
   
   This is a critical bug fix: OTEL metrics and traces are entirely 
non-functional in every Airflow 2.x release that ships OTEL support. With 
Airflow 2.x EOL approaching (April 2026), this fix would let the remaining v2 
user base use OTEL monitoring for the rest of the support window.
   
   The diff is:
   - `airflow/metrics/otel_logger.py`: 1 line removed
   - `airflow/traces/otel_tracer.py`: 1 line changed


