ersalil opened a new issue, #68282:
URL: https://github.com/apache/airflow/issues/68282

   ### Under which category would you file this issue?
   
   Airflow Core
   
   ### Apache Airflow version
   
   3.x
   
   ### What happened and how to reproduce it?
   
   When OTel metrics are enabled and a DAG file name contains a space or another character that is invalid in OTel instrument names (e.g. `PBI_SKU_Performance copy.py`), the DagProcessor and scheduler crash with an unhandled exception on every loop iteration:
   
   ```
   Traceback (most recent call last):
     File "/usr/local/bin/airflow", line 10, in <module>
       sys.exit(main())
                ~~~~^^
     File "/usr/local/lib/python3.13/site-packages/airflow/__main__.py", line 55, in main
       args.func(args)
       ~~~~~~~~~^^^^^^
     File "/usr/local/lib/python3.13/site-packages/airflow/cli/cli_config.py", line 49, in command
       return func(*args, **kwargs)
     File "/usr/local/lib/python3.13/site-packages/airflow/utils/memray_utils.py", line 60, in wrapper
       return func(*args, **kwargs)
     File "/usr/local/lib/python3.13/site-packages/airflow/utils/cli.py", line 113, in wrapper
       return f(*args, **kwargs)
     File "/usr/local/lib/python3.13/site-packages/airflow/utils/providers_configuration_loader.py", line 54, in wrapped_function
       return func(*args, **kwargs)
     File "/usr/local/lib/python3.13/site-packages/airflow/cli/commands/dag_processor_command.py", line 64, in dag_processor
       run_command_with_daemon_option(
       ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~^
           args=args,
           ^^^^^^^^^^
       ...<2 lines>...
           should_setup_logging=True,
           ^^^^^^^^^^^^^^^^^^^^^^^^^^
       )
       ^
     File "/usr/local/lib/python3.13/site-packages/airflow/cli/commands/daemon_utils.py", line 86, in run_command_with_daemon_option
       callback()
       ~~~~~~~~^^
     File "/usr/local/lib/python3.13/site-packages/airflow/cli/commands/dag_processor_command.py", line 67, in <lambda>
       callback=lambda: run_job(job=job_runner.job, execute_callable=job_runner._execute),
                        ~~~~~~~^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
     File "/usr/local/lib/python3.13/site-packages/airflow/utils/session.py", line 100, in wrapper
       return func(*args, session=session, **kwargs)  # type: ignore[arg-type]
     File "/usr/local/lib/python3.13/site-packages/airflow/jobs/job.py", line 355, in run_job
       return execute_job(job, execute_callable=execute_callable)
     File "/usr/local/lib/python3.13/site-packages/airflow/jobs/job.py", line 384, in execute_job
       ret = execute_callable()
     File "/usr/local/lib/python3.13/site-packages/airflow/jobs/dag_processor_job_runner.py", line 61, in _execute
       self.processor.run()
       ~~~~~~~~~~~~~~~~~~^^
     File "/usr/local/lib/python3.13/site-packages/airflow/dag_processing/manager.py", line 339, in run
       return self._run_parsing_loop()
              ~~~~~~~~~~~~~~~~~~~~~~^^
     File "/usr/local/lib/python3.13/site-packages/airflow/dag_processing/manager.py", line 469, in _run_parsing_loop
       self.print_stats(known_files=known_files)
       ~~~~~~~~~~~~~~~~^^^^^^^^^^^^^^^^^^^^^^^^^
     File "/usr/local/lib/python3.13/site-packages/airflow/dag_processing/manager.py", line 781, in print_stats
       self._log_file_processing_stats(known_files=known_files)
       ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~^^^^^^^^^^^^^^^^^^^^^^^^^
     File "/usr/local/lib/python3.13/site-packages/airflow/dag_processing/manager.py", line 852, in _log_file_processing_stats
       Stats.gauge(f"dag_processing.last_run.seconds_ago.{file_name}", seconds_ago)
       ~~~~~~~~~~~^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
     File "/usr/local/lib/python3.13/site-packages/airflow/_shared/observability/metrics/otel_logger.py", line 272, in gauge
       self.metrics_map.set_gauge_value(full_name(prefix=self.prefix, name=stat), value, delta, tags)
       ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
     File "/usr/local/lib/python3.13/site-packages/airflow/_shared/observability/metrics/otel_logger.py", line 375, in set_gauge_value
       self.map[key] = InternalGauge(meter=self.meter, name=name, tags=tags)
                       ~~~~~~~~~~~~~^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
     File "/usr/local/lib/python3.13/site-packages/airflow/_shared/observability/metrics/otel_logger.py", line 304, in __init__
       self.gauge = meter.create_gauge(name=otel_safe_name)
                    ~~~~~~~~~~~~~~~~~~^^^^^^^^^^^^^^^^^^^^^
     File "/usr/local/lib/python3.13/site-packages/opentelemetry/sdk/metrics/_internal/__init__.py", line 263, in create_gauge
       self._instrument_id_instrument[status.instrument_id] = _Gauge(
                                                              ~~~~~~^
           name,
           ^^^^^
       ...<3 lines>...
           description,
           ^^^^^^^^^^^^
       )
       ^
     File "/usr/local/lib/python3.13/site-packages/opentelemetry/sdk/metrics/_internal/instrument.py", line 64, in __init__
       raise Exception(_ERROR_MESSAGE.format(name))
   Exception: Expected ASCII string of maximum length 63 characters but got airflow.dag_processing.last_run.seconds_ago.PBI_SKU_Performance copy
   ```
   
   **Note:** The OTel SDK error message says "63 characters", but that text is stale: the SDK's actual name regex allows up to 255 characters. The real reason the name is rejected is the **space character**, which is not valid in an OTel instrument name.
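   
The note above can be checked with a small standalone sketch. It assumes the SDK's instrument-name rule is a letter followed by up to 254 characters from letters, digits, `-`, `_`, `.`, and `/`; the pattern and the helper name `is_otel_safe` are written out here for illustration and are not imported from the SDK.

```python
import re

# Assumption: approximates the OTel Python SDK instrument-name rule
# (first char a letter, then up to 254 of letters, digits, '-', '_', '.', '/').
OTEL_NAME_PATTERN = re.compile(r"[a-zA-Z][-_./a-zA-Z0-9]{0,254}")


def is_otel_safe(name: str) -> bool:
    """Hypothetical helper: True if the whole name matches the assumed rule."""
    return OTEL_NAME_PATTERN.fullmatch(name) is not None


# The metric name from the traceback, and the same name with the space replaced.
bad = "airflow.dag_processing.last_run.seconds_ago.PBI_SKU_Performance copy"
good = bad.replace(" ", "_")

print(len(bad), is_otel_safe(bad), is_otel_safe(good))
```

Under that assumption, the name is longer than 63 characters in both forms, yet only the variant containing the space is rejected, which is consistent with the space being the cause.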
   
   ### How to reproduce
   
   1. Enable OTel metrics on an Airflow 3.x deployment.
   2. Place a DAG file with a space in its name in the DAGs folder:
   
          dags/
          └── PBI_SKU_Performance copy.py
   
   3. Start the DagProcessor or scheduler.
   4. Observe the crash on every loop iteration in `_log_file_processing_stats`.
   
   
   ### What you think should happen instead?
   
   `SafeOtelLogger.gauge()` should skip the metric (log a warning and return) when the stat name is not OTel-safe, the same way `incr()`, `decr()`, and `timing()` already do. A metric emission failure must never crash the DagProcessor.
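   
A minimal sketch of the proposed behavior, not Airflow's actual code: `GuardedGaugeLogger` is a hypothetical stand-in for the real logger, and the name pattern is an assumed approximation of the OTel instrument-name rule. It only illustrates the "warn and return" guard described above.

```python
import logging
import re

log = logging.getLogger("otel_gauge_sketch")

# Assumption: approximates the OTel instrument-name rule.
OTEL_NAME_PATTERN = re.compile(r"[a-zA-Z][-_./a-zA-Z0-9]{0,254}")


class GuardedGaugeLogger:
    """Hypothetical stand-in for a metrics logger; not Airflow's SafeOtelLogger."""

    def __init__(self, prefix: str) -> None:
        self.prefix = prefix
        self.gauges: dict[str, float] = {}

    def gauge(self, stat: str, value: float) -> None:
        name = f"{self.prefix}.{stat}"
        if OTEL_NAME_PATTERN.fullmatch(name) is None:
            # Skip instead of raising, so a bad file name cannot stop the caller's loop.
            log.warning("Skipping gauge %r: not a valid OTel instrument name", name)
            return
        self.gauges[name] = value


stats = GuardedGaugeLogger(prefix="airflow")
# The name with a space is skipped with a warning; the valid one is recorded.
stats.gauge("dag_processing.last_run.seconds_ago.PBI_SKU_Performance copy", 12.0)
stats.gauge("dag_processing.last_run.seconds_ago.example_dag", 3.5)
```

With a guard like this, the file with the space in its name simply produces no gauge, and processing of the remaining files continues.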
   
   
   ### Operating System
   
   _No response_
   
   ### Deployment
   
   Astronomer
   
   ### Apache Airflow Provider(s)
   
   _No response_
   
   ### Versions of Apache Airflow Providers
   
   _No response_
   
   ### Official Helm Chart version
   
   Not Applicable
   
   ### Kubernetes Version
   
   _No response_
   
   ### Helm Chart configuration
   
   _No response_
   
   ### Docker Image customizations
   
   _No response_
   
   ### Anything else?
   
   _No response_
   
   ### Are you willing to submit PR?
   
   - [x] Yes I am willing to submit a PR!
   
   ### Code of Conduct
   
   - [x] I agree to follow this project's [Code of Conduct](https://github.com/apache/airflow/blob/main/CODE_OF_CONDUCT.md)
   

