mwoods-familiaris opened a new issue, #44250:
URL: https://github.com/apache/airflow/issues/44250

   ### Apache Airflow Provider(s)
   
   databricks
   
   ### Versions of Apache Airflow Providers
   
   apache-airflow-providers-databricks==6.13.*
   
   ### Apache Airflow version
   
   2.10.2
   
   ### Operating System
   
   Debian GNU/Linux 12 (bookworm)
   
   ### Deployment
   
   Astronomer
   
   ### Deployment details
   
   _No response_
   
   ### What happened
   
   _get_databricks_task_id only cleanses the task id, ref:
   
https://github.com/apache/airflow/blob/a9242844706ca117f86d22092109939dd56435ee/providers/src/airflow/providers/databricks/plugins/databricks_workflow.py#L67
   
https://github.com/apache/airflow/blob/a9242844706ca117f86d22092109939dd56435ee/providers/src/airflow/providers/databricks/operators/databricks.py#L1077
   
   However, the dag_id may also contain `.` - so the replacement of `.` with 
`__` should be applied to the whole string, not just the task id portion, else 
periods placed in the dag name results in errors such as:
   ```
   [2024-11-21, 13:12:42 GMT] {taskinstance.py:3310} ERROR - Task failed with 
exception
   Traceback (most recent call last):
     File 
"/usr/local/lib/python3.11/site-packages/airflow/models/taskinstance.py", line 
767, in _execute_task
       result = _execute_callable(context=context, **execute_callable_kwargs)
                ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
     File 
"/usr/local/lib/python3.11/site-packages/airflow/models/taskinstance.py", line 
733, in _execute_callable
       return ExecutionCallableRunner(
              ^^^^^^^^^^^^^^^^^^^^^^^^
     File 
"/usr/local/lib/python3.11/site-packages/airflow/utils/operator_helpers.py", 
line 252, in run
       return self.func(*args, **kwargs)
              ^^^^^^^^^^^^^^^^^^^^^^^^^^
     File 
"/usr/local/lib/python3.11/site-packages/airflow/models/baseoperator.py", line 
406, in wrapper
       return func(self, *args, **kwargs)
              ^^^^^^^^^^^^^^^^^^^^^^^^^^^
     File 
"/usr/local/lib/python3.11/site-packages/airflow/providers/databricks/operators/databricks.py",
 line 1252, in execute
       self.monitor_databricks_job()
     File 
"/usr/local/lib/python3.11/site-packages/airflow/providers/databricks/operators/databricks.py",
 line 1203, in monitor_databricks_job
       current_task_run_id = self._get_current_databricks_task()["run_id"]
                             ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
     File 
"/usr/local/lib/python3.11/site-packages/airflow/providers/databricks/operators/databricks.py",
 line 1165, in _get_current_databricks_task
       return {task["task_key"]: task for task in sorted_task_runs}[
              ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
   KeyError: 'my.airflow.dag.with.periods__my_airflow_task'
   ```
   (as the invalid chars are getting silently stripped by databricks, so the 
task key on the databricks side is `myairflowdagwithperiods__my_airflow_task` 
rather than `my.airflow.dag.with.periods__my_airflow_task`)
   
   ### What you think should happen instead
   
   The replacement of `.` with `__` should be applied to the whole task key / 
run name string, not just the task id portion
   
   ### How to reproduce
   
   Use the affected operator(s) e.g. DatabricksNotebookOperator on a DAG which 
contains `.` in the dag_id
   
   ### Anything else
   
   Every time
   
   ### Are you willing to submit PR?
   
   - [X] Yes I am willing to submit a PR!
   
   ### Code of Conduct
   
   - [X] I agree to follow this project's [Code of 
Conduct](https://github.com/apache/airflow/blob/main/CODE_OF_CONDUCT.md)
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]

Reply via email to