Shaan-alpha commented on code in PR #66886:
URL: https://github.com/apache/airflow/pull/66886#discussion_r3238643742


##########
providers/databricks/src/airflow/providers/databricks/operators/databricks_sql.py:
##########
@@ -46,6 +46,46 @@
 _DISALLOWED_SQL_TOKENS = (";", "--", "/*", "*/")
 
 
+def _escape_query_tag_value(value: str) -> str:
+    """Escape Databricks query-tag separators in a tag value."""
+    return str(value).replace("\\", "\\\\").replace(",", "\\,").replace(":", 
"\\:")
+
+
+def _format_query_tags(context: Context) -> str:
+    """Format Airflow context metadata into databricks-sql-connector query 
tags."""
+    tags = []
+    if "dag" in context and getattr(context["dag"], "dag_id", None):
+        
tags.append(f"airflow_dag_id:{_escape_query_tag_value(context['dag'].dag_id)}")
+    if "task" in context and getattr(context["task"], "task_id", None):
+        
tags.append(f"airflow_task_id:{_escape_query_tag_value(context['task'].task_id)}")
+    if "run_id" in context and context["run_id"]:
+        
tags.append(f"airflow_run_id:{_escape_query_tag_value(context['run_id'])}")
+
+    return ",".join(tags)

Review Comment:
   Thanks for the suggestion @SameerMesiah97! I agree the mapping-driven 
approach is cleaner and more maintainable. I've gone ahead and refactored both 
`_format_query_tags` and `_escape_query_tag_value` to use the 
`_QUERY_TAG_FIELDS` and `_QUERY_TAG_ESCAPE_SEQUENCES` mappings exactly as you 
suggested — this is included in commit 732df03 ("Refactor Databricks query tag 
helper utilities"). Please take another look and let me know if you'd like any 
further tweaks.



##########
providers/databricks/src/airflow/providers/databricks/operators/databricks_sql.py:
##########
@@ -153,6 +193,10 @@ def _hook(self) -> DatabricksSqlHook:
     def get_db_hook(self) -> DatabricksSqlHook:
         return self._hook
 
+    def execute(self, context: Context) -> Any:
+        _inject_query_tags(self.get_db_hook(), context)
+        return super().execute(context)

Review Comment:
   Good point — making this configurable makes sense so users retain explicit 
control over `session_configuration`. I'll add an `inject_query_tags: bool = 
True` parameter to both `DatabricksSqlOperator` and 
`DatabricksCopyIntoOperator` (defaulting to `True` to preserve the 
observability benefit, while allowing easy opt-out). I'll also document the new 
parameter in the operator docstrings. Will push the update shortly.



##########
providers/databricks/tests/unit/databricks/operators/test_databricks_sql.py:
##########
@@ -453,3 +453,32 @@ def test_parse_gcs_path():
     bucket, object_name = 
op._parse_gcs_path("gs://my-bucket/path/to/file.parquet")
     assert bucket == "my-bucket"
     assert object_name == "path/to/file.parquet"
+
+

Review Comment:
   Will do — I'll move the new test to sit right after 
`test_exec_write_gcs_parquet_output` and mirror the same expanded coverage 
(empty/partial context, empty existing `query_tags`, escape-character values, 
and preservation of unrelated `session_configuration` keys). Thanks for the 
review!



##########
providers/databricks/tests/unit/databricks/operators/test_databricks_copy.py:
##########
@@ -522,3 +522,34 @@ def test_get_openlineage_facets():
         "externalQuery": ExternalQueryRunFacet(externalQueryId="query_id", 
source="scheme://host")
     }
     assert result.job_facets == {"sql": SQLJobFacet(query=op._sql)}
+
+

Review Comment:
   Thanks, that ordering makes sense. I'll move `test_query_tags_injection` 
above the OpenLineage tests so it sits closer to the core execution logic. I'll 
also expand coverage to include: empty/partial context (missing 
`dag`/`task`/`run_id`), empty existing `query_tags`, values containing 
commas/colons/backslashes (to exercise `_escape_query_tag_value` end-to-end), 
and an assertion that unrelated keys in `session_configuration` are preserved 
after the merge. Will include in the next push.



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]

Reply via email to