Copilot commented on code in PR #64513:
URL: https://github.com/apache/airflow/pull/64513#discussion_r3025333635


##########
providers/amazon/pyproject.toml:
##########
@@ -60,7 +60,7 @@ requires-python = ">=3.10"
 # After you modify the dependencies, and rebuild your Breeze CI image with ``breeze ci-image build``
 dependencies = [
     "apache-airflow>=2.11.0",
-    "apache-airflow-providers-common-compat>=1.13.0",
+    "apache-airflow-providers-common-compat>=1.13.0",  # use next version

Review Comment:
   The operator now imports `inject_*_into_glue_arguments` from 
`apache-airflow-providers-common-compat`, but the minimum dependency is still 
`>=1.13.0`. Users with common-compat 1.13.x installed will hit an ImportError 
when importing the Glue operator because these symbols won’t exist. Please bump 
the minimum required `apache-airflow-providers-common-compat` to the first 
released version that includes these new helpers (i.e. the next common-compat 
release after this change).
   ```suggestion
       "apache-airflow-providers-common-compat>=1.14.0",  # use next version
   ```
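
The failure mode described above can be stated mechanically: a floor of `>=1.13.0` is satisfied by every 1.13.x release, none of which ship the new helpers, so the pin alone cannot prevent the ImportError. A tiny stdlib sketch of that reasoning (version strings illustrative, naive `X.Y.Z` comparison only, not real specifier resolution):

```python
def satisfies_floor(installed: str, floor: str) -> bool:
    """Naive version-floor check for plain X.Y.Z strings (no pre-releases)."""
    as_tuple = lambda v: tuple(int(part) for part in v.split("."))
    return as_tuple(installed) >= as_tuple(floor)

# 1.13.1 satisfies ">=1.13.0" even though it predates the new helpers:
assert satisfies_floor("1.13.1", "1.13.0")
# Bumping the floor to 1.14.0 excludes every 1.13.x release:
assert not satisfies_floor("1.13.1", "1.14.0")
```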



##########
providers/openlineage/src/airflow/providers/openlineage/utils/spark.py:
##########
@@ -195,3 +195,70 @@ def inject_transport_information_into_spark_properties(properties: dict, context
         return properties
 
     return {**properties, **_get_transport_information_as_spark_properties()}
+
+
+def inject_parent_job_information_into_glue_arguments(script_args: dict, context: Context) -> dict:
+    """
+    Inject parent job information into Glue job arguments if not already present.
+
+    Glue jobs pass Spark properties via the ``--conf`` key in the script_args dict.
+    Multiple Spark conf properties are combined into the ``--conf`` key value with
+    ``' --conf '`` as separator between each property assignment.
+
+    Args:
+        script_args: Glue job script arguments dict (maps to boto3 ``Arguments``).
+        context: The context containing task instance information.
+
+    Returns:
+        Modified script_args dict with OpenLineage parent job information injected, if applicable.
+    """
+    existing_conf = script_args.get("--conf", "")
+
+    if "spark.openlineage.parent" in existing_conf:

Review Comment:
   The idempotence check only looks for `spark.openlineage.parent` in the 
existing `--conf` string, but the injected parent facet also includes 
`spark.openlineage.rootParent*` keys. If a user already sets only `rootParent*` 
properties, this function will still inject another set (creating duplicates 
and potentially overriding values). Consider treating both 
`spark.openlineage.parent` and `spark.openlineage.rootParent` as “already 
present”, or parsing existing `--conf` to detect any of the specific keys you 
inject.
   ```suggestion
    if "spark.openlineage.parent" in existing_conf or "spark.openlineage.rootParent" in existing_conf:
   ```
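
The stricter alternative mentioned at the end of the comment, parsing `--conf` rather than substring-matching, could look like the following minimal sketch. The helper name and key list are illustrative, not taken from the PR; it relies only on the separator behavior documented in the docstring (`' --conf '` between property assignments):

```python
# Hypothetical stricter idempotence check: split the existing ``--conf``
# value on the documented ``' --conf '`` separator and inspect each
# ``key=value`` assignment for any OpenLineage parent-related key.
PARENT_KEY_PREFIXES = ("spark.openlineage.parent", "spark.openlineage.rootParent")


def has_parent_info(existing_conf: str) -> bool:
    """Return True if any parent/rootParent property is already set."""
    for assignment in existing_conf.split(" --conf "):
        key, _, _value = assignment.partition("=")
        if key.strip().startswith(PARENT_KEY_PREFIXES):
            return True
    return False
```

This avoids false positives from values that merely contain the substring, e.g. `has_parent_info("spark.executor.memory=2g --conf spark.openlineage.rootParentRunId=abc")` is `True`, while `has_parent_info("spark.executor.memory=2g")` is `False`.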



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
