rahul-madaan commented on code in PR #64513:
URL: https://github.com/apache/airflow/pull/64513#discussion_r3025949832


##########
providers/openlineage/src/airflow/providers/openlineage/utils/spark.py:
##########
@@ -195,3 +195,70 @@ def inject_transport_information_into_spark_properties(properties: dict, context
         return properties
 
     return {**properties, **_get_transport_information_as_spark_properties()}
+
+
+def inject_parent_job_information_into_glue_arguments(script_args: dict, context: Context) -> dict:
+    """
+    Inject parent job information into Glue job arguments if not already present.
+
+    Glue jobs pass Spark properties via the ``--conf`` key in the script_args dict.
+    Multiple Spark conf properties are combined into the ``--conf`` key value with
+    ``' --conf '`` as separator between each property assignment.
+
+    Args:
+        script_args: Glue job script arguments dict (maps to boto3 ``Arguments``).
+        context: The context containing task instance information.
+
+    Returns:
+        Modified script_args dict with OpenLineage parent job information injected, if applicable.
+    """
+    existing_conf = script_args.get("--conf", "")
+
+    if "spark.openlineage.parent" in existing_conf:

Review Comment:
   "spark.openlineage.parent" already matches "spark.openlineage.rootParent" 
because it's a substring check — "parent" is contained in
     "rootParent".
   
     >>> "spark.openlineage.parent" in "spark.openlineage.rootParentRunId=abc"
     True 
     
   The existing code already handles this correctly, and it matches the exact 
same pattern used in the Spark properties version:                              
                                                                                
            
     
   Line 140 shows the Spark properties version uses the same prefix 
"spark.openlineage.parent" — which covers both parentJobName and 
rootParentRunId etc. My Glue version at line
      217 follows the identical pattern.
   



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]