kacpermuda commented on code in PR #57809:
URL: https://github.com/apache/airflow/pull/57809#discussion_r2494317910


##########
providers/openlineage/docs/guides/user.rst:
##########
@@ -478,6 +478,51 @@ You can enable this automation by setting 
``spark_inject_transport_info`` option
   AIRFLOW__OPENLINEAGE__SPARK_INJECT_TRANSPORT_INFO=true
 
 
+Passing parent information to Airflow DAG
+^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
+
+To enable full OpenLineage lineage tracking across dependent DAGs, you can 
pass parent and root job information
+through the DAG's ``dag_run.conf``. When a DAG run configuration includes an 
``_openlineage`` section with valid metadata,

Review Comment:
   > Would we ever want users to fill this out themselves? If not, we can keep 
the underscore.
   
   Maybe when triggering a DagRun through the API from some external tool? I 
like the underscore, but I don't feel strongly about it, if you think it's 
better without.
   
   > Would there be a mechanism in the API, or some other external trigger 
mechanism, to pass this info through as well?
   
   Not sure what you're asking exactly, but when triggering a DagRun through 
the API you can already pass conf, so this will work. If you're asking about an 
automated way to do this, I'm not sure whether there is some 
triggerdagrunwithapi operator; if there is, I can instrument it. If not, the OL 
provider will expose a function that creates this conf for the user 
automatically, to make it easier to build these IDs from a TaskInstance.
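
To make the manual case concrete, here is a rough sketch of how an external tool might assemble the ``conf`` it sends when triggering a DagRun. The key names inside ``_openlineage`` (``parentRunId``, ``parentJobNamespace``, ``parentJobName``) and the helper ``build_trigger_conf`` are hypothetical placeholders, not names confirmed by this PR; only the ``_openlineage`` section name comes from the docs under review.

```python
import json
from typing import Any, Dict, Optional


def build_trigger_conf(
    parent_run_id: str,
    parent_job_namespace: str,
    parent_job_name: str,
    extra_conf: Optional[Dict[str, Any]] = None,
) -> Dict[str, Any]:
    """Return a DagRun conf carrying parent info under ``_openlineage``.

    The key names below are assumptions made for illustration only.
    """
    # Copy any user-supplied conf so the caller's dict is not mutated.
    conf = dict(extra_conf or {})
    conf["_openlineage"] = {
        "parentRunId": parent_run_id,
        "parentJobNamespace": parent_job_namespace,
        "parentJobName": parent_job_name,
    }
    return conf


# The conf is passed in the "conf" field when triggering the DagRun.
request_body = {
    "conf": build_trigger_conf(
        parent_run_id="3bb703d1-09c1-4a42-8da5-35a0b3216072",
        parent_job_namespace="external_scheduler",
        parent_job_name="nightly_export",
        extra_conf={"target_date": "2025-01-01"},
    )
}
print(json.dumps(request_body, indent=2))
```

A helper exposed by the provider would presumably fill in the same section from a TaskInstance instead of taking the values as arguments.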



##########
providers/openlineage/docs/guides/user.rst:
##########
@@ -478,6 +478,51 @@ You can enable this automation by setting 
``spark_inject_transport_info`` option
   AIRFLOW__OPENLINEAGE__SPARK_INJECT_TRANSPORT_INFO=true
 
 
+Passing parent information to Airflow DAG
+^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
+
+To enable full OpenLineage lineage tracking across dependent DAGs, you can 
pass parent and root job information
+through the DAG's ``dag_run.conf``. When a DAG run configuration includes an 
``_openlineage`` section with valid metadata,
+this information is automatically parsed and converted into DAG's 
**ParentRunFacet**, from which the root information
+is also propagated to all task runs. If no DAG run configuration is provided, 
the DAG run itself is considered the
+lineage root for its tasks.
+
+The ``_openlineage`` dict in conf can contain the following keys:

Review Comment:
   I think those two statements contradict each other, so just to clarify, we 
can either:
   1. Require all six, and only create a ParentRunFacet when all parent and 
root IDs are present (not a good idea, imo).
   2. When the 3 parent IDs are present, use the parent as both parent and 
root; when all 6 are present, use them as given. Only create the ParentRunFacet 
when at least the 3 parent IDs are present; if only root is present, do not 
create it.
   
   I'll go with 2, as I think this is what we both agree on.
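
Option 2 could be sketched roughly as below. This is only an illustration of the decision rule, not the provider's implementation: the key names and the function ``resolve_parent_info`` are hypothetical, and the real code would build a ParentRunFacet rather than return plain tuples.

```python
from typing import Any, Dict, Optional, Tuple

# Hypothetical key names, used for illustration only.
PARENT_KEYS = ("parentRunId", "parentJobNamespace", "parentJobName")
ROOT_KEYS = ("rootParentRunId", "rootParentJobNamespace", "rootParentJobName")

Ids = Tuple[str, str, str]


def _pick(section: Dict[str, Any], keys: Tuple[str, str, str]) -> Optional[Ids]:
    """Return the three values only if every key holds a non-empty string."""
    values = tuple(section.get(key) for key in keys)
    if all(isinstance(value, str) and value for value in values):
        return values  # type: ignore[return-value]
    return None


def resolve_parent_info(conf: Optional[Dict[str, Any]]) -> Optional[Dict[str, Ids]]:
    """Apply option 2: parent is required, root falls back to parent."""
    section = (conf or {}).get("_openlineage")
    if not isinstance(section, dict):
        return None
    parent = _pick(section, PARENT_KEYS)
    if parent is None:
        # Root-only or incomplete parent info: do not create a parent facet.
        return None
    root = _pick(section, ROOT_KEYS) or parent
    return {"parent": parent, "root": root}


parent_only = {"_openlineage": dict(zip(PARENT_KEYS, ("run-1", "ns", "job_a")))}
root_only = {"_openlineage": dict(zip(ROOT_KEYS, ("run-0", "ns", "job_root")))}
all_six = {"_openlineage": {**parent_only["_openlineage"], **root_only["_openlineage"]}}

print(resolve_parent_info(parent_only))
print(resolve_parent_info(all_six))
print(resolve_parent_info(root_only))
```

With this rule, a conf holding only the three parent IDs yields the parent as both parent and root, while a root-only conf yields nothing.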



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
