kacpermuda commented on code in PR #57809: URL: https://github.com/apache/airflow/pull/57809#discussion_r2494317910
##########
providers/openlineage/docs/guides/user.rst:
##########
@@ -478,6 +478,51 @@ You can enable this automation by setting ``spark_inject_transport_info`` option
 AIRFLOW__OPENLINEAGE__SPARK_INJECT_TRANSPORT_INFO=true
+Passing parent information to Airflow DAG
+^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
+
+To enable full OpenLineage lineage tracking across dependent DAGs, you can pass parent and root job information
+through the DAG's ``dag_run.conf``. When a DAG run configuration includes an ``_openlineage`` section with valid metadata,

Review Comment:
> Do we ever would want user to fill this out themselves? If no, we can keep underscore.

Maybe when triggering a DagRun through the API from some external tool? I like the underscore, but it's not a hard stop for me if you feel it's better without.

> Would there be a mechanism in API or other external trigger mechanism to pass through this info as well?

Not sure what you're asking exactly, but when triggering a DagRun through the API you can already pass conf, so this will work. If you're asking about an automated way to do this, I'm not sure whether there is some triggerdagrunwithapi operator; if there is, I can instrument it. If not, the OL provider will expose a function that creates this conf for the user automatically, to make it easier to build these ids from a TaskInstance.

##########
providers/openlineage/docs/guides/user.rst:
##########
@@ -478,6 +478,51 @@ You can enable this automation by setting ``spark_inject_transport_info`` option
 AIRFLOW__OPENLINEAGE__SPARK_INJECT_TRANSPORT_INFO=true
+Passing parent information to Airflow DAG
+^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
+
+To enable full OpenLineage lineage tracking across dependent DAGs, you can pass parent and root job information
+through the DAG's ``dag_run.conf``. When a DAG run configuration includes an ``_openlineage`` section with valid metadata,
+this information is automatically parsed and converted into DAG's **ParentRunFacet**, from which the root information
+is also propagated to all task runs. If no DAG run configuration is provided, the DAG run itself is considered the
+lineage root for its tasks.
+
+The ``_openlineage`` dict in conf can contain the following keys:

Review Comment:
I think those two statements contradict each other, so just to clarify, we can either:

1. Require all six, and only create a ParentRunFacet when all parent and root ids are present (not a good idea imo).
2. When the 3 parent ids are present, use the parent as both parent and root; when all 6 are present, use them as given. Only create the ParentRunFacet when at least the 3 parent ids are present; if only root is present, do not create it.

I'll go with 2, as I think this is what we both agree on.

--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]
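
For illustration only, option 2 from the review comment could be sketched roughly as below. This is not the PR's implementation: the six key names and the `resolve_parent_info` helper are hypothetical placeholders, since the actual keys are listed in the PR docs rather than in this message.

```python
from typing import Optional

# Hypothetical key names, assumed for this sketch only.
PARENT_KEYS = ("parent_run_id", "parent_job_namespace", "parent_job_name")
ROOT_KEYS = ("root_parent_run_id", "root_parent_job_namespace", "root_parent_job_name")


def resolve_parent_info(conf: Optional[dict]) -> Optional[dict]:
    """Return parent and root ids for a parent run facet, or None if none should be created."""
    ol = (conf or {}).get("_openlineage")
    if not isinstance(ol, dict):
        return None

    # No facet unless all three parent ids are present; root-only input is ignored.
    if not all(ol.get(key) for key in PARENT_KEYS):
        return None
    parent = {key: ol[key] for key in PARENT_KEYS}

    if all(ol.get(key) for key in ROOT_KEYS):
        # All six ids present: use them as given.
        root = {key: ol[key] for key in ROOT_KEYS}
    else:
        # Root ids missing or incomplete: the parent doubles as the root.
        root = {root_key: ol[parent_key] for root_key, parent_key in zip(ROOT_KEYS, PARENT_KEYS)}

    return {**parent, **root}
```

With only the three parent ids in conf, the returned root values mirror the parent; with only root ids, the function returns `None` and no facet would be created.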
