kacpermuda opened a new issue, #39467:
URL: https://github.com/apache/airflow/issues/39467

   ### Description
   
   I would like to add a new facet that would allow re-creating the DAG graph from OL events. It could look like my example below, but if anybody has ideas on how to improve it, I would love to hear them.
   
   The facet could look like:
   ```python
   from attrs import define
   from openlineage.client.facet import BaseFacet


   @define(slots=False)
   class AirflowDagJobFacet(BaseFacet):
       taskTree: dict
       taskGroups: dict
       tasks: dict
   ```
   
   where `taskGroups` could look like:
   ```python
   {
       "tg1": {
           "parentGroup": "tg2"
       },
       "tg2": {
           "parentGroup": "tg3"
       },
       "tg3": {},
   }
   ```
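Because each entry in `taskGroups` only points to its direct parent, a consumer can recover the full nesting of a group by following the `parentGroup` links upward. Below is a minimal sketch in plain Python. It assumes that a group without a `parentGroup` key is a top-level group. `group_path` is a hypothetical helper, not part of the OpenLineage or Airflow API.

```python
def group_path(group_id: str, task_groups: dict) -> list:
    """Return the chain of task groups from the outermost one down to group_id."""
    path = []
    seen = set()
    current = group_id
    while current is not None:
        if current in seen:  # guard against malformed, cyclic parentGroup links
            raise ValueError(f"cycle detected at task group {current!r}")
        seen.add(current)
        path.append(current)
        # Assumption: a group without "parentGroup" is a top-level group.
        current = task_groups.get(current, {}).get("parentGroup")
    return list(reversed(path))


task_groups = {
    "tg1": {"parentGroup": "tg2"},
    "tg2": {"parentGroup": "tg3"},
    "tg3": {},
}

print(group_path("tg1", task_groups))  # ['tg3', 'tg2', 'tg1']
```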
   
   `tasks` could look like:
   ```python
   {
       "tg1.task.id.1": {
           "operator": "BashOperator",
           "task_group": "tg1"
       },
       "task.id.with.dots.2": {
           "operator": "EmptyOperator",
           "task_group": "tg2"
       },
       "task_3": {
           "operator": "SomeCustomOperator",
       }
   }
   ```
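Since each task records its own group, a consumer can bucket tasks by group without parsing task IDs. Below is a minimal sketch. It assumes that tasks with no `task_group` key sit at the top level of the DAG. `tasks_by_group` is a hypothetical helper.

```python
from collections import defaultdict


def tasks_by_group(tasks: dict) -> dict:
    """Map each task group ID to the IDs of the tasks placed directly in it.

    Tasks without a "task_group" key are collected under None (top level).
    """
    grouped = defaultdict(list)
    for task_id, info in tasks.items():
        grouped[info.get("task_group")].append(task_id)
    return dict(grouped)


tasks = {
    "tg1.task.id.1": {"operator": "BashOperator", "task_group": "tg1"},
    "task.id.with.dots.2": {"operator": "EmptyOperator", "task_group": "tg2"},
    "task_3": {"operator": "SomeCustomOperator"},
}

print(tasks_by_group(tasks))
# {'tg1': ['tg1.task.id.1'], 'tg2': ['task.id.with.dots.2'], None: ['task_3']}
```

Reading the group from the explicit `task_group` field, rather than splitting the task ID on dots, matters here: `task.id.with.dots.2` belongs to `tg2` even though its ID carries no `tg2.` prefix.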
   
   and the `taskTree` could look like:
   ```python
   # Example task dependency definition (task IDs with dots are not valid Python
   # identifiers, so this would raise an error as written, but it's more readable):
   # tg1.task.id.1 >> task_3
   # tg1.task.id.1 >> task_4 >> task_5
   # task.id.with.dots.2
   # task_6 >> [task_7, task_8] >> task_9
   task_tree = {
       "tg1.task.id.1": {
           "task_3": {},
           "task_4": {
               "task_5": {}
           }
       },
       "task.id.with.dots.2": {},
       "task_6": {
           "task_7": {
               "task_9": {}  # yes, it is duplicated because of the [] dependency
           },
           "task_8": {
               "task_9": {}  # yes, it is duplicated because of the [] dependency
           },
       }
   }
   ```
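On the producer side, the nested `taskTree` can be derived from each task's direct downstream IDs. Start at the tasks with no upstream and expand each one recursively. Below is a minimal sketch. It assumes a plain `downstream` mapping that lists every task ID, including leaves. In Airflow, each task exposes this information as `downstream_task_ids`, but the mapping and `build_task_tree` here are illustrative, not the provider's implementation.

```python
def build_task_tree(downstream: dict) -> dict:
    """Build a nested taskTree from a task_id -> downstream task IDs mapping.

    Root tasks (those with no upstream) become top-level keys. A task reachable
    through several parents (e.g. after a fan-out) appears once under each parent.
    """
    has_upstream = {child for children in downstream.values() for child in children}

    def subtree(task_id: str) -> dict:
        # Sort children so the output is deterministic even if the input holds sets.
        return {child: subtree(child) for child in sorted(downstream.get(task_id, ()))}

    roots = [task_id for task_id in downstream if task_id not in has_upstream]
    return {root: subtree(root) for root in roots}


# Hypothetical input mirroring the dependencies in the comments above.
downstream = {
    "tg1.task.id.1": ["task_3", "task_4"],
    "task_3": [],
    "task_4": ["task_5"],
    "task_5": [],
    "task.id.with.dots.2": [],
    "task_6": ["task_7", "task_8"],
    "task_7": ["task_9"],
    "task_8": ["task_9"],
    "task_9": [],
}

tree = build_task_tree(downstream)
print(tree["task_6"])  # {'task_7': {'task_9': {}}, 'task_8': {'task_9': {}}}
```

Because a task with several parents is expanded under each of them, the tree can grow much larger than the DAG itself when fan-outs and fan-ins are stacked. This is the duplication that the comments on `task_9` point out.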
   
   
   ### Use case/motivation
   
   Some OL consumers try to re-create the DAG graph, or simply deduce task dependencies, from OL events. Currently this mainly relies on the upstream/downstream task ID attributes delivered in the AirflowRunFacet for each task. This approach has its downsides when using EmptyOperators (which may not be executed and thus won't emit events) or operators excluded from OpenLineage (disabled using [disabled-for-operators](https://airflow.apache.org/docs/apache-airflow-providers-openlineage/stable/configurations-ref.html#disabled-for-operators)). In these cases, it is impossible to create the dependency graph correctly.
   
   To make it easier for the consumers, I think we should provide all the information necessary to re-create the DAG graph in the DAG START event, and for that I would like to propose adding a new facet.
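With such a facet in the DAG START event, a consumer would no longer need per-task events to see the whole graph, because it could flatten `taskTree` back into a set of edges. Below is a minimal sketch. It assumes the `taskTree` layout proposed above. `flatten_task_tree` is a hypothetical consumer helper.

```python
def flatten_task_tree(task_tree: dict) -> tuple:
    """Return (task_ids, edges), where edges are (upstream, downstream) pairs.

    Subtrees duplicated by fan-in (such as task_9) collapse into one edge per
    pair because both results are sets.
    """
    task_ids = set()
    edges = set()

    def walk(parent, children: dict) -> None:
        for child, grandchildren in children.items():
            task_ids.add(child)
            if parent is not None:
                edges.add((parent, child))
            walk(child, grandchildren)

    walk(None, task_tree)
    return task_ids, edges


task_tree = {
    "tg1.task.id.1": {"task_3": {}, "task_4": {"task_5": {}}},
    "task.id.with.dots.2": {},
    "task_6": {"task_7": {"task_9": {}}, "task_8": {"task_9": {}}},
}

task_ids, edges = flatten_task_tree(task_tree)
print(len(task_ids), len(edges))  # 9 7
print("task.id.with.dots.2" in task_ids)  # True
```

Every task comes from the DAG definition rather than from task events. As a result, `task.id.with.dots.2` (an EmptyOperator with no dependencies) still appears in the graph even if it is never executed.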
   
   ### Related issues
   
   _No response_
   
   ### Are you willing to submit a PR?
   
   - [X] Yes I am willing to submit a PR!
   
   ### Code of Conduct
   
   - [X] I agree to follow this project's [Code of Conduct](https://github.com/apache/airflow/blob/main/CODE_OF_CONDUCT.md)
   