potiuk commented on issue #25254:
URL: https://github.com/apache/airflow/issues/25254#issuecomment-1193143649

   I am not sure that keeping it in the Airflow MetaData DB makes sense. This 
will put ENORMOUS pressure on the database. We are not going to use it in other 
parts of the MetaData DB for anything else - just to "dump" the information. 
   
   Since we are going to make most Airflow components DB-less (see 
https://cwiki.apache.org/confluence/display/AIRFLOW/AIP-44+Airflow+Internal+API),
 this will also put additional pressure on those components, which will incur 
more communication overhead to write such a database entry.
   
   I think (but I will not close this one yet) that this one, similarly to 
#25252, has much better potential when implemented as part of our OpenTelemetry 
effort (which has already been approved and voted on: 
https://cwiki.apache.org/confluence/display/AIRFLOW/AIP-49+OpenTelemetry+Support+for+Apache+Airflow).
 We are talking about gathering much more information from Airflow via 
standard telemetry interfaces, so keeping it in external systems that are 
designed to manage system telemetry and can track various kinds of telemetry 
(including traces, which are the best-matching part of the OpenTelemetry 
proposal) is a much better choice than storing it in the MetaData DB, IMHO.
   
   I'd say any "database" entries that we need here should rather keep track of 
changes to the DAG structure (which should be part of another AIP: 
https://cwiki.apache.org/confluence/display/AIRFLOW/AIP-36+DAG+Versioning). 
This is the type of information that should be stored in Airflow Metadata, 
because such versioning can be used by Airflow itself to make decisions 
(back-filling). 
   
   Following this analogy, I personally think such a table of state changes 
would only make sense if we are going to use it for something else. For example, 
IF such a table (or a similar one) were a side effect of implementing the SLA 
feature "properly", then yeah - we could consider it as part of Airflow 
Metadata. But if the only reason is to "track the history of changes by a 
human", then we are simply trying to implement in Airflow what telemetry 
systems do way better than any of our implementations could, and we should 
rather focus on making sure our OpenTelemetry integration allows for it than 
try to replicate it in Airflow.
   
   This is what I think, but I am curious what others think about it.
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]
