mattusifer commented on PR #56663:
URL: https://github.com/apache/airflow/pull/56663#issuecomment-3426191857
> Honestly I don’t personally feel this belongs to Airflow.

@uranusjr I really appreciate you taking a look. Here is my thinking. For example, take the following case:

- DAGs that run monthly barely create any data, and their owners would prefer to have 120 days of history so they can check on old DAG runs. This creates very little extra data in the DB.
- DAGs that run every minute create tons of data, and their owners don't need to keep more than a week (or less) of history.
- With today's db_clean we have to take the max of the two requirements, and so we end up keeping 120d of history for DAGs that run every minute, which leaves a ton of unnecessary data in the database.

Since the volume of data that is generated can vary so wildly between two different DAGs, and the cleanup requirements might also differ between DAGs, it would be ideal if we could apply db_clean differently on different DAGs.

I think we can figure out how to do this on our own if this isn't a good general fit for Airflow, but it really seemed to me to make sense. Really interested to hear your thoughts.

--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]
