potiuk edited a comment on pull request #19637: URL: https://github.com/apache/airflow/pull/19637#issuecomment-971797915
> It compounds to 30 mins -- when you account for other DB migrations. The larger the number of DAGs, larger the TIs, more time for other migrations and more time for re-serialization

But isn't just "clearing" the serialized dags equivalent to emptying the cache? I do not think just **cleaning** the serialized fields will take a lot of time in any sizeable database - it only marks the fields as empty, which is almost a no-op. It's the compound time of re-serializing that will take some time.

I am not sure whether that's the case, or what all the consequences of such an approach are. I believe that when the upgrade happens and we end the migration by "cleaning" the serialized dags, the Dag File processor will simply start processing and serializing dags pretty much "as usual" - initially a bit slower, but this will be almost unnoticeable, except that for the dags that are not serialized yet, the tasks will not show in the UI. And for those DAGs that are already processed, the tasks will start to re-appear in the UI. Am I correct? Or are there any other side effects?

Another approach here is to simply mark all those serialized dags as invalid - so that they stay present in the DB for the UI and the dag file processor reserializes them all while parsing - then even the "disappearing UI Dags" problem goes away. This is equivalent to "cache invalidation" rather than cleaning, and maybe that's the right solution.

> We should instead make the deserialization upgrade in place, which we already do for most cases.

IMHO marking all dags as "invalid" at each upgrade is a far more "resilient" approach than the in-place upgrade, looking also at the cases we had. We've introduced "accidental" incompatibilities in serialisation and there is no guarantee it won't happen again (and we have no protection/tests preventing it from happening again), so "reserialize all at upgrade" is for me an easy solution that helps us deal with accidental mistakes we can (and will) make.
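
To make the difference between the two options concrete, here is a toy sketch in plain Python. It is not Airflow code: `ToyStore`, the `valid` flag and all method names are hypothetical and only stand in for the serialized DAG table, the UI query and the dag file processor.

```python
from dataclasses import dataclass


@dataclass
class SerializedDag:
    dag_id: str
    data: str
    valid: bool = True  # hypothetical flag, assumed only for this sketch


class ToyStore:
    """Hypothetical stand-in for the serialized DAG table (not Airflow's model)."""

    def __init__(self, dag_ids):
        self.rows = {d: SerializedDag(d, f"v1:{d}") for d in dag_ids}

    def clear_all(self):
        # Option 1, "cleaning": rows vanish until the processor re-creates them.
        self.rows.clear()

    def invalidate_all(self):
        # Option 2, "cache invalidation": rows stay but are flagged as stale.
        for row in self.rows.values():
            row.valid = False

    def visible_in_ui(self):
        # The UI can only show what is present in the table.
        return sorted(self.rows)

    def needs_reserialization(self, all_dag_ids):
        # A DAG must be reserialized if its row is missing or flagged stale.
        return sorted(
            d for d in all_dag_ids
            if d not in self.rows or not self.rows[d].valid
        )

    def reserialize(self, dag_id):
        # What the dag file processor would do while parsing the DAG file.
        self.rows[dag_id] = SerializedDag(dag_id, f"v2:{dag_id}")
```

In both cases every DAG ends up in `needs_reserialization`, so the re-serialization cost is the same; the only difference in this sketch is that after `invalidate_all` the DAGs remain listed by `visible_in_ui` while the processor catches up.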
