potiuk edited a comment on pull request #19637:
URL: https://github.com/apache/airflow/pull/19637#issuecomment-971797915


   > It compounds to 30 mins -- when you account for other DB migrations. The 
larger the number of DAGs, larger the TIs, more time for other migrations and 
more time for re-serialization
   
   But isn't it the case that just "clearing" the serialized dags is
equivalent to emptying a cache? I do not think just **clearing** the
serialized fields will take much time, even in a sizeable database - it only
marks the fields as empty, which is almost a no-op. It's the cumulative
time of re-serializing that will take a while.
   
   I am not sure that's the case, though, or what all the consequences of
such an approach would be.
   
   I believe that if we end the upgrade migration by "clearing" the serialized
dags, the Dag File Processor will simply start processing and serializing dags
pretty much "as usual" - initially a bit slower, but this will be almost
unnoticeable, except that tasks will not show in the UI for dags that are not
serialized yet.
   For DAGs that have already been processed, the tasks will start to
re-appear in the UI.
   
   Am I correct? Or are there any other side effects?
   
   Another approach here is to simply mark all serialized dags as invalid -
so that they stay present in the DB for the UI, and the dag file processor
reserializes them all while parsing - then even the "disappearing UI Dags"
problem goes away. This is equivalent to "cache invalidation" rather than
clearing, and maybe that's the right solution.
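
   To make the difference between the two options concrete, here is a minimal
sketch using an in-memory SQLite database. Everything in it is an assumption
made for illustration: the `serialized_dag_demo` table, the `is_valid` column
and the helper functions are hypothetical and are not Airflow's actual schema
or API.

```python
import sqlite3


def make_db():
    """Create a toy table standing in for the serialized DAG store (hypothetical schema)."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE serialized_dag_demo ("
        "dag_id TEXT PRIMARY KEY, data TEXT, is_valid INTEGER NOT NULL DEFAULT 1)"
    )
    conn.executemany(
        "INSERT INTO serialized_dag_demo (dag_id, data) VALUES (?, ?)",
        [("dag_a", '{"tasks": ["t1", "t2"]}'), ("dag_b", '{"tasks": ["t3"]}')],
    )
    return conn


def clear_all(conn):
    """Option 1 ("clearing"): empty the serialized field, like emptying a cache."""
    conn.execute("UPDATE serialized_dag_demo SET data = NULL")


def invalidate_all(conn):
    """Option 2 ("cache invalidation"): keep the old data, only flag it as stale."""
    conn.execute("UPDATE serialized_dag_demo SET is_valid = 0")


def ui_renderable(conn):
    """DAGs whose tasks a UI could still show, i.e. rows that still hold data."""
    rows = conn.execute(
        "SELECT dag_id FROM serialized_dag_demo "
        "WHERE data IS NOT NULL ORDER BY dag_id"
    )
    return [row[0] for row in rows]


def needs_reserialization(conn):
    """DAGs the file processor would have to serialize again while parsing."""
    rows = conn.execute(
        "SELECT dag_id FROM serialized_dag_demo "
        "WHERE data IS NULL OR is_valid = 0 ORDER BY dag_id"
    )
    return [row[0] for row in rows]


def reserialize(conn, dag_id, data):
    """Stand-in for the file processor writing a freshly serialized DAG."""
    conn.execute(
        "UPDATE serialized_dag_demo SET data = ?, is_valid = 1 WHERE dag_id = ?",
        (data, dag_id),
    )
```

   In this toy model both options are a single cheap `UPDATE` and both leave
every DAG queued for reserialization. The only difference is that after
`invalidate_all` the UI can still render the stale data until the file
processor catches up, whereas after `clear_all` it has nothing to show.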
   
   > We should instead make the deserialization upgrade in place, which we 
already do for most cases.
   
   IMHO marking all dags as "invalid" at each upgrade is a far more
"resilient" approach than reserialization, also considering the cases we have
had. We've introduced "accidental" incompatibilities in serialization and
there is no guarantee it won't happen again (and we have no protection/tests
preventing it from happening again), so "reserialize all at upgrade" is, for
me, an easy solution that helps us deal with the accidental mistakes we can
(and will) make.


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.


