potiuk commented on PR #23860:
URL: https://github.com/apache/airflow/pull/23860#issuecomment-1154862531

   > With the current changes in this PR, existing serialised DAGs will fail to 
de-serialise. But as mentioned above, re-serialisation during upgrade has 
performance issues.
   > 
   > Since this only affects a very specific subset of DAGs, perhaps it makes 
sense to pick them up during `db upgrade` and automatically re-serialise them? 
Users should already be aware that the upgrade command can take significant 
time, so noting the possibility in the upgrade notes should make the approach 
reasonable.
   
   I am only afraid that we will simply miss a number of cases (we already 
did). When I proposed it last time, the response was essentially "but those 
serialisation problems will not happen again, we solved them". Since then I 
have seen at least three cases where our changes caused unexpected 
serialization issues, because we did not expect them to have any impact. 
This is not a complaint at all: these things simply happen. We are human, we 
make mistakes, or our reasoning simply does not go enough steps forward to 
foresee all the consequences. When I see a place where a human can make a 
mistake, my first thought is not "who to blame" but "what should we do to 
prevent this mistake from ever happening or having any impact". It's extremely 
difficult to reason about the impact of some of our changes on serialization 
<-> deserialization. If we keep on evolving Airflow, those mistakes are bound 
to happen. For me this is a trap we set for others and for our future selves, 
one that will keep firing at us from time to time. Why don't we remove the 
trap altogether?
   
   The reason I proposed re-serialisation at upgrade is simply that it is a 
classic "preventive" rather than "reactive" action, one that frees us from:
   
   * reasoning about the impact of serialization issues (freeing our mind 
cycles and daily decision pool for more interesting and important problems)
   * frustration of our users
   * having to handle the issues that get opened, and getting even more 
frustrated
   
   I believe rebuilding the cache at upgrade is quite a reasonable thing to do 
and - depending on how long it actually takes - it might be a very reasonable 
price to pay for peace of mind. And our serialization is basically this - a 
sophisticated caching mechanism.
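   
   To make the "serialization as a cache" idea concrete, here is a minimal 
sketch in plain Python. It is not Airflow code: `SerializedCache`, `parse_v1`, 
`parse_v2` and `rebuild_all` are hypothetical names made up for illustration, 
and the real serialized-DAG storage is more involved. It only shows why a 
blanket rebuild at upgrade removes the need to reason about which entries a 
format change has affected.

```python
import json
from typing import Callable, Dict


class SerializedCache:
    """Toy cache of serialized objects (hypothetical, not an Airflow class)."""

    def __init__(self, parse: Callable[[str], dict]) -> None:
        # `parse` stands in for the source of truth, e.g. parsing a DAG file.
        self.parse = parse
        self._store: Dict[str, str] = {}  # key -> serialized JSON blob

    def put(self, key: str) -> None:
        # Serialize whatever the *current* code produces for this key.
        self._store[key] = json.dumps(self.parse(key))

    def get(self, key: str) -> dict:
        # Deserialize the stored blob; it may have been written by older code.
        return json.loads(self._store[key])

    def rebuild_all(self) -> int:
        """Preventive step run once at upgrade: re-serialize every entry."""
        keys = list(self._store)
        self._store.clear()
        for key in keys:
            self.put(key)
        return len(keys)


def parse_v1(key: str) -> dict:
    # Format written by the "old" release.
    return {"id": key}


def parse_v2(key: str) -> dict:
    # Format written by the "new" release (renamed field, extra field).
    return {"dag_id": key, "version": 2}


cache = SerializedCache(parse_v1)
cache.put("example_dag")

# "Upgrade": the code changes, but the stored blob is still in the old format.
cache.parse = parse_v2
stale = cache.get("example_dag")

# Rebuilding everything brings the whole cache in line with the new code.
rebuilt = cache.rebuild_all()
fresh = cache.get("example_dag")
```

   In the sketch, `stale` is still in the old shape after the "upgrade", and 
only the unconditional `rebuild_all()` guarantees that every entry matches the 
new code, whatever the change was. The cost is one full re-serialization per 
upgrade.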
   
   Just saying - I do not feel extremely strongly about it, but when developing 
a product, if I see an opportunity to eliminate a problem once and for all, no 
matter how many future changes we make, I usually opt for it (depending of 
course on the performance penalty our users will have to pay).


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
