potiuk commented on PR #23860: URL: https://github.com/apache/airflow/pull/23860#issuecomment-1154862531
> With the current changes in this PR, existing serialised DAGs will fail to de-serialise. But as mentioned above, re-serialisation during upgrade has performance issues.
>
> Since this only affects a very specific subset of DAGs, perhaps it makes sense to pick them up during `db upgrade` and automatically re-serialise them? Users should already be aware that the upgrade command can take significant time, so noting the possibility in the upgrade notes should make the approach reasonable.

I am only afraid that we will simply miss a number of cases (we already did). When I proposed it last time, the comment was essentially "but those serialization problems will not happen again, we solved them". Since then I have seen at least three cases where some of our changes caused unexpected serialization issues, because we did not expect them to have an impact. And this is not a complaint at all. Those things simply happen: we are humans, we make mistakes, or our reasoning simply does not go enough steps forward to foresee all the consequences. When I see a place where a human can make a mistake, my first thought is not "who to blame" but "what should we do to prevent this mistake from ever happening, or from having any impact".

It's extremely difficult to reason about the impact of some of our changes on serialization <-> deserialization. If we keep on evolving Airflow, those mistakes are bound to happen. For me this is a trap we set for others and for our future selves, and it will keep firing at us from time to time. Why don't we remove the trap altogether?
The reason I proposed re-serialisation at upgrade is just this: it is a classic "preventive" rather than "reactive" action, which frees us from:

* reasoning about the impact of serialization issues (freeing our mind cycles and daily decision pool for more interesting and important problems)
* the frustration of our users
* having to handle the issues they open, and getting even more frustrated ourselves

I believe rebuilding the cache at upgrade is quite a reasonable thing to do and, depending on how long it actually takes, it might be a very reasonable price to pay for peace of mind. And that is basically what our serialization is: a sophisticated caching mechanism.

Just saying, I am not extremely strong about it. But when developing a product, if I see an opportunity to eliminate a problem once and for all, no matter how many future changes we make, I usually opt for it (depending, of course, on the performance penalty our users will have to pay).
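To make the "serialization as a cache" framing concrete, here is a minimal sketch of the full-rebuild approach. It assumes serialized DAGs are stored as JSON blobs stamped with a format version. Every name here (`CURRENT_FORMAT_VERSION`, `serialize`, `deserialize`, `rebuild_all`) is hypothetical, and this is not Airflow's actual serialization code. The point it illustrates is that `rebuild_all` re-serializes every entry from its source instead of trying to predict which DAGs a given change affects, so no affected case can be missed.

```python
import json
from typing import Callable, Dict

# Hypothetical format version, bumped whenever serialization logic changes.
CURRENT_FORMAT_VERSION = 2


def serialize(dag_definition: dict) -> str:
    """Serialize a (toy) DAG definition and stamp it with the format version."""
    return json.dumps({"__version": CURRENT_FORMAT_VERSION, "dag": dag_definition}, sort_keys=True)


def deserialize(blob: str) -> dict:
    """Deserialize a blob, refusing anything written with another format version."""
    payload = json.loads(blob)
    if payload.get("__version") != CURRENT_FORMAT_VERSION:
        raise ValueError(f"stale serialization format: {payload.get('__version')}")
    return payload["dag"]


def rebuild_all(store: Dict[str, str], load_source: Callable[[str], dict]) -> int:
    """Hypothetical upgrade step: re-serialize EVERY cached DAG from its source,
    without guessing which entries are affected. Returns the number rebuilt."""
    for dag_id in list(store):
        store[dag_id] = serialize(load_source(dag_id))
    return len(store)


# Usage: simulate blobs written by an older version (format version 1).
sources = {"etl": {"tasks": ["extract", "load"]}, "report": {"tasks": ["render"]}}
store = {dag_id: json.dumps({"__version": 1, "dag": d}) for dag_id, d in sources.items()}
rebuilt = rebuild_all(store, sources.__getitem__)
print(rebuilt)                    # 2
print(deserialize(store["etl"]))  # {'tasks': ['extract', 'load']}
```

The trade-off is the one discussed above: a full rebuild costs time proportional to the number of DAGs, in exchange for never having to reason about which stored blobs a particular change can break.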
