potiuk commented on issue #14396: URL: https://github.com/apache/airflow/issues/14396#issuecomment-785731859
> First off great challenge :) , 2nd I'd probably replace my earlier suggestion with the following as well

Yeah. I was expecting exactly this answer :).

So, summarizing: what you've done now is create a hybrid of a dataclass and a dictionary. Now my goal is to show you what further consequences you have to deal with.

First of all, this "hybrid" solution will be around for years. We will not get rid of it in 2.*; we have to wait for the 3.0.0 release (which we have not even started thinking about). We started rigorously following SemVer and we cannot remove any "public API" behaviour until 3.0. Context is probably the most "public" API of Airflow you can imagine. There are hundreds of thousands of custom operators out there that are using this "public API". This means that any change we introduce now is going to be around for at least a year from now. And until then 1.10 is still there as well (and will be there for quite some time). So if someone develops custom operators for their 1.10 Airflow, they will still use the dictionary, so we will probably have tens of thousands of custom operators still being created with the dictionary context for another year or two.

This is a bit of context (!). I try to think empathetically about our users. As someone who cares not only about the "purity" of the implementation, I also have to think about the adoption of Airflow and years of maintenance.

I am sure you realize that, but the example I've shown you above is the simplest form. Context is shared in a few places with custom operators:

* pre_execute(context)
* post_execute(context)
* execute(context)
* get_current_context()

Most likely many of the currently released operators are using the context to pass data (as custom dictionary values) between those methods: one can set a custom value in pre_execute() and retrieve it in execute(), or post_execute() reads whatever execute() sets in the context.
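
To make the pattern concrete, here is a minimal sketch of an operator passing values through the dictionary context. `MyOperator`, the keys `my_custom_key`, `my_result` and `seen_in_post_execute`, and the `run_task` helper are hypothetical stand-ins rather than Airflow code; a real operator would subclass `BaseOperator`, and the hook signatures here are simplified.

```python
from typing import Any, Dict


class MyOperator:
    """Hypothetical stand-in for a custom operator (no Airflow import needed)."""

    def pre_execute(self, context: Dict[str, Any]) -> None:
        # Store an arbitrary, operator-specific value in the shared context.
        context["my_custom_key"] = "prepared"

    def execute(self, context: Dict[str, Any]) -> str:
        # Read back what pre_execute() stored and leave a value for post_execute().
        context["my_result"] = context["my_custom_key"].upper()
        return context["my_result"]

    def post_execute(self, context: Dict[str, Any]) -> None:
        # Read whatever execute() put into the context.
        context["seen_in_post_execute"] = context["my_result"]


def run_task(operator: MyOperator, context: Dict[str, Any]) -> str:
    # Assumption: the same context object is handed to all three hooks, in this order.
    operator.pre_execute(context)
    result = operator.execute(context)
    operator.post_execute(context)
    return result


context: Dict[str, Any] = {"ds": "2021-02-25"}
result = run_task(MyOperator(), context)
```

Nothing in the dictionary type stops the operator from inventing its own keys, which is exactly why this usage is likely to be widespread.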
It was easy to use, we have not forbidden it, and it is part of the API (this is the basic "property" of a dictionary, unlike a dataclass, that you can set any value with any key there). By introducing a dataclass we are breaking this property. You will not be able to set an arbitrary key in the context in `pre_execute` so that it is available in `execute`.

If we implement the interim (lasting at least a year or more) hybrid dataclass <-> dictionary proposed above, this will continue to work, but with deprecation warnings.

If we decide this is the right way to go, we have to communicate it to the users, prepare them for the migration, and give them the tools necessary to migrate when it comes to 3.0.0. Our goal is (as it was in the 2.0.0 migration) to make the migration as smooth as possible. We even developed an upgrade_check script that users can run on their installation, which tells them what needs to be fixed: https://airflow.apache.org/docs/apache-airflow/stable/upgrade-check.html

I imagine the same will happen with 3.0.0: we will prepare another upgrade check. Imagine all the different scenarios people will have by then:

1) Custom operators in DAGs
2) Custom operators as Plugins
3) Custom operators installed as custom providers (this was added in 2.0; we want to encourage people to build their own providers and distribute them as PyPI packages).
4) Locally installed custom operators as part of the Python environments/images users will have.

So my challenge is this: propose a strategy that will help our users migrate all those cases, in a way that will not make them refrain from migrating to 3.0.0 with the dataclass context. What checks are you going to use, how are you going to encourage users to migrate all the thousands of DAGs they have, and what tools are you going to provide them?
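
For reference, the interim hybrid mentioned earlier in this comment might behave roughly like the sketch below. This is not the code proposed in the thread, only an illustration under assumptions: the `Context` class, its single `ds` field, the `_extra` store and the warning text are all hypothetical.

```python
import warnings
from dataclasses import dataclass, field
from typing import Any, Dict


@dataclass
class Context:
    """Hypothetical hybrid: typed attributes plus deprecated dict-style access."""

    ds: str
    # Holds arbitrary keys that operators set via context["key"] = value.
    _extra: Dict[str, Any] = field(default_factory=dict, repr=False)

    def _is_declared(self, key: str) -> bool:
        return key != "_extra" and key in self.__dataclass_fields__

    def __getitem__(self, key: str) -> Any:
        warnings.warn(
            "Dict-style access to the context is deprecated",
            DeprecationWarning,
            stacklevel=2,
        )
        if self._is_declared(key):
            return getattr(self, key)
        return self._extra[key]  # raises KeyError for unknown keys, like a dict

    def __setitem__(self, key: str, value: Any) -> None:
        warnings.warn(
            "Dict-style access to the context is deprecated",
            DeprecationWarning,
            stacklevel=2,
        )
        if self._is_declared(key):
            setattr(self, key, value)
        else:
            self._extra[key] = value


context = Context(ds="2021-02-25")
with warnings.catch_warnings(record=True) as caught:
    warnings.simplefilter("always")
    context["my_custom_key"] = "prepared"  # still works, but warns
    value = context["my_custom_key"]
    ds = context["ds"]
```

Every one of those warnings is something users of the four scenarios above would have to find and fix before the dictionary behaviour could be removed.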
And once you do that, I will ask you and other committers (undoubtedly looking at this conversation) to answer a single question:

* Is it worth the hassle if we can achieve the very same user experience by using TypedDict?
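
As a rough illustration of the TypedDict alternative, assuming Python 3.8+ (older versions would need `typing_extensions`): the `Context` definition below declares only a small, illustrative subset of keys and is a hypothetical sketch, not Airflow's actual definition.

```python
from typing import Any, Dict, TypedDict


class Context(TypedDict, total=False):
    """Hypothetical typed view of the context; only a few illustrative keys."""

    ds: str
    run_id: str
    params: Dict[str, Any]


def execute(context: Context) -> str:
    # A type checker knows context["ds"] is a str, so editors can autocomplete it.
    return context["ds"].replace("-", "")


context: Context = {"ds": "2021-02-25", "run_id": "manual__1"}

# At runtime a TypedDict is a plain dict, so existing operators that set custom
# keys keep running unchanged. A static type checker would flag the undeclared
# key, hence the ignore comment, but nothing fails at runtime.
context["my_custom_key"] = "prepared"  # type: ignore

compact_ds = execute(context)
```

The point of the sketch is that typing is added without changing the runtime object, so no deprecation period or migration tooling is needed for code that treats the context as a dictionary.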
