potiuk edited a comment on issue #14396:
URL: https://github.com/apache/airflow/issues/14396#issuecomment-785731859


   > First off great challenge :) , 2nd I'd probably replace my earlier 
suggestion with the following as well
   
   Yeah. I was expecting exactly this answer :). So, to summarize: what you 
have done now is create a hybrid of a dataclass and a dictionary. Now my goal 
is to show you what further consequences you have to deal with.
   
   First of all, this "hybrid" solution will be around for years. We will not 
get rid of it in 2.*; we have to wait for the 3.0.0 release (which we have not 
even started thinking about). We have started rigorously following SemVer and 
we cannot remove any "public API" behaviour until 3.0. Context is probably the 
most "public" API of Airflow you can imagine. There are hundreds of thousands 
of custom operators out there that use this "public API". This means that any 
change we introduce now is going to be around for at least a year from now. 
And until then 1.10 is still there as well (and will be there for quite some 
time). So if someone develops custom operators for their 1.10 Airflow, they 
will still use the dictionary, so we will probably have tens of thousands of 
custom operators still being created against the dictionary context for 
another year or two. 
   
   This is just a bit of context (!). I try to think empathically about our 
users. As someone who cares about more than the 'purity' of the implementation, 
I also have to think about the adoption of Airflow and years of maintenance.
   
   I am sure you realize that, but the example I've shown you above is the 
simplest form. Context is shared in a few places with custom operators:
   
   * pre_execute(context) 
   * post_execute(context)
   * execute(context)
   * get_current_context()
   
   Most likely many of the currently released operators use the context to 
pass data (as custom dictionary values) between those methods: one can set a 
custom value in `pre_execute()` and retrieve it in `execute()`, or 
`post_execute()` can read whatever `execute()` set in the context. It was easy 
to use, we have not forbidden it, and it is part of the API (this is the basic 
"property" of a dictionary, unlike a dataclass: you can set any value under 
any key). By introducing a dataclass we are breaking this property. You will 
not be able to set an arbitrary key in the context in `pre_execute` so that it 
is available in `execute`. If we implement the interim (lasting at least a 
year or more) hybrid dataclass <-> dictionary proposed above, this will 
continue to work, but with deprecation warnings.
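
   To make the difference concrete, here is a minimal sketch, not Airflow code. `MyOperator`, `StrictContext`, `HybridContext` and the `run` helper are all hypothetical names made up for illustration, and the hybrid class is only one possible reading of the "dataclass plus dictionary" idea discussed above.

```python
import warnings
from dataclasses import dataclass, field, fields
from typing import Any, Dict


class MyOperator:
    """Hypothetical custom operator that passes data through the context."""

    def pre_execute(self, context):
        context["my_custom_key"] = "set in pre_execute"

    def execute(self, context):
        return context["my_custom_key"]


@dataclass
class StrictContext:
    """Plain dataclass: declared fields only, no item assignment."""
    ds: str


@dataclass
class HybridContext:
    """Assumed hybrid: declared fields plus dict-style access that warns."""
    ds: str
    _extras: Dict[str, Any] = field(default_factory=dict, repr=False)

    def _is_declared(self, key):
        return not key.startswith("_") and key in {f.name for f in fields(self)}

    def __getitem__(self, key):
        warnings.warn("dict-style context access is deprecated",
                      DeprecationWarning, stacklevel=2)
        if key in self._extras:
            return self._extras[key]
        if self._is_declared(key):
            return getattr(self, key)
        raise KeyError(key)

    def __setitem__(self, key, value):
        warnings.warn("dict-style context access is deprecated",
                      DeprecationWarning, stacklevel=2)
        if self._is_declared(key):
            setattr(self, key, value)
        else:
            self._extras[key] = value  # arbitrary keys still accepted


def run(operator, context):
    """Mimic the pre_execute -> execute sequence on one shared context."""
    operator.pre_execute(context)
    return operator.execute(context)


op = MyOperator()
print(run(op, {"ds": "2021-02-25"}))  # plain dict: works today

try:
    run(op, StrictContext(ds="2021-02-25"))
except TypeError as err:  # a dataclass has no item assignment
    print("dataclass context failed:", err)

with warnings.catch_warnings(record=True) as caught:
    warnings.simplefilter("always")
    print(run(op, HybridContext(ds="2021-02-25")))
    print(len(caught), "deprecation warning(s) raised")
```

   The operator itself is unchanged in all three runs. Only the type of the context decides whether it works, breaks, or works with warnings.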
   
   If we decide this is the right way to go, we have to communicate it to the 
users, prepare them for the migration, and give them the tools they need to 
migrate when 3.0.0 comes. Our goal is (as it was in the 2.0.0 migration) to 
make the migration as smooth as possible. We even developed an 
upgrade_check script that users can run on their installation, which tells 
them what needs to be fixed: 
https://airflow.apache.org/docs/apache-airflow/stable/upgrade-check.html
   
   I imagine the same will happen with 3.0.0: we will prepare another upgrade 
check. Imagine all the different scenarios people will have by then:
   1) Custom operators in DAGs
   2) Custom operators as Plugins
   3) Custom operators installed as custom providers (this has been added in 
2.0 - we want to encourage people to build their own providers and distribute 
them as PyPI packages).
   4) Locally installed custom operators as part of the Python 
environment/images users will have.
   
   So my challenge is this: propose a strategy that will help our users 
migrate all those cases, in a way that will not make them refrain from 
migrating to 3.0.0 with the Dataclass context. What checks are you going to 
use? How are you going to encourage the users to migrate all the thousands of 
DAGs they have? What tools are you going to provide them?
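
   To give a sense of what even a single such check involves, here is a rough sketch of one static rule. It is an assumption for illustration and not part of the existing upgrade_check; the function name `find_context_writes` and the sample source are made up. It flags item assignments on a variable literally named `context`, and would miss renamed parameters, aliases, or `context.update(...)`.

```python
import ast

# Hypothetical operator source that such a check would have to scan.
SAMPLE = '''
class MyOperator:
    def pre_execute(self, context):
        context["my_custom_key"] = 1

    def execute(self, context):
        return context["my_custom_key"]
'''


def find_context_writes(source, context_name="context"):
    """Return sorted line numbers of `context[...] = value` style writes."""
    hits = []
    for node in ast.walk(ast.parse(source)):
        if isinstance(node, ast.Assign):
            targets = node.targets
        elif isinstance(node, ast.AugAssign):
            targets = [node.target]
        else:
            continue
        for target in targets:
            if (isinstance(target, ast.Subscript)
                    and isinstance(target.value, ast.Name)
                    and target.value.id == context_name):
                hits.append(target.lineno)
    return sorted(hits)


for lineno in find_context_writes(SAMPLE):
    print(f"line {lineno}: custom key written to the context")
```

   A rule like this would have to run across DAG folders, plugins, installed providers and whatever else is importable in the user's environment, which is exactly the set of scenarios listed above.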
   
   And once you do that, I will ask you and the other committers (undoubtedly 
following this conversation) to answer a single question:
   
   * Is it worth the hassle if we can achieve the very same user experience by 
using TypedDict? 
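
   For comparison, a minimal sketch of the TypedDict route. The `Context` class below is a made-up two-key subset, not Airflow's actual definition, and the import assumes Python 3.8+ (older versions would need `typing_extensions`). The value is still an ordinary dictionary at runtime, so existing operators keep working, while editors and type checkers know the declared keys.

```python
from datetime import datetime
from typing import TypedDict


class Context(TypedDict, total=False):
    """Illustrative subset of context keys (assumed, not the real list)."""
    ds: str
    execution_date: datetime


def execute(context: Context) -> str:
    # Type checkers and IDEs know "ds" is a str here.
    return context["ds"]


ctx: Context = {"ds": "2021-02-25", "execution_date": datetime(2021, 2, 25)}

# At runtime this is a plain dict, so the old pattern still works.
# A static type checker would flag the undeclared key below; nothing
# changes for code that is never type-checked.
ctx["my_custom_key"] = "still works at runtime"  # type: ignore

print(type(ctx).__name__, execute(ctx), ctx["my_custom_key"])  # type: ignore
```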
   
   



