potiuk commented on PR #28900:
URL: https://github.com/apache/airflow/pull/28900#issuecomment-1539872479

   > I updated the PR and fixed some imports. However, tests are still 
complaining about methods defined in airflow/models/taskinstance.py but not in 
airflow/models/pydantic/taskinstance.py? 
https://github.com/apache/airflow/pull/28900#issuecomment-1462646719. Has the 
approach to solve it changed since 
https://github.com/apache/airflow/pull/28900#issuecomment-1465238532 @potiuk? I 
should have definitely tried to keep up with this (de)serialization topic :|
   
   Nope. Hasn't changed - we need now to solve all the missing pieces. This is 
precisely one of the things that MyPy and *Pydantic should guide us to solve. 
we should refactor the the code in the way to make sure tha on the client side 
(i.e. when *Pydantic version is used) we are not using anything that is NOT 
available in the Pydantic object. 
   
   That migth mean:
   
   * adding and making sure that attributes  of TaskPydantic are serialized if 
they are not ORM attributes
   * adding parallel method in TaskPydantic if they are used on the client 
side, so that effect of calling those methods will be equivalent of running 
them on the server side. This might mean that for example we extrac the method 
to a module level and turn it into function taking (TaskInstance | 
TaskInstancePydantic  as parameter and call it appropriately in the right 
object. Or if the method is supposed to make a db call, it should make an 
@internal_api call to make a change and pass enough parameters to make it 
succeed.
   
   Basically what we are trying to do now with the aid of MyPy (And later we 
will verify it with automated test verification) is to make the users of (so 
far) ORM classes using the Pydantic versions of those classes woudl continue 
doing so without any modifications to the APIs they called, while we would 
route the calls to either direct DB access or @internal_api call - depending 
which variant of the object they have. In most cases (the read only) - there 
won't be a need to make an @internal-api call. What we will need is merely to 
extract the method and make it ise ORM or Pydantic attributes to calculate 
stuff and return it (there are quite a few methods like that in some of the ORM 
objects we have). In case we need to refresh stuff from DB (very few if any) or 
update it in the DB (There will be few methods) we should make a conditional 
@internal_api call, sometimes we will have to implement conditional logic 
mostly for optimisations (if Pydantic -> make a remote call, if not -> jus
 t do what you did before).
   
   Generally speaking - what we need to do now is to make sure that whoever was 
the user of those ORM clasess will continue using them the same way as they did 
in the past - without even knowing they are executing an internal/DB call 
under-the-hood. With as minimum duplication (mainly around implementing the 
method and calling module method) as possible.
   
   This is - unfortunately - a side effect of the fact that our ORM objects 
have been previously exposed to the developers of the DAGs via Context in 
operators/tasks. Those developers are entitled to use them and even call some 
of the methods of those objects via JINJA templates and taskflow parameters - 
and those calls should remain backwards compatible. TaskInstance is the most 
prominent example of this. The first step is to make sure that we do that with 
all the "internal" calls, and the second step will be to get all the remaining 
methods of those ORM objects (if there will be any left) and make them 
pass-through internal-api call in the same way (my assumption is that once we 
do it for all our code including tests, there will not be any left).
   
   The whole thing about AIP-44 implementation is that we are promising the 
users that they will not have change their DAGs IF they are not using any DB 
direct access but just use what we have in 
https://airflow.apache.org/docs/apache-airflow/stable/public-airflow-interface.html
 . This was one of the reasons why I explicitly wanted to list what is public 
interface - in a way that was a preparation of what to tell users when they 
swich to AIP-44. Context - historically - has been a public interface, and 
(historically again) it contained some ORM objects. This is also why that was 
one of the first things I've done when added Pydantic variants:
   
   If you look at 
https://github.com/apache/airflow/blob/main/airflow/utils/context.pyi  - you 
will see the typing and all the ORM classes exposed via this typing have now `| 
Pydantic`. This is also information for the users that those attributes and 
methods that are available in Pydantic are "safe" to use by them in their 
operators. And now our job is to add those attributes and methods, that are 
"safe" and also that will provide maximum backwards compatibility for the 
current users. 
   
   I hope this will make things cleaner, and if not - I am happy to have a call 
also with @mhenc to discuss it and explain and  get more feedback :)


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]

Reply via email to