potiuk commented on PR #28900: URL: https://github.com/apache/airflow/pull/28900#issuecomment-1539872479
> I updated the PR and fixed some imports. However, tests are still complaining about methods defined in airflow/models/taskinstance.py but not in airflow/models/pydantic/taskinstance.py? https://github.com/apache/airflow/pull/28900#issuecomment-1462646719. Has the approach to solve it changed since https://github.com/apache/airflow/pull/28900#issuecomment-1465238532 @potiuk? I should have definitely tried to keep up with this (de)serialization topic :| Nope. Hasn't changed - we need now to solve all the missing pieces. This is precisely one of the things that MyPy and *Pydantic should guide us to solve. we should refactor the the code in the way to make sure tha on the client side (i.e. when *Pydantic version is used) we are not using anything that is NOT available in the Pydantic object. That migth mean: * adding and making sure that attributes of TaskPydantic are serialized if they are not ORM attributes * adding parallel method in TaskPydantic if they are used on the client side, so that effect of calling those methods will be equivalent of running them on the server side. This might mean that for example we extrac the method to a module level and turn it into function taking (TaskInstance | TaskInstancePydantic as parameter and call it appropriately in the right object. Or if the method is supposed to make a db call, it should make an @internal_api call to make a change and pass enough parameters to make it succeed. Basically what we are trying to do now with the aid of MyPy (And later we will verify it with automated test verification) is to make the users of (so far) ORM classes using the Pydantic versions of those classes woudl continue doing so without any modifications to the APIs they called, while we would route the calls to either direct DB access or @internal_api call - depending which variant of the object they have. In most cases (the read only) - there won't be a need to make an @internal-api call. What we will need is merely to extract the method and make it ise ORM or Pydantic attributes to calculate stuff and return it (there are quite a few methods like that in some of the ORM objects we have). In case we need to refresh stuff from DB (very few if any) or update it in the DB (There will be few methods) we should make a conditional @internal_api call, sometimes we will have to implement conditional logic mostly for optimisations (if Pydantic -> make a remote call, if not -> jus t do what you did before). Generally speaking - what we need to do now is to make sure that whoever was the user of those ORM clasess will continue using them the same way as they did in the past - without even knowing they are executing an internal/DB call under-the-hood. With as minimum duplication (mainly around implementing the method and calling module method) as possible. This is - unfortunately - a side effect of the fact that our ORM objects have been previously exposed to the developers of the DAGs via Context in operators/tasks. Those developers are entitled to use them and even call some of the methods of those objects via JINJA templates and taskflow parameters - and those calls should remain backwards compatible. TaskInstance is the most prominent example of this. The first step is to make sure that we do that with all the "internal" calls, and the second step will be to get all the remaining methods of those ORM objects (if there will be any left) and make them pass-through internal-api call in the same way (my assumption is that once we do it for all our code including tests, there will not be any left). The whole thing about AIP-44 implementation is that we are promising the users that they will not have change their DAGs IF they are not using any DB direct access but just use what we have in https://airflow.apache.org/docs/apache-airflow/stable/public-airflow-interface.html . This was one of the reasons why I explicitly wanted to list what is public interface - in a way that was a preparation of what to tell users when they swich to AIP-44. Context - historically - has been a public interface, and (historically again) it contained some ORM objects. This is also why that was one of the first things I've done when added Pydantic variants: If you look at https://github.com/apache/airflow/blob/main/airflow/utils/context.pyi - you will see the typing and all the ORM classes exposed via this typing have now `| Pydantic`. This is also information for the users that those attributes and methods that are available in Pydantic are "safe" to use by them in their operators. And now our job is to add those attributes and methods, that are "safe" and also that will provide maximum backwards compatibility for the current users. I hope this will make things cleaner, and if not - I am happy to have a call also with @mhenc to discuss it and explain and get more feedback :) -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: [email protected] For queries about this service, please contact Infrastructure at: [email protected]
