[GitHub] [airflow] bolkedebruin commented on pull request #31565: Add pydantic support to serde

via GitHub Tue, 30 May 2023 01:50:34 -0700


bolkedebruin commented on PR #31565:
URL: https://github.com/apache/airflow/pull/31565#issuecomment-1568028626


   @potiuk thanks for the explanation, that wasn't clear from the 
implementation. Maybe it is time for a "For Developers" section / persona in 
the Airflow documentation.
   
   Couple of notes/questions. It might require some discussion (mail/talk) to 
reach common understanding:
   
   1. On the implementation side I have a hard time following what you are 
describing. The pydantic models in `serialization` are straight copies from the 
ORM counterparts. Are you referring to future implementations?
   2. I don't think `pydantic-sqlalchemy` as is was suitable for Airflow, but I 
think it could be extended or re-used so that it would work for us. For 
example, the `relationship` challenge could probably easily be solved by using 
a proxy and can be detected automatically. 
   3. On your #4 to me this seems a quite common thing that happens in RPC 
context. The `context.ti.get_num_running_task_instances()` just calls the Rest 
API instead of the ORM API to obtain what is required. Maybe I am looking at 
this with a too simple view?
   4. I would have loved that we took the chance when moving to a Managed API 
that we would drop certain things that really shouldn't be available. So slowly 
moving away from the ORM model to the API model. 
   5. In #29776 I would have preferred not using the `json` method from 
pydantic, but to use the `dict` method. Which allows us freedom of choice down 
the line and use a different encoder (JSON or whatever) in the future. 
Basically, *any* serialization should go through `serde` in my opinion as it 
does the right thing (TM ;-), like versioning and including class information) 
and if it doesn't there is a centralized place where we can fix it. The choice 
for pydantic in #29776 was due to `@dataclass` or `@attr` not serializing 
classes en related entities. But that is exactly what `serde` does for you. The 
only advantage pydantic currently has, is that it does this recursively and it 
keeps the structure which `@attr` and `@dataclass` don't do (that is a 
limitation of those classes not of `serde`). However, the current 
implementation of the ORMless models does not require that.
   6. Your mypy reference makes sense. But can't this be solved as well with 
inheritance? E.g. have a ORM implementation and API implementation that both 
inherit from an common ancestor?
   7. In case of code duplication, why not use generation at compile time 
instead of runtime if the above cannot be done? From the ORM models we can 
define what is required and generate the pydantic models at compile time. BTW: 
the way pydantic models seem to be used now are closely related to what 
protobuf does imho. 
   
   What it boils down to is that I do think that your 3 and 4 can be solved by 
using (generated) proxies that call a Rest API. I would be surprised if you 
didnt look at that yet so what am I missing?


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]

[GitHub] [airflow] bolkedebruin commented on pull request #31565: Add pydantic support to serde

Reply via email to