bolkedebruin commented on PR #31565: URL: https://github.com/apache/airflow/pull/31565#issuecomment-1568028626
@potiuk thanks for the explanation, that wasn't clear from the implementation. Maybe it is time for a "For Developers" section / persona in the Airflow documentation. Couple of notes/questions. It might require some discussion (mail/talk) to reach common understanding: 1. On the implementation side I have a hard time following what you are describing. The pydantic models in `serialization` are straight copies from the ORM counterparts. Are you referring to future implementations? 2. I don't think `pydantic-sqlalchemy` as is was suitable for Airflow, but I think it could be extended or re-used so that it would work for us. For example, the `relationship` challenge could probably easily be solved by using a proxy and can be detected automatically. 3. On your #4 to me this seems a quite common thing that happens in RPC context. The `context.ti.get_num_running_task_instances()` just calls the Rest API instead of the ORM API to obtain what is required. Maybe I am looking at this with a too simple view? 4. I would have loved that we took the chance when moving to a Managed API that we would drop certain things that really shouldn't be available. So slowly moving away from the ORM model to the API model. 5. In #29776 I would have preferred not using the `json` method from pydantic, but to use the `dict` method. Which allows us freedom of choice down the line and use a different encoder (JSON or whatever) in the future. Basically, *any* serialization should go through `serde` in my opinion as it does the right thing (TM ;-), like versioning and including class information) and if it doesn't there is a centralized place where we can fix it. The choice for pydantic in #29776 was due to `@dataclass` or `@attr` not serializing classes en related entities. But that is exactly what `serde` does for you. The only advantage pydantic currently has, is that it does this recursively and it keeps the structure which `@attr` and `@dataclass` don't do (that is a limitation of those classes not of `serde`). However, the current implementation of the ORMless models does not require that. 6. Your mypy reference makes sense. But can't this be solved as well with inheritance? E.g. have a ORM implementation and API implementation that both inherit from an common ancestor? 7. In case of code duplication, why not use generation at compile time instead of runtime if the above cannot be done? From the ORM models we can define what is required and generate the pydantic models at compile time. BTW: the way pydantic models seem to be used now are closely related to what protobuf does imho. What it boils down to is that I do think that your 3 and 4 can be solved by using (generated) proxies that call a Rest API. I would be surprised if you didnt look at that yet so what am I missing? -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: [email protected] For queries about this service, please contact Infrastructure at: [email protected]
