Hey Giorgio:

We already use JSONSchema for validation in our JSONSerializer where we
need it, and we can do the same in this case. But we can also choose
not to, depending on the actual implementation and on measuring how
much it costs. Skipping it is typical practice when you have full
control of both sides and can run a comprehensive test suite (which
we will).
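
To make the "validate only if it is worth the cost" point concrete,
here is a minimal sketch. It assumes a hand-rolled check that only
understands "required" and simple "type" entries; a real setup would
more likely call a full validator such as the `jsonschema` package's
`validate(instance, schema)`. The schema, its field names and the
`check`/`deserialize` helpers are hypothetical, and the timing loop
just prints whatever your machine measures:

```python
import json
import timeit
from typing import Any, Dict

# Hypothetical parameter schema for one remoted method, in a
# JSON-Schema-like shape. Only "required" and simple "type" are handled.
PARAMS_SCHEMA: Dict[str, Any] = {
    "type": "object",
    "required": ["dag_id", "run_id"],
    "properties": {"dag_id": {"type": "string"}, "run_id": {"type": "string"}},
}

_TYPES = {"string": str, "integer": int, "object": dict}


def check(payload: Dict[str, Any], schema: Dict[str, Any]) -> None:
    """Raise ValueError if payload misses a required key or has a wrong type."""
    for key in schema.get("required", []):
        if key not in payload:
            raise ValueError(f"missing required field: {key}")
    for key, spec in schema.get("properties", {}).items():
        if key in payload and not isinstance(payload[key], _TYPES[spec["type"]]):
            raise ValueError(f"field {key} must be of type {spec['type']}")


def deserialize(body: str, validate: bool = True) -> Dict[str, Any]:
    """Decode a JSON body, optionally validating it before it is used."""
    payload = json.loads(body)
    if validate:
        check(payload, PARAMS_SCHEMA)
    return payload


if __name__ == "__main__":
    body = json.dumps({"dag_id": "example", "run_id": "manual_1"})
    # Compare the per-call cost with and without validation on this machine.
    for flag in (False, True):
        total = timeit.timeit(lambda: deserialize(body, validate=flag), number=10_000)
        print(f"validate={flag}: {total / 10_000 * 1e6:.2f} us per call")
```

Keeping validation behind a flag means the decision can be made (and
revisited) from measurements rather than up front.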

1) The method inventory is in the AIP docs:
https://cwiki.apache.org/confluence/display/AIRFLOW/AIP-44+Airflow+Internal+API.
It might be slightly outdated, as Airflow is under constant development
and we deliberately have not put the details in the docs, but you can
find all the methods in the code and check which parameters they take
and what they return. What gets serialized is basically the parameters
of each method we are "remoting" plus its return value.
2) Those are already available. As you can see from the initial
communication, both POCs, mine
(https://github.com/apache/airflow/pull/25094) and Mateusz's (you can
find it at the beginning of the thread), contain the code that we used
for testing. If you want to experiment with them, feel free.
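
For anyone new to the thread, a rough sketch of what "remoting" a
method means here: the client sends the method name plus its
parameters as JSON, and the server dispatches to the registered
function and sends its return value back as JSON. Everything below
(the `remote_method` decorator, `get_task_state`, the
{"method", "params"} envelope) is hypothetical and stdlib-only; the
actual code is in the POCs mentioned above.

```python
import json
from typing import Any, Callable, Dict

# Hypothetical registry of "remoted" methods. The decorator and method
# names are illustrative only, not the real AIP-44 inventory or API.
_REMOTE_METHODS: Dict[str, Callable[..., Any]] = {}


def remote_method(func: Callable[..., Any]) -> Callable[..., Any]:
    """Register a function so that it can be invoked by name over the wire."""
    _REMOTE_METHODS[func.__name__] = func
    return func


@remote_method
def get_task_state(dag_id: str, task_id: str, run_id: str) -> str:
    # Stand-in for a method that would normally run an ORM/SQL query.
    return "success"


def encode_call(method: str, **params: Any) -> str:
    """Client side: serialize the method name and its keyword parameters."""
    return json.dumps({"method": method, "params": params})


def handle_request(body: str) -> str:
    """Server side: decode the call, run the method, encode its return value."""
    request = json.loads(body)
    func = _REMOTE_METHODS.get(request.get("method"))
    if func is None:
        return json.dumps({"error": f"unknown method: {request.get('method')}"})
    return json.dumps({"result": func(**request.get("params", {}))})
```

For example, `handle_request(encode_call("get_task_state",
dag_id="example", task_id="t1", run_id="manual_1"))` returns
'{"result": "success"}'. The serialized surface is exactly the method
signatures and return values, which is why the method inventory is
the thing to look at.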

> P.S when we'll feel the need of speed, PyO3 + Rust is the way to go or also 
> without going native, asyncio+uvloop.

Absolutely. If you need the best speed, those would be my favourites
too. PyO3 + Rust is precisely what Pydantic v2 uses (see the plan
that Samuel came up with:
https://pydantic-docs.helpmanual.io/blog/pydantic-v2/). Unfortunately,
Pydantic v2 is still months away (likely more than a few). Maybe one
day we will switch, once we no longer have to fight its teething
problems.
We are in a somewhat different situation than most "public" APIs out
there. Every method of ours that we are going to remote makes a
(usually remote) relational database query: the ORM converts Python
objects to SQL (usually a pretty heavy query at that), the DB executes
it, and the results come back. That round trip dominates the cost. Our
tests confirmed it, so optimising the serialization part is a non-goal
for us (or rather, it has much lower priority than keeping it familiar
to people who know other parts of the codebase).

We do not want to micro-optimise a part of the process that can give
us at most a low, single-digit percentage improvement. We can always
do it in the future if it ever becomes our bottleneck, and it will be
easy to switch if we decide to.

J.

On Tue, Nov 8, 2022 at 6:13 PM Giorgio Zoppi wrote:
>
> Makes sense.
> It's ok exchanging a json, but it's also important to provide a schema for 
> input validation in those cases.
> Yes, you'll have to maintain the schema, but safer is better than sorry. Two 
> questions:
>  1. which is the model that you want to serialize?  I don't see a clear
> separation of concerns between rpc rest call.
> 2. And also can you provide the tests for minimal experimentation?
> Best Regards,
> Giorgio
> P.S when we'll feel the need of speed, PyO3 + Rust is the way to go or also 
> without going native, asyncio+uvloop.
> in my recent REST service tests, the latency became comparable to Go.
>
