Re: [VOTE] AIP-44 - Airflow Internal API

Jarek Potiuk Wed, 10 Aug 2022 12:36:20 -0700

1st of all - I wish we had this discussion before :)

> The autogenerated code also is available from OpenAPI specs, of course,
and the request/response semantics in the same thread are precisely what
make load balancing these calls a little harder, and horizontal scaling to
multiple threads comes with every HTTP server, but I digress - clearly
you've made a choice here to write an RPC layer rather than a lightweight
API layer, and go with the more RPC-style features, and I get that.


OpenAPI is REST not RPC and it does not map well to RPC style calls. I
tried to explain in detail in the AIP (and in earlier discussions). And the
choice is basically made for us because of our expectation to have
minimal impact on the existing code (and ability to switch off the remote
part). We really do not want to introduce new API calls.  We want to make
sure that existing "DB transactions" (i.e. coarse grained calls) are
remotely executed. So we are not really talking about lightweight API
almost by definition. Indeed, Open API also maps a definition described in
a declarative way to python code.  but it has this non-nice part that
OpenAPI/REST, it is supposed to be run on some resources. We have no
resources to run it on - every single method we call is doing something.
Even from the REST/OpenAPI semantics I'd have a really hard time to decide
whether it should be GET, POST or PUT or DELETE. In most cases this will be
a combination of those 4 on several different resources. All the "nice
parts" of Open API (Swagger UI etc.) become next to useless if you try to
map such remote procedures we have, to REST calls.


> I still think it's choosing something that's more complex to maintain
over a simpler, more accessible option, but since I won't be the one
writing it, that's not really for me to worry about. I am very curious to
see how this evolves as the work towards multitenancy really kicks in and
all the API schemas need to change to add namespacing!

The gRPC (proto) is designed from ground-up with maintainability in mind.
The ability to evolve the API, add new parameters etc. is built-in the
protobuf definition. From the user perspective it's actually easier to use
than OpenAPI when it comes to remote method calls, because you basically -
call a method.

And also coming back to monitoring - literally every monitoring platform
supports gRPC to monitor. Grafana, NewRelic, CloudWatch, Datadog, you name
it. In our case.

Also load-balancing:

Regardless of the choice we talk about HTTP request/response happening for
1 call. This is the boundary. Each of the calls we have will be a separate
transaction, separate call, not connected to any other call. The server
handling it will be stateless (state will be stored in a DB when each call
completes). I deliberately put the "boundary" of each of the remotely
callable methods, to be a complete DB transaction to achieve it.

So It really does not change whether we use gRPC or REST/JSON. REST/JSON
vs. gRPC is just the content of the message, but this is the very same HTTP
call, with same authentication added on top, same headers - just how the
message is encoded is different. The same tools for load balancing works in
the same way regardless if we use gRPC or REST/JSON. This is really a
higher layer than the one involved in load balancing.





On Wed, Aug 10, 2022 at 9:10 PM Andrew Godwin
<andrew.god...@astronomer.io.invalid> wrote:

> Well, binary representation serialization performance is worse for
> protobuf than for JSON in my experience (in Python specifically) - unless
> you mean size-on-the-wire, which I can agree with but tend to not worry
> about very much since it's rarely a bottleneck.
>
> The autogenerated code also is available from OpenAPI specs, of course,
> and the request/response semantics in the same thread are precisely what
> make load balancing these calls a little harder, and horizontal scaling to
> multiple threads comes with every HTTP server, but I digress - clearly
> you've made a choice here to write an RPC layer rather than a lightweight
> API layer, and go with the more RPC-style features, and I get that.
>
> I still think it's choosing something that's more complex to maintain over
> a simpler, more accessible option, but since I won't be the one writing it,
> that's not really for me to worry about. I am very curious to see how this
> evolves as the work towards multitenancy really kicks in and all the API
> schemas need to change to add namespacing!
>
> Andrew
>
> On Wed, Aug 10, 2022 at 12:52 PM Jarek Potiuk <ja...@potiuk.com> wrote:
>
>> Sure I can explain - the main reasons are:
>>
>> - 1) binary representation performance - impact of this is rather limited
>> because our API calls are doing rather heavy processing compared to the
>> data being transmitted. But I believe it's not negligible.
>> - 2) automated tools to automatically generate strongly typed Python code
>> (that's the ProtoBuf part). The strongly typed Python code is what
>> convinced me (see my POC). The tooling we got for that is excellent. Far
>> more superior than dealing with json-encoded data even with schema.
>> - 2) built-in "remote procedure" layer - where we have request/response
>> semantics optimisations (for multiple calls over the same chanel) and
>> exception handling done for us (This is basically what "Remote Procedure"
>> interface provide us)
>> - 3) built-in server that efficiently distributes the method called from
>> multiple client into a multi-threaded/multi-threaded execution (all
>> individual calls are stateless so multi-processing can be added on top
>> regardless from the "transport" chosen).
>>
>> BTW. If you look at my POC code, there is nothing that strongly "binds"
>> us to gRPC. The nice thing is that once it is implemented, it can be
>> swapped out very easily. The only Proto/gRPC code that "leaks" to "generic"
>> Airflow code is mapping of some (most complex) parameters to Proto . And
>> this is only for most complex cases - literally only few of our types
>> require custom serialisation - most of the mapping is handled automatically
>> by generated protobuf code. And we can easily put the "complex" mapping in
>> a separate package. Plus there is an "if" statement for each of the ~ 30 or
>> so methods that we will have to turn into remotely-callable. We can even
>> (as I proposed it as an option) add a little python magic and add a simple
>> decorator to handle the "ifs". Then the decorator "if" can be swapped with
>> some other "remote call" implementation.
>>
>> The actual bulk of the implementation is to make sure that all the places
>> are covered (that's the testing harness).
>>
>>
>>
>> On Wed, Aug 10, 2022 at 8:25 PM Andrew Godwin
>> <andrew.god...@astronomer.io.invalid> wrote:
>>
>>> Hi Jarek,
>>>
>>> Apologies - I was as not involved as I wanted to be in the AIP-44
>>> process, and obviously my vote is non-binding anyway - but having done a
>>> lot of Python API development over the past decade or so I wanted to know
>>> why the decision was made to go with gRPC over just plain HTTP+JSON (with a
>>> schema, of course).
>>>
>>> The AIP covers why XMLRPC and Thrift lost out to gRPC, which I agree
>>> with - but does not go into the option of using a standard Python HTTP
>>> server with JSON schema enforcement, such as FastAPI. In my experience, the
>>> tools for load balancing, debugging, testing and monitoring JSON/HTTP are
>>> superior and more numerous than those for gRPC, and in addition the
>>> asynchronous support for gRPC servers is lacking compared to their plain
>>> HTTP counterparts, and the fact that you can interact and play with the
>>> APIs in prototyping stages without having to handle obtaining correct
>>> protobuf versions for the Airflow version you're using.
>>>
>>> I wouldn't go so far as to suggest a veto, but I do want to see the AIP
>>> address why gRPC would win over this option. Apologies again for the late
>>> realisation that gRPC got chosen and was being voted on - it's been a very
>>> busy summer.
>>>
>>> Thanks,
>>> Andrew
>>>
>>> On Wed, Aug 10, 2022 at 12:12 PM Jarek Potiuk <ja...@potiuk.com> wrote:
>>>
>>>> Just let me express my rather strong dissatisfaction with the way
>>>> this "last minute" is raised.
>>>>
>>>> It is very late to come up with such a statement - not that it comes at
>>>> all, but when it comes when everyone had a chance to take a look and
>>>> comment, including taking a look at the POC and result of checks. This has
>>>> never been raised even 4 months ago where the only choices were Thrift and
>>>> gRPc).
>>>>
>>>> I REALLY hope the arguments are very strong and backed by real examples
>>>> and data why it is a bad choice rather than opinions.
>>>>
>>>> J,.
>>>>
>>>> On Wed, Aug 10, 2022 at 7:50 PM Ash Berlin-Taylor <a...@apache.org>
>>>> wrote:
>>>>
>>>>> Sorry to weigh in at the last minute, but I'm wary of gRPC over just
>>>>> JSON, so -1 to that specific choice. Everything else I'm happy with.
>>>>>
>>>>> I (or Andrew G) will follow up with more details shortly.
>>>>>
>>>>> -ash
>>>>>
>>>>> On Wed, Aug 10 2022 at 19:38:59 +02:00:00, Jarek Potiuk <
>>>>> ja...@potiuk.com> wrote:
>>>>>
>>>>> Oh yeah :)
>>>>>
>>>>> On Wed, Aug 10, 2022 at 7:23 PM Ping Zhang <pin...@umich.edu> wrote:
>>>>>
>>>>>> ah, good call.
>>>>>>
>>>>>> I guess the email template can be updated:
>>>>>>
>>>>>> Only votes from PMC members are binding, but members of the community
>>>>>>> are encouraged to check the AIP and vote with "(non-binding)".
>>>>>>
>>>>>>
>>>>>>
>>>>>> +1 (binding)
>>>>>>
>>>>>> Thanks,
>>>>>>
>>>>>> Ping
>>>>>>
>>>>>>
>>>>>> On Wed, Aug 10, 2022 at 10:20 AM Jarek Potiuk <ja...@potiuk.com>
>>>>>> wrote:
>>>>>>
>>>>>>> Thank you . And BTW. It's binding Ping :). For AIP's commiter's
>>>>>>> votes are binding. See
>>>>>>> https://github.com/apache/airflow/blob/main/CONTRIBUTING.rst#commit-policy
>>>>>>> :D
>>>>>>>
>>>>>>> On Wed, Aug 10, 2022 at 7:16 PM Ping Zhang <pin...@umich.edu> wrote:
>>>>>>>
>>>>>>>> +1 (non-binding)
>>>>>>>>
>>>>>>>> Thanks,
>>>>>>>>
>>>>>>>> Ping
>>>>>>>>
>>>>>>>>
>>>>>>>> On Thu, Aug 4, 2022 at 1:42 AM Jarek Potiuk <ja...@potiuk.com>
>>>>>>>> wrote:
>>>>>>>>
>>>>>>>>> Hey everyone,
>>>>>>>>>
>>>>>>>>> I would like to cast a vote for "AIP-44 - Airflow Internal API".
>>>>>>>>>
>>>>>>>>> The AIP-44 is here:
>>>>>>>>> https://cwiki.apache.org/confluence/display/AIRFLOW/AIP-44+Airflow+Internal+API
>>>>>>>>>
>>>>>>>>> Discussion thread:
>>>>>>>>> https://lists.apache.org/thread/nsmo339m618kjzsdkwq83z8omrt08zh3
>>>>>>>>>
>>>>>>>>> The voting will last for 5 days (until 9th of August 2022 11:00
>>>>>>>>> CEST), and until at least 3 binding votes have been cast.
>>>>>>>>>
>>>>>>>>> Please vote accordingly:
>>>>>>>>>
>>>>>>>>> [ ] + 1 approve
>>>>>>>>> [ ] + 0 no opinion
>>>>>>>>> [ ] - 1 disapprove with the reason
>>>>>>>>>
>>>>>>>>> Only votes from PMC members are binding, but members of the
>>>>>>>>> community are encouraged to check the AIP and vote with
>>>>>>>>> "(non-binding)".
>>>>>>>>>
>>>>>>>>> ----
>>>>>>>>>
>>>>>>>>> Just a summary of where we are:
>>>>>>>>>
>>>>>>>>> It's been long in the making, but I think it might be a great
>>>>>>>>> step-forward to our long-term multi-tenancy goal. I believe the 
>>>>>>>>> proposal I
>>>>>>>>> have is quite well thought out and discussed a lot in the past:
>>>>>>>>>
>>>>>>>>> * we have a working POC for implementation used for performance
>>>>>>>>> testing:  https://github.com/apache/airflow/pull/25094
>>>>>>>>> * it is based on on industry-standard open-source gRPC (which is
>>>>>>>>> already our dependency) which fits better the RPC "model" we need 
>>>>>>>>> than our
>>>>>>>>> public REST API.
>>>>>>>>> * it has moderate performance impact and rather good
>>>>>>>>> maintainability features (very little impact on regular development 
>>>>>>>>> effort)
>>>>>>>>> * it is fully backwards compatible - the new DB isolation will be
>>>>>>>>> an optional feature
>>>>>>>>> * has a solid plan for full test coverage in our CI
>>>>>>>>> * has a backing and plans for more extensive complete testing in
>>>>>>>>> "real" environment with Google Composer team support
>>>>>>>>> * allows for further extensions as part of AIP-1 (I am planning to
>>>>>>>>> re-establish sig-multitenancy effort for follow up AIPs once this one 
>>>>>>>>> is
>>>>>>>>> well in progress).
>>>>>>>>>
>>>>>>>>>
>>>>>>>>> J.
>>>>>>>>>
>>>>>>>>>
>>>>>>>>>
>>>>>>>>>

Re: [VOTE] AIP-44 - Airflow Internal API

Reply via email to