potiuk commented on PR #27468:
URL: https://github.com/apache/airflow/pull/27468#issuecomment-1307348759

   Yep. AIP is even approved - conditionally re- on-the-wire transport 
mechanism (i.e remote method execution choice) and serialization choice we 
have. 
   
   And while I iniitially advocated for GRPC (which answer both questions), 
there are some drawbacks of using GRPC (debuggability and threading model used) 
- this has been pointed by @ashb  and @andrewgodwin  mostly. 
   
   There are some benefitst of using more "web standard" approach - JSON-RPC 
allows us to plug-in better into already existing mechanisms we use and deos 
not require from us to learn and operate any new technology. The tests 
performed (initally by me then later by Mateusz) have shown that the 
performance overhead from using binary RPC vs. JSON are not really meaningful 
in our context, so they are not decisive factor.
   
   I think the last thing that we need to see if whether we can replace 
Pickling with our own JSON serialization we alredy use in airflow - it seems 
like bad idea  initially (because of the overhead) but again, it's likely 
negigible and the benefit of using more "standard" for us approach (we 
serialize internal objects using JSON Serializer a lot in various cases) and 
being able to debug it "on-the-wire" without extra tools are I think important 
benefits that should not be under-rated. I think here our own JSON Serializer 
trumps Pickle, because it already transparently handles a number of structures 
we will have to serialize including some K8S configuraitons (which require 
custom serialization when using pickle. 
   
   The "easiness of development" and maintainability were for me important but 
as @mhenc  mentioned - we did both POCs - GRPC here (I did it): 
https://github.com/apache/airflow/pull/25094 and this one is using JSON-RPC and 
for me, it looks like the approach proposed by Mateusz is even easier to 
develop and maintain - and as long as our serialization approch works 
transparently for all the object that we need to send/retrieve. So i got 
convinced that what Ash and Andrew proposed is a good idea. It does not change 
the principle of the AIP - it's just a matter of implementation detail. From 
the development perspective, this will be mostly decorating of the methods that 
we want to turn into "remote" ones.
   
   @bolkedebruin - I think if you read through the discussions - maybe you will 
have other specifc comments about those choices - if they have anything to add 
on top of the above POCs. Tests and conclusions we have based on those. We 
deliberately took it slower with Mateusz and gave both of us (and the 
community) time to comment and raise their concerns, the AIP general approach 
is already approved, but if there are any specific concerns on those technology 
choices - we are still open to hear a bit more comments before we make the 
first foundng "real PR" rather than POC - as we promised when we called for 
voting on the AIP.
   
   Do you have some specific concerns or other criteria that we should 
consider. Just to summarize those that we took into account:
   
   * easines of development/maintenance
   * debuggability
   * familiarlity of the community with underlying technologies
   * speed/performance of serialisation and transport
   
   They are now listed in order of preference - mostly because we found out 
that speed/performance of serialization and transport is literally orders of 
magnitude less important than speed/performance of the operations we are going 
to serialize and there is very little difference of the overhead.
   
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]

Reply via email to