Just a reminder - meeting in ~50 minutes :)

On Wed, Feb 16, 2022 at 2:34 PM Jarek Potiuk <[email protected]> wrote:
> Happy to hear if others have some experience with the in-process option
> (what I really want is to do some benchmarking to see how much overhead
> each option involves). I'd say that, given the "coarseness" of the calls
> (with the possible exception of Connection/Variable retrieval, etc.),
> serialization/deserialization will have very little impact on performance,
> but without actually checking it's hard to say for sure. Another option,
> if inter-process communication turns into a problem: I saw people (in C++)
> "rip out" parts of Thrift to leave only the serialization/deserialization
> layer. But in our case, if we find that either the need for a separate
> process or the communication itself involves a lot of overhead, we could
> come back to the idea of delegating the calls via decorators.
>
> On Wed, Feb 16, 2022 at 2:22 PM Jarek Potiuk <[email protected]> wrote:
>
>> I looked at that too - let me leave it as an option to explore in
>> the first step. I will make a note.
>>
>> From what I checked, none of the current "ready-to-use" gRPC solutions
>> have such an "in-process" option. I believe that re-using an RPC framework
>> for serialization/deserialization/transport might save a LOT of headache.
>>
>> However, Apache Thrift supports a "shared-memory" transport. I still think
>> it requires a separate process (to be confirmed).
>> gRPC supports only local TCP and Unix sockets. The in-memory
>> option is not there (though people have asked for it:
>> https://github.com/grpc/grpc/issues/19959).
>>
>> J.
>>
>>
>> On Wed, Feb 16, 2022 at 2:13 PM Ash Berlin-Taylor <[email protected]> wrote:
>>
>>> That wasn't actually quite what I had in mind :)
>>>
>>> I was thinking that we _wouldn't_ go cross-process at all, but in the
>>> "local"/direct mode we would call the handler code as directly as possible.
>>> So for local/no-isolation we would still use the handler for the RPC, but
>>> there it's just not "remote".
>>>
>>> -ash
>>>
>>> On Wed, Feb 16 2022 at 13:01:11 +0100, Jarek Potiuk <[email protected]> wrote:
>>>
>>> Hey Everyone,
>>>
>>> Based on the feedback, I updated AIP-44
>>> https://cwiki.apache.org/confluence/display/AIRFLOW/AIP-44+Airflow+Internal+API
>>> - the "implementation notes" section - with an improved approach.
>>>
>>> Ash had a good suggestion (which I really like): instead of inventing
>>> our own decorators and a different way of handling internal and external
>>> communication for the "coarse" functions that require the database, we
>>> could always use RPC, no matter whether we are in DB isolation mode or
>>> "no isolation" mode. Of course, in "no isolation" mode the communication
>>> should have very low overhead (local TCP or sockets, no authorization). I
>>> looked at existing RPC implementations we could use and narrowed the
>>> potential choice of technologies down to gRPC and Apache Thrift.
>>>
>>> This approach has multiple advantages:
>>>
>>> * we can leverage existing RPC implementations (Thrift and gRPC are both
>>> mature, integrate with HTTPS and various authentication options, and
>>> can also run over local sockets)
>>> * the code will be much simpler to maintain - we will use the existing
>>> serialization mechanisms of those protocols
>>> * no custom communication code is needed - both Thrift and gRPC have
>>> everything needed for scalable, robust communication
>>>
>>> I think this way we will be able to implement a more robust and
>>> maintainable solution much faster.
>>>
>>> I also reached out to Apache Beam (they support both gRPC and Thrift and
>>> are in the process of transitioning from Thrift to gRPC as the primary
>>> protocol). I am sure they have done a lot of analysis that can help us
>>> make the final decision.
>>>
>>> This approach changes only the implementation details of AIP-44 -
>>> everything else, including the approach and the deployment options,
>>> remains untouched by this change.
>>>
>>> If you have any comments on that, feel free to share them. I will also
>>> discuss it today at the meeting, and if there is general consensus that the
>>> direction is right, I would love to start voting on AIP-44, ideally
>>> tomorrow, so that next week we can start implementing it. I am not sure
>>> whether we want to make a final decision about gRPC/Thrift yet (maybe there
>>> are people who have good experience with both and can share it here?).
>>>
>>> I think a more detailed POC and benchmarking might be the first step of
>>> the AIP, where we make the final choice based on an attempt to implement a
>>> POC for both - but I am also happy to listen to those who have more
>>> experience with both (and maybe Beam's experience will help with that).
>>>
>>> J.
>>>
>>>
>>>
>>>
>>> On Tue, Feb 15, 2022 at 1:49 PM Jarek Potiuk <[email protected]> wrote:
>>>
>>>> The meeting is tomorrow :) Feel free to join. I will also record it
>>>> and publish the minutes!
>>>>
>>>> On Tue, Feb 15, 2022 at 12:31 PM Giorgio Zoppi <[email protected]> wrote:
>>>> >
>>>> > Hello Everyone,
>>>> > is there any follow-up to this meeting? I would like to participate if possible.
>>>> > Best Regards,
>>>> > Giorgio
>>>> >
>>>> > On Tue, Feb 1, 2022 at 15:29 Jarek Potiuk <[email protected]> wrote:
>>>> >>
>>>> >> Hello Everyone,
>>>> >>
>>>> >> I think it's about time for the next sig-multitenancy meeting.
>>>> >>
>>>> >> I created a Doodle poll for next week - please mark your
>>>> >> availability by Friday the 4th.
>>>> >>
>>>> >> https://doodle.com/poll/axvu2gz7zhv8ieye?utm_source=poll&utm_medium=link
>>>> >>
>>>> >> I think the rough agenda will be:
>>>> >>
>>>> >> * AIP-43 DAG Processor separation [1] - implementation progress - Mateusz
>>>> >> * AIP-44 Airflow Internal API [2] - voting progress (hopefully) - Jarek
>>>> >> * AIP-45 Remove double DAG parsing [3] - discussion - Ping
>>>> >> * AIP-46 Docker runtime isolation [4] - discussion - Ping
>>>> >> * Also, there are some ideas (not yet in AIP form) around optimizing
>>>> >> DagProcessorLoop that might be good to talk about - also Ping.
>>>> >>
>>>> >> If there are any more proposals, feel free to ping me.
>>>> >> I also encourage everyone to comment on the AIP-45/46 proposals from
>>>> >> Ping before the meeting.
>>>> >>
>>>> >> [1] https://cwiki.apache.org/confluence/display/AIRFLOW/AIP-43+DAG+Processor+separation
>>>> >> [2] https://cwiki.apache.org/confluence/display/AIRFLOW/AIP-44+Airflow+Internal+API
>>>> >> [3] https://cwiki.apache.org/confluence/display/AIRFLOW/AIP-45+Remove+double+dag+parsing+in+airflow+run
>>>> >> [4] https://cwiki.apache.org/confluence/display/AIRFLOW/AIP-46+Add+support+for+docker+runtime+isolation+for+airflow+tasks+and+dag+parsing
>>>> >>
>>>> >> J.
>>>> >>
>>>> >
>>>> > --
>>>> > Life is a chess game - Anonymous.
>>>>
>>>
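A minimal sketch of Ash's suggestion, for illustration only. The same handler serves both modes: the "no isolation" path calls it in-process, and the isolation path pays for a serialization round trip. Everything here is hypothetical and not an Airflow API (the HANDLERS registry, get_variable, DirectTransport, SerializingTransport, measure). JSON stands in for the gRPC/Thrift wire format, and no network hop is simulated. The measure() helper is one rough way to start the overhead benchmarking Jarek mentions.

```python
import json
import timeit
from typing import Any, Callable, Dict


# Hypothetical "coarse" internal-API handler; the name is illustrative, not an Airflow API.
def get_variable(key: str) -> Dict[str, Any]:
    # A real handler would query the metadata database; this stub does not.
    return {"key": key, "val": "stub-value"}


# Hypothetical registry mapping RPC method names to handler functions.
HANDLERS: Dict[str, Callable[..., Dict[str, Any]]] = {"get_variable": get_variable}


class DirectTransport:
    """No-isolation mode: call the same handler in-process, with no serialization."""

    def call(self, method: str, **kwargs: Any) -> Dict[str, Any]:
        return HANDLERS[method](**kwargs)


class SerializingTransport:
    """Stand-in for DB-isolation mode: encode request and response to bytes
    the way a remote transport would, but skip the actual network hop.
    JSON is an assumption here; gRPC and Thrift use their own wire formats."""

    def call(self, method: str, **kwargs: Any) -> Dict[str, Any]:
        request = json.dumps({"method": method, "kwargs": kwargs}).encode("utf-8")
        # Server side (would run in the internal API process).
        decoded = json.loads(request)
        result = HANDLERS[decoded["method"]](**decoded["kwargs"])
        response = json.dumps(result).encode("utf-8")
        # Client side again: decode the response bytes.
        return json.loads(response)


def measure(transport: Any, n: int = 10_000) -> float:
    """Return the average wall-clock seconds per call through `transport`."""
    total = timeit.timeit(lambda: transport.call("get_variable", key="my_var"), number=n)
    return total / n


if __name__ == "__main__":
    for transport in (DirectTransport(), SerializingTransport()):
        print(f"{type(transport).__name__}: {measure(transport) * 1e6:.2f} us/call")
```

Running it as a script prints the average cost per call for each path. The difference covers only the JSON encode/decode, so a real POC would still need to measure the gRPC or Thrift framework and its transport (local TCP, Unix sockets) on top of that.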
