BTW. I re-read your comment in the AIP and yeah ... I think I completely misunderstood it :)
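
A minimal sketch of how the suggestion quoted below could look, assuming I read it correctly now: the same handler serves both modes, and only the isolated mode pays for serialization and transport. All names here (HANDLERS, DirectClient, SerializingClient, serve, make_client, get_variable) are hypothetical and made up for illustration, and JSON stands in for whatever gRPC or Thrift would actually do on the wire.

```python
import json
from typing import Any, Callable, Dict

# Hypothetical registry of internal API handlers (the "server side").
HANDLERS: Dict[str, Callable[..., Any]] = {}


def handler(name: str) -> Callable[[Callable[..., Any]], Callable[..., Any]]:
    """Register a function as the handler for the given method name."""
    def register(func: Callable[..., Any]) -> Callable[..., Any]:
        HANDLERS[name] = func
        return func
    return register


@handler("get_variable")
def get_variable(key: str) -> str:
    # Stand-in for a database lookup; the data is invented for the example.
    return {"env": "prod"}.get(key, "")


class DirectClient:
    """Local (no isolation) mode: call the handler in-process, nothing is serialized."""

    def call(self, method: str, **params: Any) -> Any:
        return HANDLERS[method](**params)


class SerializingClient:
    """Isolation mode stand-in: request and response cross a bytes boundary."""

    def __init__(self, transport: Callable[[bytes], bytes]) -> None:
        self._transport = transport

    def call(self, method: str, **params: Any) -> Any:
        request = json.dumps({"method": method, "params": params}).encode()
        response = self._transport(request)
        return json.loads(response)["result"]


def serve(request: bytes) -> bytes:
    """Server-side dispatch; a real setup would put this behind gRPC or Thrift."""
    message = json.loads(request)
    result = HANDLERS[message["method"]](**message["params"])
    return json.dumps({"result": result}).encode()


def make_client(isolated: bool) -> Any:
    """Pick the client for the current mode; callers use the same call() either way."""
    return SerializingClient(serve) if isolated else DirectClient()
```

Calling `make_client(False).call("get_variable", key="env")` and `make_client(True).call("get_variable", key="env")` goes through the same handler, so the calling code does not need to know which mode it runs in.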
On Wed, Feb 16, 2022 at 6:08 PM Jarek Potiuk <[email protected]> wrote:
> Just a reminder - meeting in ~ 50 minutes :)
>
> On Wed, Feb 16, 2022 at 2:34 PM Jarek Potiuk <[email protected]> wrote:
>
>> Happy to hear if others have some experience with in-process calls. What
>> I really want is to do some benchmarking to see how much overhead each
>> option involves. I'd say that the "coarseness" of the calls (with maybe
>> the exception of Connection/variable retrieval etc.) means that
>> serialization/deserialization will have very little impact on performance
>> (but without actually checking it, it's hard to say for sure). Another
>> option, if inter-process communication turns into a problem (and I saw
>> people doing it in C++): people did "rip" some parts of Thrift to leave
>> only the serialization/deserialization. But in our case - if we find that
>> either the need to have a separate process or the communication involves
>> a lot of overhead, we could come back to the idea of delegating the calls
>> via decorators.
>>
>> On Wed, Feb 16, 2022 at 2:22 PM Jarek Potiuk <[email protected]> wrote:
>>
>>> I looked at that too - and let me leave that as an option to explore in
>>> the first step. I will make a note.
>>>
>>> From what I checked - none of the current "ready-to-use" gRPC solutions
>>> have such an "in-process" option. I believe the "RPC framework re-use" for
>>> serialization/deserialization/transport might save a LOT of headache.
>>>
>>> However - Apache Thrift supports "shared-memory" transport. I still
>>> think it requires a separate process (to be confirmed).
>>> The gRPC one supports local TCP and Unix sockets only. The in-memory
>>> option is not there (though people asked for it:
>>> https://github.com/grpc/grpc/issues/19959).
>>>
>>> J.
>>>
>>>
>>> On Wed, Feb 16, 2022 at 2:13 PM Ash Berlin-Taylor <[email protected]> wrote:
>>>
>>>> That wasn't actually quite what I had in mind :)
>>>>
>>>> I was thinking that we _wouldn't_ go cross-process at all, but in the
>>>> "local"/direct mode we will as-directly-as-possible call the handler code.
>>>> So for local/no-isolation we would still use the handler for the RPC, but
>>>> there it's just not "remote".
>>>>
>>>> -ash
>>>>
>>>> On Wed, Feb 16 2022 at 13:01:11 +0100, Jarek Potiuk <[email protected]> wrote:
>>>>
>>>> Hey Everyone,
>>>>
>>>> Based on the feedback, I updated AIP-44
>>>> https://cwiki.apache.org/confluence/display/AIRFLOW/AIP-44+Airflow+Internal+API
>>>> - the "implementation notes" with an improved approach.
>>>>
>>>> Ash had a good suggestion (which I really like) that instead of
>>>> inventing our own decorators and a different way of handling the internal
>>>> and external communication for the "coarse" functions that require the
>>>> database, we could approach it differently - namely we could always use
>>>> RPC - no matter if we are in DB isolation mode or "no isolation" mode. Of
>>>> course in case of the "no isolation" mode, the communication should have
>>>> very low overhead (local TCP or sockets, no authorization). I looked at
>>>> existing RPC implementations we could use for that and I narrowed down
>>>> the potential choice of technologies to gRPC and Apache Thrift.
>>>>
>>>> This approach has multiple advantages:
>>>>
>>>> * we can leverage existing RPC implementations (Thrift and gRPC are
>>>> both mature and have integration with HTTPS, various authentication options
>>>> and can be also run using local sockets)
>>>> * the code will be much simpler to maintain - we will use existing
>>>> serialization mechanisms from those protocols
>>>> * no custom code for communication needed - both Thrift and gRPC have
>>>> all that is needed for scalable, robust communication
>>>>
>>>> I think this way we will be able to implement a more robust and
>>>> maintainable solution much faster.
>>>>
>>>> I also reached out to Apache Beam (they have support for both gRPC and
>>>> Thrift and are in the process of transitioning from Thrift to gRPC as
>>>> the primary protocol), and I am sure they have done a lot of analysis
>>>> that can help us to make the final decision.
>>>>
>>>> This approach changes only the implementation details of AIP-44 -
>>>> all the rest is the same; the approach and deployment options remain
>>>> untouched by this change.
>>>>
>>>> If you have any comments on that - feel free. I will also discuss it
>>>> today at the meeting, and if there is general consensus that the
>>>> direction is right I would love to start voting on AIP-44 ideally tomorrow
>>>> - so that next week we can start implementing it. I am not sure if we want
>>>> to make a final decision about gRPC/Thrift yet (maybe there are people who
>>>> have good experience with both and can share it here?).
>>>>
>>>> I think a more detailed POC and benchmarking might be the first step of
>>>> the AIP - where we make the final choice based on an attempt to implement
>>>> a POC for both - but I am also happy to listen to those who have more
>>>> experience with both (and maybe Beam experience will help with that).
>>>>
>>>> J.
>>>>
>>>>
>>>>
>>>>
>>>> On Tue, Feb 15, 2022 at 1:49 PM Jarek Potiuk <[email protected]> wrote:
>>>>
>>>>> The meeting is tomorrow :) Feel free to join. I will also record it
>>>>> and publish minutes!
>>>>>
>>>>> On Tue, Feb 15, 2022 at 12:31 PM Giorgio Zoppi <[email protected]> wrote:
>>>>> >
>>>>> > Hello Everyone,
>>>>> > is there any follow-up to this meeting? I would like to participate if it's possible.
>>>>> > Best Regards,
>>>>> > Giorgio
>>>>> >
>>>>> > On Tue, Feb 1, 2022 at 3:29 PM Jarek Potiuk <[email protected]> wrote:
>>>>> >>
>>>>> >> Hello Everyone,
>>>>> >>
>>>>> >> I think it's about time for the next sig-multitenancy meeting.
>>>>> >>
>>>>> >> I created a doodle poll for next week - please mark your availability till Friday the 4th.
>>>>> >>
>>>>> >> https://doodle.com/poll/axvu2gz7zhv8ieye?utm_source=poll&utm_medium=link
>>>>> >>
>>>>> >> I think the rough agenda will be:
>>>>> >>
>>>>> >> * AIP-43 Dag Processor Separation [1] - implementation progress - Mateusz
>>>>> >> * AIP-44 Airflow Internal API [2] - voting progress (hopefully) - Jarek
>>>>> >> * AIP-45 Remove double DAG parsing [3] - discussion - Ping
>>>>> >> * AIP-46 Docker runtime isolation [4] - discussion - Ping
>>>>> >> * Also there are some ideas (not yet in AIP form) around optimizing DagProcessorLoop that might be good to talk about - also Ping.
>>>>> >>
>>>>> >> If there are any more proposals - feel free to ping me.
>>>>> >> I also encourage everyone to comment on the AIP-45/46 proposals from Ping before the meeting.
>>>>> >>
>>>>> >> [1] https://cwiki.apache.org/confluence/display/AIRFLOW/AIP-43+DAG+Processor+separation
>>>>> >> [2] https://cwiki.apache.org/confluence/display/AIRFLOW/AIP-44+Airflow+Internal+API
>>>>> >> [3] https://cwiki.apache.org/confluence/display/AIRFLOW/AIP-45+Remove+double+dag+parsing+in+airflow+run
>>>>> >> [4] https://cwiki.apache.org/confluence/display/AIRFLOW/AIP-46+Add+support+for+docker+runtime+isolation+for+airflow+tasks+and+dag+parsing
>>>>> >>
>>>>> >> J.
>>>>> >>
>>>>> >>
>>>>> >
>>>>> >
>>>>> > --
>>>>> > Life is a chess game - Anonymous.
>>>>>
>>>>
