Hey everyone, I'd be great to start voting next week. Let me know if there are further questions or feedback.
Thanks, Gustavo Am Mi., 30. Apr. 2025 um 15:07 Uhr schrieb Gustavo de Morais < gustavopg...@gmail.com>: > Hey Arvid and David, thanks for the feedback! > > The limitations are in the flip, I just had pasted a wrong link and fixed > it. Let me know if there are other incorrect links. > > Yes, the thought of using statistics has potential. I've also spent some > on that. The precise statistics required here would however be the amount > of intermediate state/matches for each level and this is an information we > only have at runtime/inside the operator. For that, we could look into an > adaptive multi-way join in a next interaction and the user could determine > a max amount of state he's willing to store. This has potential but would > be a topic for a next FLIP, I added some information on that under the > rejected alternatives. > > Kind regards, > Gustavo > > Am Mo., 28. Apr. 2025 um 14:18 Uhr schrieb David Radley < > david_rad...@uk.ibm.com>: > >> Hi Gustavo,This sounds like a great idea. >> I notice the link limitations< >> https://confluentinc.atlassian.net/wiki/spaces/FLINK/pages/4342875697/FLIP-516+Multi-Way+Join+Operator#Limitations> >> in the Flip points outside of the document to something I do not have >> access to. Please could you include the limitations in the flip itself. >> >> You mention re ordered binary joins might be less efficient by turning >> them into a multi join. I wonder what the pros and cons are. I wonder can >> we use statistics to decide whether we should do a multi way join? In this >> case we could have an enum configuration something like: >> table.optimizer.join= binary-join, multi-join, auto. >> >> Kind regards, David. >> >> >> From: Arvid Heise <ar...@apache.org> >> Date: Monday, 28 April 2025 at 12:47 >> To: dev@flink.apache.org <dev@flink.apache.org> >> Subject: [EXTERNAL] Re: [DISCUSS] FLIP-516: Multi-Way Join Operator >> Hi Gustavo, >> >> the idea and approach LGTM. +1 to proceed. >> >> Best, >> >> Arvid >> >> On Thu, Apr 24, 2025 at 4:58 PM Gustavo de Morais <gustavopg...@gmail.com >> > >> wrote: >> >> > Hi everyone, >> > >> > I'd like to propose FLIP-516: Multi-Way Join Operator [1] for >> discussion. >> > >> > Chained non-temporal joins in Flink SQL often cause a "big state issue" >> due >> > to large intermediate results, impacting performance and stability. This >> > FLIP introduces a StreamingMultiJoinOperator to tackle this by joining >> > multiple inputs (that need to share a common key) simultaneously within >> one >> > operator. >> > >> > The main goal is achieving zero intermediate state for these common join >> > patterns, significantly reducing state size. This initial version >> requires >> > a common partitioning key and focuses on INNER/LEFT joins, with plans >> for >> > future expansion. The operator is opt-in via >> > table.optimizer.multi-join.enabled (default false). PR with the initial >> > version of the operator is available [2]. >> > >> > Happy to be contributing to this community, and looking forward to your >> > feedback and thoughts. >> > >> > Kind regards, >> > Gustavo de Morais >> > >> > [1] >> > >> > >> https://cwiki.apache.org/confluence/display/FLINK/FLIP-516%3A+Multi-Way+Join+Operator >> > >> > [2] https://github.com/apache/flink/pull/26313 >> > >> >> Unless otherwise stated above: >> >> IBM United Kingdom Limited >> Registered in England and Wales with number 741598 >> Registered office: Building C, IBM Hursley Office, Hursley Park Road, >> Winchester, Hampshire SO21 2JN >> >