Hi Ron, Happy to have your support for the FLIP. Yes, the new option will be only effective for streaming. Batch will continue to work as it currently does. In general, a batch implementation could use completely different and more efficient algorithms for its use case to perform a multi join, like the leapfrog triejoin.
Best, Gustavo Am Do., 8. Mai 2025 um 05:07 Uhr schrieb Ron Liu <ron9....@gmail.com>: > Hi, Gustavo > > Sorry for the late participation in the FLIP discussion, this is a great > feature to solve the headache of the stream join, Big +1. > > Regarding the new config option `table.optimizer.multi-join.enabled`, I > have a question: is this option only effective in streaming mode, what is > its behavior in batch mode? > > Best, > Ron > > Gustavo de Morais <gustavopg...@gmail.com> 于2025年5月8日周四 00:03写道: > > > Hey Ferenc, that's a great callout. I'll make sure we add some > > documentation regarding general advice on when to use multi-way joins > (pros > > and cons). > > > > Am Di., 6. Mai 2025 um 17:23 Uhr schrieb Ferenc Csaky > > <ferenc.cs...@pm.me.invalid>: > > > > > Hi, > > > > > > I think the FLIP is in a fairly good state, +1 for the idea and the > given > > > design. This may be considered already, but IMO we should also add some > > > high-level details, pros, and cons of enabling this feature to the > > website > > > other than the config option description. > > > > > > Best, > > > Ferenc > > > > > > > > > > > > > > > On Friday, May 2nd, 2025 at 14:47, Gustavo de Morais < > > > gustavopg...@gmail.com> wrote: > > > > > > > > > > > > > > > Hey everyone, > > > > > > > > I'd be great to start voting next week. Let me know if there are > > further > > > > questions or feedback. > > > > > > > > Thanks, > > > > Gustavo > > > > > > > > Am Mi., 30. Apr. 2025 um 15:07 Uhr schrieb Gustavo de Morais < > > > > gustavopg...@gmail.com>: > > > > > > > > > Hey Arvid and David, thanks for the feedback! > > > > > > > > > > The limitations are in the flip, I just had pasted a wrong link and > > > fixed > > > > > it. Let me know if there are other incorrect links. > > > > > > > > > > Yes, the thought of using statistics has potential. I've also spent > > > some > > > > > on that. The precise statistics required here would however be the > > > amount > > > > > of intermediate state/matches for each level and this is an > > > information we > > > > > only have at runtime/inside the operator. For that, we could look > > into > > > an > > > > > adaptive multi-way join in a next interaction and the user could > > > determine > > > > > a max amount of state he's willing to store. This has potential but > > > would > > > > > be a topic for a next FLIP, I added some information on that under > > the > > > > > rejected alternatives. > > > > > > > > > > Kind regards, > > > > > Gustavo > > > > > > > > > > Am Mo., 28. Apr. 2025 um 14:18 Uhr schrieb David Radley < > > > > > david_rad...@uk.ibm.com>: > > > > > > > > > > > Hi Gustavo,This sounds like a great idea. > > > > > > I notice the link limitations< > > > > > > > > > > > > https://confluentinc.atlassian.net/wiki/spaces/FLINK/pages/4342875697/FLIP-516+Multi-Way+Join+Operator#Limitations > > > > > > > > > > in the Flip points outside of the document to something I do not > > have > > > > > > access to. Please could you include the limitations in the flip > > > itself. > > > > > > > > > > > > You mention re ordered binary joins might be less efficient by > > > turning > > > > > > them into a multi join. I wonder what the pros and cons are. I > > > wonder can > > > > > > we use statistics to decide whether we should do a multi way > join? > > > In this > > > > > > case we could have an enum configuration something like: > > > > > > table.optimizer.join= binary-join, multi-join, auto. > > > > > > > > > > > > Kind regards, David. > > > > > > > > > > > > From: Arvid Heise ar...@apache.org > > > > > > Date: Monday, 28 April 2025 at 12:47 > > > > > > To: dev@flink.apache.org dev@flink.apache.org > > > > > > Subject: [EXTERNAL] Re: [DISCUSS] FLIP-516: Multi-Way Join > Operator > > > > > > Hi Gustavo, > > > > > > > > > > > > the idea and approach LGTM. +1 to proceed. > > > > > > > > > > > > Best, > > > > > > > > > > > > Arvid > > > > > > > > > > > > On Thu, Apr 24, 2025 at 4:58 PM Gustavo de Morais < > > > gustavopg...@gmail.com > > > > > > > > > > > > wrote: > > > > > > > > > > > > > Hi everyone, > > > > > > > > > > > > > > I'd like to propose FLIP-516: Multi-Way Join Operator [1] for > > > > > > > discussion. > > > > > > > > > > > > > > Chained non-temporal joins in Flink SQL often cause a "big > state > > > issue" > > > > > > > due > > > > > > > to large intermediate results, impacting performance and > > > stability. This > > > > > > > FLIP introduces a StreamingMultiJoinOperator to tackle this by > > > joining > > > > > > > multiple inputs (that need to share a common key) > simultaneously > > > within > > > > > > > one > > > > > > > operator. > > > > > > > > > > > > > > The main goal is achieving zero intermediate state for these > > > common join > > > > > > > patterns, significantly reducing state size. This initial > version > > > > > > > requires > > > > > > > a common partitioning key and focuses on INNER/LEFT joins, with > > > plans > > > > > > > for > > > > > > > future expansion. The operator is opt-in via > > > > > > > table.optimizer.multi-join.enabled (default false). PR with the > > > initial > > > > > > > version of the operator is available [2]. > > > > > > > > > > > > > > Happy to be contributing to this community, and looking forward > > to > > > your > > > > > > > feedback and thoughts. > > > > > > > > > > > > > > Kind regards, > > > > > > > Gustavo de Morais > > > > > > > > > > > > > > [1] > > > > > > > > > > > > > > > > > > https://cwiki.apache.org/confluence/display/FLINK/FLIP-516%3A+Multi-Way+Join+Operator > > > > > > > > > > > > > [2] https://github.com/apache/flink/pull/26313 > > > > > > > > > > > > Unless otherwise stated above: > > > > > > > > > > > > IBM United Kingdom Limited > > > > > > Registered in England and Wales with number 741598 > > > > > > Registered office: Building C, IBM Hursley Office, Hursley Park > > Road, > > > > > > Winchester, Hampshire SO21 2JN > > > > > >