gustavodemorais commented on PR #27508:
URL: https://github.com/apache/flink/pull/27508#issuecomment-4031911886
> **As a result, may to create 3 Jira tickets :**
>
> 1. Generic/Binary row data problem in MultiJoinStateViews
> 2. Transform binary MultiJoin (with 2 inputs) bact to regular Join
> 3. Predicate pushdown rules for MultiJoin case
1. Sounds good, we can look into it in the next PR
2. _I think_ the performance difference shouldn't be too considerable -
we'll have the additional unnecessary joinKey layer in state. It's not
necessary because since the commonKey = joinKey we'll have a state with only
one key. That said, that may not cause any noticeable difference for most
cases. In my experience testing it with rocksdb, the biggest chunk of time is
used by the rocksdb API and not by the key serialization. Having a rule to
convert the nodes back feels a bit weird so the performance gain would have to
be considerable for us to justify doing that. There might be other ways of
dealing with that. Also, the MultiJoin can be used with hints MULTI_JOIN(t1,
t2), so if a user wants we can configure that already. I think what you
mentioned on 3 is more interesting.
3. This is a valid problem. As you mentioned, I think there are two valid
strategies we could look into both with some pros and cons. In general, having
the MultiJoin rule later makes so that we reuse the rules written for regular
joins. If we have it earlier, as you mentioned we might have to reimplement the
rules. Also, if we could implement an optimization which is valid for both
regular joins and multijoins, that's even better. In the example you mentioned,
if we could push the calc after the regular join so that it's also not there
when they turn into multijoin, it's better since we're optimizing more use
cases. That said, as we did in this ticket, these require research and some
technical discussion. So go ahead with the ticket and any discussion can
continue there :)
I'd first create a ticket for 1 and 2. For **this PR**, let's keep scope
focused on the original fix only. Thanks, @ldadima.
--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
To unsubscribe, e-mail: [email protected]
For queries about this service, please contact Infrastructure at:
[email protected]