[ https://issues.apache.org/jira/browse/FLINK-37844?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
dalongliu updated FLINK-37844: ------------------------------ Parent: (was: FLINK-37481) Issue Type: Improvement (was: Sub-task) > FLIP-516 Optimization: Push down projections for StreamingMultiJoinOperator > --------------------------------------------------------------------------- > > Key: FLINK-37844 > URL: https://issues.apache.org/jira/browse/FLINK-37844 > Project: Flink > Issue Type: Improvement > Reporter: Gustavo de Morais > Priority: Major > > We're currently adding support for a StreamingMultiJoinOperator which is able > to join N inputs. There are multiple minor optimizations we might be able to > do that weren't so easy to do with multiple chained binary joins. One of them > is materializing into state only attributes that are either joined in any of > the N - 1 join conditions or are projected in the final output. We'd have to > do the following: > > * We already have the information of used fields for each input in > joinAttributeMap and can either pass that to the operator or add a new method > to the join extractor. > * The MultiJoin will contain the list of fields to be projected. We might > have to adapt and expose that as a map per inputid when creating the > FlinkMultiJoin. > * When adding a record to state, we remove attributes that will not be used > in join conditions or projected. > * If we use null for these attributes, we don't have to adapt the logic. If > we recreate rows with a smaller arity, multiple places have to be adjusted so > that all our index-based logic is updated and correct. > > Obs: this was a even more significant problem for binary joins, since we > materialized all attributes for all intermediate results. However, it's also > relevant here. I plan to measure impacts for each of the optimizations before > adding them [based on a > benchmark|https://github.com/apache/flink-benchmarks?tab=readme-ov-file#general-remarks], > and we'll first merge the operator. However, I'll be documenting the > optimizations with tickets so we track them here. This ticket arose from a > discussion with [~roman] > [here.|https://github.com/apache/flink/pull/26313#discussion_r2105917437] -- This message was sent by Atlassian Jira (v8.20.10#820010)