[ 
https://issues.apache.org/jira/browse/FLINK-37844?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

dalongliu updated FLINK-37844:
------------------------------
        Parent:     (was: FLINK-37481)
    Issue Type: Improvement  (was: Sub-task)

> FLIP-516 Optimization: Push down projections for StreamingMultiJoinOperator
> ---------------------------------------------------------------------------
>
>                 Key: FLINK-37844
>                 URL: https://issues.apache.org/jira/browse/FLINK-37844
>             Project: Flink
>          Issue Type: Improvement
>            Reporter: Gustavo de Morais
>            Priority: Major
>
> We're currently adding support for a StreamingMultiJoinOperator which is able 
> to join N inputs. There are multiple minor optimizations we might be able to 
> do that weren't so easy to do with multiple chained binary joins. One of them 
> is materializing into state only attributes that are either joined in any of 
> the N - 1 join conditions or are projected in the final output. We'd have to 
> do the following:
>  
>  * We already have the information of used fields for each input in 
> joinAttributeMap and can either pass that to the operator or add a new method 
> to the join extractor.
>  * The MultiJoin will contain the list of fields to be projected. We might 
> have to adapt and expose that as a map per inputid when creating the 
> FlinkMultiJoin.
>  * When adding a record to state, we remove attributes that will not be used 
> in join conditions or projected.
>  * If we use null for these attributes, we don't have to adapt the logic. If 
> we recreate rows with a smaller arity, multiple places have to be adjusted so 
> that all our index-based logic is updated and correct.
>  
> Obs: this was a even more significant problem for binary joins, since we 
> materialized all attributes for all intermediate results. However, it's also 
> relevant here. I plan to measure impacts for each of the optimizations before 
> adding them [based on a 
> benchmark|https://github.com/apache/flink-benchmarks?tab=readme-ov-file#general-remarks],
>  and we'll first merge the operator. However, I'll be documenting the 
> optimizations with tickets so we track them here. This ticket arose from a 
> discussion with [~roman] 
> [here.|https://github.com/apache/flink/pull/26313#discussion_r2105917437]



--
This message was sent by Atlassian Jira
(v8.20.10#820010)

Reply via email to