[ 
https://issues.apache.org/jira/browse/FLINK-34694?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17835648#comment-17835648
 ] 

Shuai Xu commented on FLINK-34694:
----------------------------------

Hi [~rovboyko], your idea looks interesting. After reading your code, I found 
that this optimization does not actually reduce the overhead of state access; 
rather, it reduces the state size to some extent. IMO, the marginal reduction 
in size may not significantly affect the storage overhead, given that it 
constitutes a small fraction relative to the records held in the state.

BTW, if you plan to pursue this optimization further, could you provide more 
comprehensive benchmark details? Results from multiple test runs and the 
overall performance across all queries would be more convincing.

> Delete num of associations for streaming outer join
> ---------------------------------------------------
>
>                 Key: FLINK-34694
>                 URL: https://issues.apache.org/jira/browse/FLINK-34694
>             Project: Flink
>          Issue Type: Improvement
>          Components: Table SQL / Runtime
>            Reporter: Roman Boyko
>            Priority: Major
>         Attachments: image-2024-03-15-19-51-29-282.png, 
> image-2024-03-15-19-52-24-391.png
>
>
> Currently, for OUTER JOIN, StreamingJoinOperator (non-window) uses 
> OuterJoinRecordStateView to store an additional field: the number of 
> associations for every record. This means storing an additional Tuple2 and 
> Integer for every record in the outer state.
> This functionality is used only for sending:
>  * -D[nullPaddingRecord] in case of the first Accumulate record
>  * +I[nullPaddingRecord] in case of the last Revoke record
> The overhead of storing the additional data and updating the association 
> counter can be avoided by checking the input state for these events.
>  
> The proposed solution can be found here - 
> [https://github.com/rovboyko/flink/commit/1ca2f5bdfc2d44b99d180abb6a4dda123e49d423]
>  
> According to the nexmark q20 test (changed to OUTER JOIN), it could increase 
> the performance by up to 20%:
>  * Before:
> !image-2024-03-15-19-52-24-391.png!
>  * After:
> !image-2024-03-15-19-51-29-282.png!
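
For readers following the thread, the difference between the two approaches in the quoted description can be sketched roughly as below. This is a toy model, not Flink code: the class OuterJoinSketch, its methods, and the string-based changelog rows are all hypothetical, and it assumes a LEFT OUTER JOIN with a single equality key and no additional non-equi condition. With useCounter = true each outer record carries a numAssociations counter; with useCounter = false the first-Accumulate and last-Revoke events are instead detected by checking whether the input-side (right) state is empty.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

/** Hypothetical toy model of a LEFT OUTER JOIN on one equality key; not Flink code. */
final class OuterJoinSketch {

    /** An outer-side record; numAssociations is only maintained when useCounter is true. */
    static final class LeftRow {
        final String value;
        int numAssociations;

        LeftRow(String value) {
            this.value = value;
        }
    }

    private final boolean useCounter;
    private final Map<String, List<LeftRow>> leftState = new HashMap<>();
    private final Map<String, List<String>> rightState = new HashMap<>();

    OuterJoinSketch(boolean useCounter) {
        this.useCounter = useCounter;
    }

    /** Adds a left record and emits either the joined rows or one null-padded row. */
    List<String> addLeft(String key, String value) {
        List<String> out = new ArrayList<>();
        List<String> matches = rightState.getOrDefault(key, new ArrayList<>());
        for (String right : matches) {
            out.add("+I[" + value + "," + right + "]");
        }
        if (matches.isEmpty()) {
            out.add("+I[" + value + ",null]");
        }
        LeftRow row = new LeftRow(value);
        if (useCounter) {
            row.numAssociations = matches.size();
        }
        leftState.computeIfAbsent(key, k -> new ArrayList<>()).add(row);
        return out;
    }

    /** Accumulate on the right side: the first association retracts the null-padded row. */
    List<String> accumulateRight(String key, String value) {
        List<String> out = new ArrayList<>();
        List<String> rights = rightState.computeIfAbsent(key, k -> new ArrayList<>());
        boolean rightWasEmpty = rights.isEmpty(); // input-state check, taken before the insert
        for (LeftRow row : leftState.getOrDefault(key, new ArrayList<>())) {
            boolean first = useCounter ? row.numAssociations == 0 : rightWasEmpty;
            if (first) {
                out.add("-D[" + row.value + ",null]");
            }
            out.add("+I[" + row.value + "," + value + "]");
            if (useCounter) {
                row.numAssociations++;
            }
        }
        rights.add(value);
        return out;
    }

    /** Revoke on the right side: the last association restores the null-padded row. */
    List<String> retractRight(String key, String value) {
        List<String> out = new ArrayList<>();
        List<String> rights = rightState.getOrDefault(key, new ArrayList<>());
        if (!rights.remove(value)) {
            return out; // unknown record, nothing to retract
        }
        boolean rightNowEmpty = rights.isEmpty(); // input-state check, taken after the removal
        for (LeftRow row : leftState.getOrDefault(key, new ArrayList<>())) {
            out.add("-D[" + row.value + "," + value + "]");
            boolean last = useCounter ? --row.numAssociations == 0 : rightNowEmpty;
            if (last) {
                out.add("+I[" + row.value + ",null]");
            }
        }
        return out;
    }
}
```

In this simplified model both variants emit the same changelog for the same input sequence. Note that the counter-free variant still reads the right-side state on every event, which is in line with the comment above that the change shrinks the stored state rather than removing state accesses.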



--
This message was sent by Atlassian Jira
(v8.20.10#820010)
