[
https://issues.apache.org/jira/browse/FLINK-34694?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17835668#comment-17835668
]
Shuai Xu edited comment on FLINK-34694 at 4/10/24 9:43 AM:
-----------------------------------------------------------
Hi [~rovboyko],
The method `otherRecordHasNoAssociationsInInputSide` in your code would be
invoked for every associatedRecord. This indeed increases the overhead of state
access. It is difficult to say which one has a greater proportion between the
increased costs and the reduced expenses of the method
'updateNumOfAssociations()' you mentioned. Intuitively, this may depend on the
data distribution itself.
So a detailed test report could better illustrate the problem. And a comparison
table that covers JOIN keyword in queries of nexmark is good. Besides, rewrite
sql for hitting this optimization can also indicate the scenarios in which this
optimization takes effect.
was (Author: JIRAUSER300096):
Hi [~rovboyko],
The method `otherRecordHasNoAssociationsInInputSide` in your code would be
invoked for every associatedRecord. This indeed increases the overhead of state
access. It is difficult to say which one has a greater proportion between the
increased costs and the reduced expenses of the method
'updateNumOfAssociations()' you mentioned. Intuitively, this may depend on the
data distribution itself.
So a detailed test report could better illustrate the problem. And a comparison
table that covers JOIN keyword in queries of nexmark is good. Besides this,
rewrite sql for hitting this optimization can also indicate the scenarios in
which this optimization takes effect.
> Delete num of associations for streaming outer join
> ---------------------------------------------------
>
> Key: FLINK-34694
> URL: https://issues.apache.org/jira/browse/FLINK-34694
> Project: Flink
> Issue Type: Improvement
> Components: Table SQL / Runtime
> Reporter: Roman Boyko
> Priority: Major
> Attachments: image-2024-03-15-19-51-29-282.png,
> image-2024-03-15-19-52-24-391.png
>
>
> Currently in StreamingJoinOperator (non-window) in case of OUTER JOIN the
> OuterJoinRecordStateView is used to store additional field - the number of
> associations for every record. This leads to store additional Tuple2 and
> Integer data for every record in outer state.
> This functionality is used only for sending:
> * -D[nullPaddingRecord] in case of first Accumulate record
> * +I[nullPaddingRecord] in case of last Revoke record
> The overhead of storing additional data and updating the counter for
> associations can be avoided by checking the input state for these events.
>
> The proposed solution can be found here -
> [https://github.com/rovboyko/flink/commit/1ca2f5bdfc2d44b99d180abb6a4dda123e49d423]
>
> According to the nexmark q20 test (changed to OUTER JOIN) it could increase
> the performance up to 20%:
> * Before:
> !image-2024-03-15-19-52-24-391.png!
> * After:
> !image-2024-03-15-19-51-29-282.png!
--
This message was sent by Atlassian Jira
(v8.20.10#820010)