[ https://issues.apache.org/jira/browse/FLINK-35874?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17867924#comment-17867924 ]
Zhongmin Qiao commented on FLINK-35874:
---------------------------------------
Please assign this issue to me. PR: https://github.com/apache/flink-cdc/pull/3488
PTAL [~Leonard], [~ruanhang1993]
> Check pureBinlogPhaseTables set before call getBinlogPosition method in
> BinlogSplitReader
> -----------------------------------------------------------------------------------------
>
> Key: FLINK-35874
> URL: https://issues.apache.org/jira/browse/FLINK-35874
> Project: Flink
> Issue Type: Improvement
> Components: Flink CDC
> Reporter: Zhongmin Qiao
> Priority: Minor
> Attachments: image-2024-07-22-19-26-59-158.png,
> image-2024-07-22-19-27-19-366.png, image-2024-07-22-19-30-08-989.png,
> image-2024-07-22-19-36-20-481.png, image-2024-07-22-19-36-40-581.png,
> image-2024-07-22-19-37-35-542.png, image-2024-07-22-21-12-03-316.png
>
>
> The method getBinlogPosition of RecordUtil, which is called by
> BinlogSplitReader.shouldEmit, is very expensive. It iterates over the
> sourceOffset map of the SourceRecord and calls toString() on each value
> during the iteration. Finally, it calls the putAll method of
> BinlogOffsetBuilder to copy all the entries obtained from the iteration into
> the offsetMap, which involves a second map traversal and hashcode
> computation. Despite this cost, getBinlogPosition is called for every
> DataChangeRecord that is emitted, which reduces the efficiency of data
> processing in Flink CDC.
> !image-2024-07-22-19-26-59-158.png|width=545,height=222!
> !image-2024-07-22-19-27-19-366.png|width=545,height=119!
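A minimal Java sketch of the per-record work described above, assuming the source offset is a plain `Map<String, ?>`. `OffsetConversionSketch`, `OffsetBuilder`, and `toOffsetMap` are hypothetical stand-ins, not the actual RecordUtil or BinlogOffsetBuilder code, and the offset keys in `main` are illustrative only. The sketch shows the two traversals (stringify every value, then `putAll` into the builder's map) that are paid for each emitted record:

```java
import java.util.HashMap;
import java.util.Map;

public class OffsetConversionSketch {

    // Hypothetical stand-in for BinlogOffsetBuilder: owns its own offset map.
    static final class OffsetBuilder {
        private final Map<String, String> offsetMap = new HashMap<>();

        OffsetBuilder putAll(Map<String, String> entries) {
            // Second traversal: every key is hashed again into offsetMap.
            offsetMap.putAll(entries);
            return this;
        }

        Map<String, String> build() {
            return offsetMap;
        }
    }

    // Hypothetical stand-in for getBinlogPosition; in the reader it would run once per record.
    public static Map<String, String> toOffsetMap(Map<String, ?> sourceOffset) {
        Map<String, String> converted = new HashMap<>();
        // First traversal: hash each key and convert each value via String.valueOf
        // (which calls toString() on non-null values).
        for (Map.Entry<String, ?> entry : sourceOffset.entrySet()) {
            converted.put(entry.getKey(), String.valueOf(entry.getValue()));
        }
        return new OffsetBuilder().putAll(converted).build();
    }

    public static void main(String[] args) {
        // Illustrative offset entries; a real source offset may carry more keys.
        Map<String, Object> sourceOffset = new HashMap<>();
        sourceOffset.put("file", "mysql-bin.000003");
        sourceOffset.put("pos", 154L);
        sourceOffset.put("row", 1);
        Map<String, String> offset = toOffsetMap(sourceOffset);
        System.out.println(offset.get("file") + ":" + offset.get("pos")); // prints "mysql-bin.000003:154"
    }
}
```

Every call allocates two maps and a String per non-string value, which is why paying this cost for every DataChangeRecord adds up.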
> However, we can avoid most of these invocations by moving the
> pureBinlogPhaseTables.contains(tableId) check in the hasEnterPureBinlogPhase
> method so that it runs before getBinlogPosition is called. This way, if the
> SourceRecord belongs to a table that is already in the pure binlog phase, we
> can return true directly without calling the expensive getBinlogPosition
> method.
> Proposed diff:
> !image-2024-07-22-21-12-03-316.png|width=548,height=236!
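A hedged sketch of the proposed reordering, under simplifying assumptions: table IDs are plain strings, positions are longs, and a single high watermark stands in for the per-table watermark lookup. `PureBinlogCheckSketch`, `hasEnteredPureBinlogPhase`, and the `shouldEmitBefore`/`shouldEmitAfter` methods are hypothetical, not the actual BinlogSplitReader code. Because the phase check would return true anyway for a table already in the set, running it first leaves the result unchanged and only skips the position computation:

```java
import java.util.HashSet;
import java.util.Set;
import java.util.function.LongSupplier;

public class PureBinlogCheckSketch {

    private final Set<String> pureBinlogPhaseTables = new HashSet<>();
    // Simplification: one high watermark instead of a per-table lookup.
    private final long highWatermark;
    private int positionComputationCount = 0;

    public PureBinlogCheckSketch(long highWatermark) {
        this.highWatermark = highWatermark;
    }

    public int positionComputations() {
        return positionComputationCount;
    }

    // Stand-in for the expensive getBinlogPosition call; counts every invocation.
    private long computePosition(LongSupplier expensivePosition) {
        positionComputationCount++;
        return expensivePosition.getAsLong();
    }

    // Current order: the position is computed for every record.
    public boolean shouldEmitBefore(String tableId, LongSupplier position) {
        return hasEnteredPureBinlogPhase(tableId, computePosition(position));
    }

    // Proposed order: tables already in the pure binlog phase return early.
    public boolean shouldEmitAfter(String tableId, LongSupplier position) {
        if (pureBinlogPhaseTables.contains(tableId)) {
            return true; // same answer the full check would give, without computing the position
        }
        return hasEnteredPureBinlogPhase(tableId, computePosition(position));
    }

    // Simplified stand-in for hasEnterPureBinlogPhase.
    private boolean hasEnteredPureBinlogPhase(String tableId, long position) {
        if (pureBinlogPhaseTables.contains(tableId)) {
            return true;
        }
        if (position >= highWatermark) {
            pureBinlogPhaseTables.add(tableId); // later records take the fast path
            return true;
        }
        return false;
    }

    public static void main(String[] args) {
        PureBinlogCheckSketch before = new PureBinlogCheckSketch(100L);
        PureBinlogCheckSketch after = new PureBinlogCheckSketch(100L);
        for (long pos = 100; pos < 1_100; pos++) {
            final long p = pos;
            before.shouldEmitBefore("db.orders", () -> p);
            after.shouldEmitAfter("db.orders", () -> p);
        }
        System.out.println("before: " + before.positionComputations()); // prints "before: 1000"
        System.out.println("after: " + after.positionComputations());   // prints "after: 1"
    }
}
```

Over 1000 records for one table, the original order computes the position 1000 times, while the reordered check computes it only for the first record, which moves the table into the set.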
--
This message was sent by Atlassian Jira
(v8.20.10#820010)