[
https://issues.apache.org/jira/browse/FLINK-38555?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]
Ruan Hang reassigned FLINK-38555:
---------------------------------
Assignee: yuanfenghu (was: Kiruban Kamaraj)
> Optimize performance of `RecordUtils.compareObjects()` method by avoiding
> unnecessary `toString()` calls for temporal types (LocalDateTime, LocalDate,
> Instant, etc.).
> ----------------------------------------------------------------------------------------------------------------------------------------------------------------------
>
> Key: FLINK-38555
> URL: https://issues.apache.org/jira/browse/FLINK-38555
> Project: Flink
> Issue Type: Bug
> Components: Flink CDC
> Affects Versions: cdc-3.5.0
> Reporter: yuanfenghu
> Assignee: yuanfenghu
> Priority: Critical
> Fix For: cdc-3.6.0
>
> Attachments: image-2025-10-24-10-15-18-027.png,
> image-2025-10-24-10-15-37-328.png
>
>
> h2. Background
> While analyzing flame graphs of a Flink CDC MySQL source job, I identified
> that `RecordUtils.splitKeyRangeContains()` was a performance bottleneck.
> Further investigation revealed that `compareObjects()` was using `toString()`
> to compare temporal objects, which is significantly slower than direct
> comparison.
>
> h3. Root Cause
> In the current implementation:
> {code:java}
> private static int compareObjects(Object o1, Object o2) {
>     if (o1 instanceof Comparable && o1.getClass().equals(o2.getClass())) {
>         return ((Comparable) o1).compareTo(o2);
>     } else if (isNumericObject(o1) && isNumericObject(o2)) {
>         return toBigDecimal(o1).compareTo(toBigDecimal(o2));
>     } else {
>         return o1.toString().compareTo(o2.toString());
>     }
> }
> {code}
> The first condition only matches when `o1` is `Comparable` and both operands
> have exactly the same runtime class. When it does not match for temporal
> values such as `LocalDateTime`, the call falls through to the `toString()`
> comparison path.
> h3. Impact
> This method is called extensively during the snapshot phase when evaluating
> whether binlog records fall within completed split ranges. The performance
> impact can be significant (80% CPU in some cases) for tables with:
> - Temporal types (DATETIME, TIMESTAMP, DATE, TIME) as chunk keys
> - High binlog throughput during the snapshot phase
> - Many splits (large tables with a small chunk size)
> !image-2025-10-24-10-15-37-328.png!
>
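To illustrate the idea in the issue title, here is a minimal sketch of a temporal fast path placed in front of the `toString()` fallback. It is not the actual patch: the class name `TemporalCompareSketch` is hypothetical, the numeric branch of the original method is left out to keep the example self-contained, and the set of types handled (`LocalDateTime`, `LocalDate`, `LocalTime`, `Instant`) is an assumption based on the types named in the title.

```java
import java.time.Instant;
import java.time.LocalDate;
import java.time.LocalDateTime;
import java.time.LocalTime;

// Hypothetical sketch, not the Flink CDC implementation.
final class TemporalCompareSketch {

    private TemporalCompareSketch() {}

    // Compares two values, using direct compareTo() for same-kind temporal pairs
    // and falling back to string comparison otherwise.
    static int compareObjects(Object o1, Object o2) {
        if (o1 instanceof LocalDateTime && o2 instanceof LocalDateTime) {
            return ((LocalDateTime) o1).compareTo((LocalDateTime) o2);
        } else if (o1 instanceof LocalDate && o2 instanceof LocalDate) {
            return ((LocalDate) o1).compareTo((LocalDate) o2);
        } else if (o1 instanceof LocalTime && o2 instanceof LocalTime) {
            return ((LocalTime) o1).compareTo((LocalTime) o2);
        } else if (o1 instanceof Instant && o2 instanceof Instant) {
            return ((Instant) o1).compareTo((Instant) o2);
        }
        // Fallback kept from the original method (numeric branch omitted here).
        return o1.toString().compareTo(o2.toString());
    }

    public static void main(String[] args) {
        // Operands are held as Object, as they would be in a split key array.
        Object earlier = LocalDateTime.of(2025, 10, 24, 10, 15, 18);
        Object later = LocalDateTime.of(2025, 10, 24, 10, 15, 37);
        System.out.println(Integer.signum(compareObjects(earlier, later)));
    }
}
```

The temporal branches avoid formatting each value into a string on every call, which is the cost the flame graph attributes to `toString()`. Only the sign of the result should be relied on, since `compareTo()` on these types is not guaranteed to return exactly -1, 0, or 1.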
--
This message was sent by Atlassian Jira
(v8.20.10#820010)