[ 
https://issues.apache.org/jira/browse/FLINK-38555?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Ruan Hang reassigned FLINK-38555:
---------------------------------

    Assignee: Kiruban Kamaraj

> Optimize performance of `RecordUtils.compareObjects()` method by avoiding 
> unnecessary `toString()` calls for temporal types (LocalDateTime, LocalDate, 
> Instant, etc.).
> ----------------------------------------------------------------------------------------------------------------------------------------------------------------------
>
>                 Key: FLINK-38555
>                 URL: https://issues.apache.org/jira/browse/FLINK-38555
>             Project: Flink
>          Issue Type: Bug
>          Components: Flink CDC
>    Affects Versions: cdc-3.5.0
>            Reporter: yuanfenghu
>            Assignee: Kiruban Kamaraj
>            Priority: Critical
>             Fix For: cdc-3.6.0
>
>         Attachments: image-2025-10-24-10-15-18-027.png, 
> image-2025-10-24-10-15-37-328.png
>
>
> h2. Background
> While analyzing flame graphs of a Flink CDC MySQL source job, I identified 
> that `RecordUtils.splitKeyRangeContains()` was a performance bottleneck. 
> Further investigation revealed that `compareObjects()` was using `toString()` 
> to compare temporal objects, which is significantly slower than direct 
> comparison.
>  
> h3. Root Cause
> In the current implementation:
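> {code:java}
> private static int compareObjects(Object o1, Object o2) {
>     // Direct compareTo() only when both operands share the same runtime class
>     if (o1 instanceof Comparable && o1.getClass().equals(o2.getClass())) {
>         return ((Comparable) o1).compareTo(o2);
>     } else if (isNumericObject(o1) && isNumericObject(o2)) {
>         // Remaining numeric pairs are compared as BigDecimal
>         return toBigDecimal(o1).compareTo(toBigDecimal(o2));
>     } else {
>         // Fallback: converts both operands to strings and compares them lexicographically
>         return o1.toString().compareTo(o2.toString());
>     }
> }{code}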
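> When comparing temporal values, the first branch is taken only if `o1` is 
> `Comparable` and both operands have the same runtime class. Casting to 
> `Object` changes only the static type, not `getClass()`, so two 
> `LocalDateTime` values still reach `compareTo()`. The `toString()` path is 
> taken when a temporal value is compared with a value of a different class, 
> and each such call converts both operands to strings before comparing them.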
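The following sketch checks the dispatch condition directly and shows one possible temporal fast path placed before the `toString()` fallback. It assumes, hypothetically, that the mismatched operands are a `java.time` value and its legacy `java.sql` counterpart (`Timestamp`, `Date`); the issue does not state which types actually meet in `compareObjects()`. `normalizeTemporal` and `compareTemporal` are illustrative names, not Flink CDC APIs, and the numeric `BigDecimal` branch of the original method is left out.

```java
import java.sql.Date;
import java.sql.Timestamp;
import java.time.LocalDate;
import java.time.LocalDateTime;

public final class TemporalCompareSketch {

    // Same condition as the first branch of compareObjects(): depends on runtime classes only.
    static boolean takesComparablePath(Object o1, Object o2) {
        return o1 instanceof Comparable && o1.getClass().equals(o2.getClass());
    }

    // Hypothetical helper: maps legacy java.sql values onto their java.time counterparts.
    static Object normalizeTemporal(Object o) {
        if (o instanceof Timestamp) {
            return ((Timestamp) o).toLocalDateTime();
        }
        if (o instanceof Date) {
            return ((Date) o).toLocalDate();
        }
        return o;
    }

    // Hypothetical fast path: returns null when the pair is not a supported temporal pair.
    static Integer compareTemporal(Object o1, Object o2) {
        Object a = normalizeTemporal(o1);
        Object b = normalizeTemporal(o2);
        if (a instanceof LocalDateTime && b instanceof LocalDateTime) {
            return ((LocalDateTime) a).compareTo((LocalDateTime) b);
        }
        if (a instanceof LocalDate && b instanceof LocalDate) {
            return ((LocalDate) a).compareTo((LocalDate) b);
        }
        return null;
    }

    // Sketch only: the temporal fast path sits between the same-class check and the
    // toString() fallback; the original BigDecimal branch is omitted here.
    @SuppressWarnings({"unchecked", "rawtypes"})
    static int compareObjects(Object o1, Object o2) {
        if (takesComparablePath(o1, o2)) {
            return ((Comparable) o1).compareTo(o2);
        }
        Integer temporal = compareTemporal(o1, o2);
        if (temporal != null) {
            return temporal;
        }
        return o1.toString().compareTo(o2.toString());
    }

    public static void main(String[] args) {
        LocalDateTime ldt = LocalDateTime.of(2025, 10, 24, 10, 15, 18);
        Object asObject = ldt; // static type Object, runtime class still LocalDateTime
        Object ts = Timestamp.valueOf(ldt);

        System.out.println(takesComparablePath(asObject, ldt.plusSeconds(1))); // true
        System.out.println(takesComparablePath(asObject, ts));                 // false
        System.out.println(ldt + " vs " + ts); // 2025-10-24T10:15:18 vs 2025-10-24 10:15:18.0
        System.out.println(compareObjects(asObject, ts));                      // 0
    }
}
```

The two string forms printed by `main` differ (`T` versus a space, plus a trailing `.0`), so the fallback compares formatting rather than values. Keeping the fast path after the same-class check leaves the behaviour for same-type operands unchanged.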
> h3. Impact
> This method is called extensively during the snapshot phase when evaluating 
> whether binlog records fall within completed split ranges. For tables with:
>  - Temporal types (DATETIME, TIMESTAMP, DATE, TIME) as chunk keys
>  - High binlog throughput during snapshot phase
>  - Many splits (large tables with small chunk size)
> The performance impact can be significant (80% of CPU time in some cases).
> !image-2025-10-24-10-15-37-328.png!
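Why the number of splits matters can be sketched by counting comparisons. Assuming each incoming binlog record is checked against the table's finished splits one at a time, a record that falls in none of them costs two key comparisons per split. The `Split` type (inclusive start, exclusive end, `null` meaning unbounded) and the `comparisons` counter are hypothetical; this is not the Flink CDC implementation of `splitKeyRangeContains()`.

```java
import java.time.LocalDateTime;
import java.util.ArrayList;
import java.util.List;

public final class SplitRangeScanSketch {

    // Hypothetical single-column split range: [start, end), null bound = unbounded.
    static final class Split {
        final LocalDateTime start;
        final LocalDateTime end;

        Split(LocalDateTime start, LocalDateTime end) {
            this.start = start;
            this.end = end;
        }
    }

    // Counts every key comparison, standing in for calls to compareObjects().
    static long comparisons = 0;

    static int compare(LocalDateTime a, LocalDateTime b) {
        comparisons++;
        return a.compareTo(b);
    }

    static boolean contains(Split s, LocalDateTime key) {
        boolean afterStart = s.start == null || compare(key, s.start) >= 0;
        return afterStart && (s.end == null || compare(key, s.end) < 0);
    }

    // Linear scan over finished splits (an assumption about the lookup strategy).
    static boolean inAnyFinishedSplit(List<Split> finished, LocalDateTime key) {
        for (Split s : finished) {
            if (contains(s, key)) {
                return true;
            }
        }
        return false;
    }

    public static void main(String[] args) {
        LocalDateTime base = LocalDateTime.of(2025, 1, 1, 0, 0);
        List<Split> finished = new ArrayList<>();
        for (int i = 0; i < 1000; i++) {
            finished.add(new Split(base.plusHours(i), base.plusHours(i + 1)));
        }
        // A key past every finished split forces a full scan.
        boolean hit = inAnyFinishedSplit(finished, base.plusHours(5000));
        System.out.println(hit + " after " + comparisons + " comparisons"); // false after 2000 comparisons
    }
}
```

With 1,000 finished splits the miss case costs 2,000 comparisons for a single record; if each of those went through the `toString()` fallback, every one would also convert both operands to strings.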
>  



--
This message was sent by Atlassian Jira
(v8.20.10#820010)
