github-actions[bot] commented on code in PR #64511:
URL: https://github.com/apache/doris/pull/64511#discussion_r3418793146


##########
fs_brokers/cdc_client/src/main/java/org/apache/doris/cdcclient/utils/ConfigUtil.java:
##########
@@ -123,9 +124,35 @@ public static ZoneId getPostgresServerTimeZoneFromProps(java.util.Properties pro
         return ZoneId.systemDefault();
     }
 
+    public static final String MAX_QUEUE_BYTES_SYS_PROP = "cdc.max.queue.size.in.bytes";
+
+    // Heap-adaptive byte cap for the debezium ChangeEventQueue buffer.
+    // heap 1G->64MB, 2G->128MB, >=4G->256MB. -D<MAX_QUEUE_BYTES_SYS_PROP> overrides
+    // (<=0 disables); a malformed override is logged and ignored, falling back to the cap.
+    private static long resolveMaxQueueSizeInBytes() {
+        String override = System.getProperty(MAX_QUEUE_BYTES_SYS_PROP);
+        if (override != null) {
+            try {
+                long bytes = Long.parseLong(override.trim());
+                return bytes <= 0 ? 0 : bytes;
+            } catch (NumberFormatException e) {
+                LOG.warn(
+                        "Ignoring invalid -D{}={}, expected an integer byte count; "
+                                + "falling back to the adaptive cap",
+                        MAX_QUEUE_BYTES_SYS_PROP,
+                        override);
+            }
+        }
+        long target = Runtime.getRuntime().maxMemory() / 16;
+        return Math.max(64L * 1024 * 1024, Math.min(target, 256L * 1024 * 1024));

Review Comment:
   This only caps Debezium's `ChangeEventQueue`. The exact snapshot-backfill
   path still drains that queue into
   `IncrementalSourceScanFetcher.pollWithBuffer()`'s `outputBuffer` until the
   split reaches its high-watermark/end-watermark. For the TVF/default snapshot
   path (`skip_snapshot_backfill` absent, so false), a split with 8192 rows at
   ~2MB each is polled in several queue chunks of at most 64-256MB each, but
   all ~16GB can still accumulate in the `HashMap` before any records are
   returned, and `snapshot_parallelism` can multiply that. The wide-row
   snapshot OOM scenario described by the PR therefore remains unresolved.
   Please either enforce a byte bound on the snapshot output buffer or on split
   sizing, or scope this cap to paths where records are streamed out instead of
   fully buffered.



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]

