Re: [PR] [Improve](Streamingjob) support exclude_columns for Postgres streaming job [doris]

via GitHub Thu, 12 Mar 2026 02:43:27 -0700


Copilot commented on code in PR #61267:
URL: https://github.com/apache/doris/pull/61267#discussion_r2923417339



##########
fe/fe-core/src/main/java/org/apache/doris/job/extensions/insert/streaming/DataSourceConfigValidator.java:
##########
@@ -43,11 +43,28 @@ public class DataSourceConfigValidator {
             DataSourceConfigKeys.SSL_ROOTCERT
     );
 
+    // Known suffixes for per-table config keys (format: 
"table.<tableName>.<suffix>")
+    private static final Set<String> ALLOW_TABLE_LEVEL_SUFFIXES = 
Sets.newHashSet(
+            DataSourceConfigKeys.TABLE_EXCLUDE_COLUMNS_SUFFIX
+    );
+
+    private static final String TABLE_LEVEL_PREFIX = 
DataSourceConfigKeys.TABLE + ".";
+
     public static void validateSource(Map<String, String> input) throws 
IllegalArgumentException {
         for (Map.Entry<String, String> entry : input.entrySet()) {
             String key = entry.getKey();
             String value = entry.getValue();
 
+            if (key.startsWith(TABLE_LEVEL_PREFIX)) {
+                // per-table config key: table.<tableName>.<suffix>
+                int lastDot = key.lastIndexOf('.');
+                String suffix = key.substring(lastDot + 1);
+                if (!ALLOW_TABLE_LEVEL_SUFFIXES.contains(suffix)) {
+                    throw new IllegalArgumentException("Unknown per-table 
config key: '" + key + "'");
+                }
+                continue;

Review Comment:
   Per-table key validation only checks the final suffix after the last '.', so 
keys like `table.exclude_columns` or `table..exclude_columns` (missing/empty 
tableName) will incorrectly pass validation even though they can never match 
any table. Consider enforcing the expected 3-part format 
`table.<tableName>.<suffix>` (e.g., require at least two dots and a non-empty 
tableName) and throw a clearer error when the format is invalid.



##########
fs_brokers/cdc_client/src/main/java/org/apache/doris/cdcclient/source/deserialize/DebeziumJsonDeserializer.java:
##########
@@ -93,35 +95,43 @@ public DeserializeResult deserialize(Map<String, String> 
context, SourceRecord r
             throws IOException {
         if (RecordUtils.isDataChangeRecord(record)) {
             LOG.trace("Process data change record: {}", record);
-            List<String> rows = deserializeDataChangeRecord(record);
+            List<String> rows = deserializeDataChangeRecord(record, context);
             return DeserializeResult.dml(rows);
         } else {
             return DeserializeResult.empty();
         }
     }
 
-    private List<String> deserializeDataChangeRecord(SourceRecord record) 
throws IOException {
+    private List<String> deserializeDataChangeRecord(
+            SourceRecord record, Map<String, String> context) throws 
IOException {
         List<String> rows = new ArrayList<>();
+        String tableName = extractTableName(record);
+        Set<String> excludeColumns = ConfigUtil.parseExcludeColumns(context, 
tableName);
         Envelope.Operation op = Envelope.operationFor(record);

Review Comment:
   `parseExcludeColumns` is called for every DML record, and it 
splits/allocates a new Set each time. For high-throughput CDC this adds 
avoidable overhead; consider caching the parsed exclude set per table (or 
precomputing once during init) and reusing it during deserialization.



##########
fs_brokers/cdc_client/src/main/java/org/apache/doris/cdcclient/utils/ConfigUtil.java:
##########
@@ -116,6 +122,31 @@ public static boolean isJson(String str) {
         }
     }
 
+    /**
+     * Parse the exclude-column set for a specific table from config.
+     *
+     * <p>Looks for key {@code "table.<tableName>.exclude_columns"} whose 
value is a comma-separated
+     * column list, e.g. {@code "secret,internal_note"}.
+     *
+     * @return lower-cased column name set; empty set when the key is absent
+     */
+    public static Set<String> parseExcludeColumns(Map<String, String> config, 
String tableName) {
+        String key =
+                DataSourceConfigKeys.TABLE
+                        + "."
+                        + tableName
+                        + "."
+                        + DataSourceConfigKeys.TABLE_EXCLUDE_COLUMNS_SUFFIX;
+        String value = config.get(key);
+        if (StringUtils.isEmpty(value)) {
+            return Collections.emptySet();
+        }
+        return Arrays.stream(value.split(","))
+                .map(String::trim)
+                .filter(s -> !s.isEmpty())
+                .collect(Collectors.toSet());

Review Comment:
   Javadoc says this returns a "lower-cased column name set", but the 
implementation preserves original case. Either normalize (e.g., toLowerCase 
with a defined locale) or update the Javadoc to match actual behavior so 
callers don’t assume case-insensitive matching.



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]


---------------------------------------------------------------------
To unsubscribe, e-mail: [email protected]
For additional commands, e-mail: [email protected]

Re: [PR] [Improve](Streamingjob) support exclude_columns for Postgres streaming job [doris]

Reply via email to