Copilot commented on code in PR #61267:
URL: https://github.com/apache/doris/pull/61267#discussion_r2923417339
##########
fe/fe-core/src/main/java/org/apache/doris/job/extensions/insert/streaming/DataSourceConfigValidator.java:
##########
@@ -43,11 +43,28 @@ public class DataSourceConfigValidator {
DataSourceConfigKeys.SSL_ROOTCERT
);
+ // Known suffixes for per-table config keys (format:
"table.<tableName>.<suffix>")
+ private static final Set<String> ALLOW_TABLE_LEVEL_SUFFIXES =
Sets.newHashSet(
+ DataSourceConfigKeys.TABLE_EXCLUDE_COLUMNS_SUFFIX
+ );
+
+ private static final String TABLE_LEVEL_PREFIX =
DataSourceConfigKeys.TABLE + ".";
+
public static void validateSource(Map<String, String> input) throws
IllegalArgumentException {
for (Map.Entry<String, String> entry : input.entrySet()) {
String key = entry.getKey();
String value = entry.getValue();
+ if (key.startsWith(TABLE_LEVEL_PREFIX)) {
+ // per-table config key: table.<tableName>.<suffix>
+ int lastDot = key.lastIndexOf('.');
+ String suffix = key.substring(lastDot + 1);
+ if (!ALLOW_TABLE_LEVEL_SUFFIXES.contains(suffix)) {
+ throw new IllegalArgumentException("Unknown per-table
config key: '" + key + "'");
+ }
+ continue;
Review Comment:
Per-table key validation only checks the final suffix after the last '.', so
keys like `table.exclude_columns` or `table..exclude_columns` (missing/empty
tableName) will incorrectly pass validation even though they can never match
any table. Consider enforcing the expected 3-part format
`table.<tableName>.<suffix>` (e.g., require at least two dots and a non-empty
tableName) and throw a clearer error when the format is invalid.
##########
fs_brokers/cdc_client/src/main/java/org/apache/doris/cdcclient/source/deserialize/DebeziumJsonDeserializer.java:
##########
@@ -93,35 +95,43 @@ public DeserializeResult deserialize(Map<String, String>
context, SourceRecord r
throws IOException {
if (RecordUtils.isDataChangeRecord(record)) {
LOG.trace("Process data change record: {}", record);
- List<String> rows = deserializeDataChangeRecord(record);
+ List<String> rows = deserializeDataChangeRecord(record, context);
return DeserializeResult.dml(rows);
} else {
return DeserializeResult.empty();
}
}
- private List<String> deserializeDataChangeRecord(SourceRecord record)
throws IOException {
+ private List<String> deserializeDataChangeRecord(
+ SourceRecord record, Map<String, String> context) throws
IOException {
List<String> rows = new ArrayList<>();
+ String tableName = extractTableName(record);
+ Set<String> excludeColumns = ConfigUtil.parseExcludeColumns(context,
tableName);
Envelope.Operation op = Envelope.operationFor(record);
Review Comment:
`parseExcludeColumns` is called for every DML record, and it
splits/allocates a new Set each time. For high-throughput CDC this adds
avoidable overhead; consider caching the parsed exclude set per table (or
precomputing once during init) and reusing it during deserialization.
##########
fs_brokers/cdc_client/src/main/java/org/apache/doris/cdcclient/utils/ConfigUtil.java:
##########
@@ -116,6 +122,31 @@ public static boolean isJson(String str) {
}
}
+ /**
+ * Parse the exclude-column set for a specific table from config.
+ *
+ * <p>Looks for key {@code "table.<tableName>.exclude_columns"} whose
value is a comma-separated
+ * column list, e.g. {@code "secret,internal_note"}.
+ *
+ * @return lower-cased column name set; empty set when the key is absent
+ */
+ public static Set<String> parseExcludeColumns(Map<String, String> config,
String tableName) {
+ String key =
+ DataSourceConfigKeys.TABLE
+ + "."
+ + tableName
+ + "."
+ + DataSourceConfigKeys.TABLE_EXCLUDE_COLUMNS_SUFFIX;
+ String value = config.get(key);
+ if (StringUtils.isEmpty(value)) {
+ return Collections.emptySet();
+ }
+ return Arrays.stream(value.split(","))
+ .map(String::trim)
+ .filter(s -> !s.isEmpty())
+ .collect(Collectors.toSet());
Review Comment:
Javadoc says this returns a "lower-cased column name set", but the
implementation preserves original case. Either normalize (e.g., toLowerCase
with a defined locale) or update the Javadoc to match actual behavior so
callers don’t assume case-insensitive matching.
--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
To unsubscribe, e-mail: [email protected]
For queries about this service, please contact Infrastructure at:
[email protected]
---------------------------------------------------------------------
To unsubscribe, e-mail: [email protected]
For additional commands, e-mail: [email protected]