Copilot commented on code in PR #4422:
URL: https://github.com/apache/flink-cdc/pull/4422#discussion_r3426155686
##########
flink-cdc-connect/flink-cdc-source-connectors/flink-connector-mysql-cdc/src/main/java/org/apache/flink/cdc/connectors/mysql/source/config/MySqlSourceConfig.java:
##########
@@ -284,6 +281,22 @@ public Predicate<TableId> getTableFilter() {
return tableId ->
tableFilters.dataCollectionFilter().isIncluded(tableId);
}
+ static Tables.TableFilter createCachedTableFilter(
+ Tables.TableFilter tableFilter, @Nullable Selectors excludeTableFilter) {
+ Map<TableId, Boolean> tableFilterCache = new ConcurrentHashMap<>();
+ return tableId ->
+ tableFilterCache.computeIfAbsent(
+ tableId, id -> isTableIncluded(tableFilter, excludeTableFilter, id));
Review Comment:
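`ConcurrentHashMap#computeIfAbsent` performs the whole invocation atomically:
the JDK documents that the mapping function is applied at most once per key,
so concurrent callers should not evaluate `isTableIncluded(...)` redundantly
for the same `TableId`. (Only the default `ConcurrentMap#computeIfAbsent`
implementation lacks this guarantee.) The cost is that other threads updating
the same map may block while the computation is in progress, and the mapping
function must not modify the map itself. If `tableFilter.isIncluded(...)` /
`excludeTableFilter.isMatch(...)` are expensive, keep that blocking in mind;
additional synchronization is not needed to get single evaluation per key.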
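
A minimal sketch for observing how many times the mapping function runs when
several threads request the same key from a `ConcurrentHashMap`. The class name
`ComputeIfAbsentDemo`, the key string, and the thread count are illustrative
assumptions and are not part of the PR.

```java
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.CountDownLatch;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.atomic.AtomicInteger;

// Hypothetical demo class, not part of flink-cdc.
public class ComputeIfAbsentDemo {

    /** Starts `threads` concurrent lookups of one key and returns how often the mapping function ran. */
    static int countMappingCalls(int threads) {
        Map<String, Boolean> cache = new ConcurrentHashMap<>();
        AtomicInteger calls = new AtomicInteger();
        CountDownLatch start = new CountDownLatch(1);
        ExecutorService pool = Executors.newFixedThreadPool(threads);
        for (int i = 0; i < threads; i++) {
            pool.submit(() -> {
                try {
                    start.await(); // release all workers at roughly the same time
                } catch (InterruptedException e) {
                    Thread.currentThread().interrupt();
                    return;
                }
                cache.computeIfAbsent("test_db.orders_1", key -> {
                    calls.incrementAndGet(); // counts evaluations of the mapping function
                    return key.startsWith("test_db.orders");
                });
            });
        }
        start.countDown();
        pool.shutdown();
        try {
            if (!pool.awaitTermination(10, TimeUnit.SECONDS)) {
                throw new IllegalStateException("lookups did not finish in time");
            }
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException("interrupted while waiting for lookups", e);
        }
        return calls.get();
    }

    public static void main(String[] args) {
        System.out.println("mapping function calls: " + countMappingCalls(8));
    }
}
```

On a JDK that follows the documented contract of
`ConcurrentHashMap#computeIfAbsent` (function applied at most once per key),
the counter should end at 1 regardless of the thread count.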
##########
flink-cdc-connect/flink-cdc-source-connectors/flink-connector-mysql-cdc/src/main/java/org/apache/flink/cdc/connectors/mysql/source/config/MySqlSourceConfig.java:
##########
@@ -284,6 +281,22 @@ public Predicate<TableId> getTableFilter() {
return tableId ->
tableFilters.dataCollectionFilter().isIncluded(tableId);
}
+ static Tables.TableFilter createCachedTableFilter(
+ Tables.TableFilter tableFilter, @Nullable Selectors excludeTableFilter) {
+ Map<TableId, Boolean> tableFilterCache = new ConcurrentHashMap<>();
+ return tableId ->
+ tableFilterCache.computeIfAbsent(
+ tableId, id -> isTableIncluded(tableFilter, excludeTableFilter, id));
+ }
Review Comment:
The `ConcurrentHashMap` cache is unbounded and can grow indefinitely on jobs
that observe many distinct `TableId`s (e.g., dynamic schemas/tenants), which
risks memory pressure over long runtimes. Consider using a bounded cache
(size/TTL) or limiting caching to a known finite table set to avoid unbounded
growth.
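
One possible shape for a size-bounded cache, sketched with JDK classes only.
`BoundedFilterCache`, its generic key type, and the `maxEntries` parameter are
hypothetical names for illustration; the PR may prefer an existing cache
library or a different eviction policy.

```java
import java.util.Collections;
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.concurrent.atomic.AtomicInteger;
import java.util.function.Predicate;

// Hypothetical helper, not part of flink-cdc: caches filter results with LRU eviction.
public class BoundedFilterCache<K> implements Predicate<K> {

    private final Predicate<K> delegate;
    private final Map<K, Boolean> cache;

    public BoundedFilterCache(Predicate<K> delegate, int maxEntries) {
        this.delegate = delegate;
        // Access-ordered LinkedHashMap evicts the least recently used entry once
        // the size limit is exceeded; the synchronized wrapper makes it thread-safe.
        this.cache = Collections.synchronizedMap(
                new LinkedHashMap<K, Boolean>(16, 0.75f, true) {
                    @Override
                    protected boolean removeEldestEntry(Map.Entry<K, Boolean> eldest) {
                        return size() > maxEntries;
                    }
                });
    }

    @Override
    public boolean test(K key) {
        // The delegate runs while the wrapper's lock is held, so lookups are serialized.
        return cache.computeIfAbsent(key, delegate::test);
    }

    int size() {
        return cache.size();
    }

    public static void main(String[] args) {
        AtomicInteger calls = new AtomicInteger();
        BoundedFilterCache<String> filter = new BoundedFilterCache<>(
                name -> {
                    calls.incrementAndGet();
                    return name.startsWith("orders");
                },
                2);
        filter.test("orders_1");
        filter.test("orders_1"); // served from the cache
        System.out.println("delegate calls: " + calls.get() + ", cached entries: " + filter.size());
    }
}
```

The trade-off in this sketch is that every lookup takes a single lock, which is
simple but serializes callers; a dedicated caching library would typically
offer better concurrency along with size and TTL limits.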
##########
flink-cdc-connect/flink-cdc-source-connectors/flink-connector-mysql-cdc/src/test/java/org/apache/flink/cdc/connectors/mysql/source/config/MySqlSourceConfigTest.java:
##########
@@ -0,0 +1,74 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one or more
+ * contributor license agreements. See the NOTICE file distributed with
+ * this work for additional information regarding copyright ownership.
+ * The ASF licenses this file to You under the Apache License, Version 2.0
+ * (the "License"); you may not use this file except in compliance with
+ * the License. You may obtain a copy of the License at
+ *
+ * http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+
+package org.apache.flink.cdc.connectors.mysql.source.config;
+
+import io.debezium.relational.TableId;
+import io.debezium.relational.Tables;
+import org.junit.jupiter.api.Test;
+
+import java.util.concurrent.atomic.AtomicInteger;
+import java.util.function.Predicate;
+
+import static org.assertj.core.api.Assertions.assertThat;
+
+/** Tests for {@link MySqlSourceConfig}. */
+class MySqlSourceConfigTest {
+
+ @Test
+ void testCachesTableFilterResults() {
+ AtomicInteger filterInvocationCount = new AtomicInteger();
+ Tables.TableFilter cachedTableFilter =
+ MySqlSourceConfig.createCachedTableFilter(
+ tableId -> {
+ filterInvocationCount.incrementAndGet();
+ return tableId.table().startsWith("orders");
+ },
+ null);
+
+ TableId includedTable = new TableId("test_db", null, "orders_1");
+ TableId unmatchedTable = new TableId("test_db", null, "customers");
+
+ assertThat(cachedTableFilter.isIncluded(includedTable)).isTrue();
+ assertThat(cachedTableFilter.isIncluded(includedTable)).isTrue();
+ assertThat(cachedTableFilter.isIncluded(unmatchedTable)).isFalse();
+ assertThat(cachedTableFilter.isIncluded(unmatchedTable)).isFalse();
+ assertThat(filterInvocationCount).hasValue(2);
+ }
+
+ @Test
+ void testTableFilterWithExcludeTableList() {
+ MySqlSourceConfig config =
+ new MySqlSourceConfigFactory()
+ .hostname("localhost")
+ .username("user")
+ .password("password")
+ .databaseList("test_db")
+ .tableList("test_db\\.orders_.*")
+ .excludeTableList("test_db.orders_skip")
Review Comment:
The include pattern escapes the `.` (`test_db\\.orders_.*`), but the exclude
pattern does not. If this is treated as a regex (as the include appears to be),
`test_db.orders_skip` will match unintended strings where `.` matches any
character. To keep the test precise and aligned with the include syntax, escape
the dot in the exclude pattern as well.
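
To illustrate the difference, here is a small sketch using `java.util.regex`
directly. It assumes the exclude list is ultimately matched as a regular
expression, which is not confirmed here; `DotEscapeDemo` and the sample table
names are made up for the example.

```java
import java.util.regex.Pattern;

// Hypothetical demo class, not part of flink-cdc.
public class DotEscapeDemo {

    /** Returns true if the whole table name matches the given regex. */
    static boolean matches(String regex, String tableName) {
        return Pattern.matches(regex, tableName);
    }

    public static void main(String[] args) {
        String unescaped = "test_db.orders_skip";   // '.' matches any single character
        String escaped = "test_db\\.orders_skip";   // '\\.' matches only a literal dot

        System.out.println(matches(unescaped, "test_db.orders_skip")); // true
        System.out.println(matches(unescaped, "test_dbXorders_skip")); // true (unintended)
        System.out.println(matches(escaped, "test_db.orders_skip"));   // true
        System.out.println(matches(escaped, "test_dbXorders_skip"));   // false
    }
}
```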
##########
flink-cdc-connect/flink-cdc-source-connectors/flink-connector-mysql-cdc/src/test/java/org/apache/flink/cdc/connectors/mysql/source/config/MySqlSourceConfigTest.java:
##########
@@ -0,0 +1,74 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one or more
+ * contributor license agreements. See the NOTICE file distributed with
+ * this work for additional information regarding copyright ownership.
+ * The ASF licenses this file to You under the Apache License, Version 2.0
+ * (the "License"); you may not use this file except in compliance with
+ * the License. You may obtain a copy of the License at
+ *
+ * http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+
+package org.apache.flink.cdc.connectors.mysql.source.config;
+
+import io.debezium.relational.TableId;
+import io.debezium.relational.Tables;
+import org.junit.jupiter.api.Test;
+
+import java.util.concurrent.atomic.AtomicInteger;
+import java.util.function.Predicate;
+
+import static org.assertj.core.api.Assertions.assertThat;
+
+/** Tests for {@link MySqlSourceConfig}. */
+class MySqlSourceConfigTest {
+
+ @Test
+ void testCachesTableFilterResults() {
+ AtomicInteger filterInvocationCount = new AtomicInteger();
+ Tables.TableFilter cachedTableFilter =
+ MySqlSourceConfig.createCachedTableFilter(
+ tableId -> {
+ filterInvocationCount.incrementAndGet();
+ return tableId.table().startsWith("orders");
+ },
+ null);
+
+ TableId includedTable = new TableId("test_db", null, "orders_1");
+ TableId unmatchedTable = new TableId("test_db", null, "customers");
+
+ assertThat(cachedTableFilter.isIncluded(includedTable)).isTrue();
+ assertThat(cachedTableFilter.isIncluded(includedTable)).isTrue();
+ assertThat(cachedTableFilter.isIncluded(unmatchedTable)).isFalse();
+ assertThat(cachedTableFilter.isIncluded(unmatchedTable)).isFalse();
+ assertThat(filterInvocationCount).hasValue(2);
Review Comment:
This test verifies caching only when the same `TableId` instance is reused.
Since the cache keying relies on `TableId` equality/hashing, it would be
stronger to also call `isIncluded` with a new but equal `TableId` instance
(same db/schema/table) and assert the invocation count does not increase,
ensuring caching works across equivalent objects as it will in typical call
sites.
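
A rough sketch of the suggested check, written against a stand-in key class so
it runs without Debezium on the classpath. `TableKey` is a hypothetical
substitute for `TableId` and is assumed to have value-based
`equals`/`hashCode`; `cached(...)` mirrors the shape of
`createCachedTableFilter` but is not the PR's method.

```java
import java.util.Map;
import java.util.Objects;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.atomic.AtomicInteger;
import java.util.function.Predicate;

// Hypothetical demo class, not part of flink-cdc.
public class EqualKeyCacheDemo {

    /** Stand-in for TableId: two instances with the same fields are equal. */
    static final class TableKey {
        final String catalog;
        final String table;

        TableKey(String catalog, String table) {
            this.catalog = catalog;
            this.table = table;
        }

        @Override
        public boolean equals(Object o) {
            if (this == o) {
                return true;
            }
            if (!(o instanceof TableKey)) {
                return false;
            }
            TableKey other = (TableKey) o;
            return Objects.equals(catalog, other.catalog) && Objects.equals(table, other.table);
        }

        @Override
        public int hashCode() {
            return Objects.hash(catalog, table);
        }
    }

    /** Wraps a filter with a per-key result cache. */
    static Predicate<TableKey> cached(Predicate<TableKey> delegate) {
        Map<TableKey, Boolean> cache = new ConcurrentHashMap<>();
        return key -> cache.computeIfAbsent(key, delegate::test);
    }

    /** Looks up two distinct but equal keys and returns the number of delegate calls. */
    static int delegateCallsForEqualKeys() {
        AtomicInteger calls = new AtomicInteger();
        Predicate<TableKey> filter = cached(key -> {
            calls.incrementAndGet();
            return key.table.startsWith("orders");
        });
        filter.test(new TableKey("test_db", "orders_1"));
        filter.test(new TableKey("test_db", "orders_1")); // new instance, equal key
        return calls.get();
    }

    public static void main(String[] args) {
        System.out.println("delegate calls: " + delegateCallsForEqualKeys());
    }
}
```

In the actual test, the equivalent step would be a second
`new TableId("test_db", null, "orders_1")` passed to `isIncluded`, followed by
an assertion that `filterInvocationCount` is unchanged.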