taoran92 opened a new pull request, #4422:
URL: https://github.com/apache/flink-cdc/pull/4422
# What is the purpose of the change
This PR reduces high CPU usage in MySQL CDC source when synchronizing a
large number of tables.
When many tables are captured, MySQL binlog event processing may repeatedly
check whether the same TableId is included by the configured table
filters. The hot path goes through Debezium's table filter
predicates, which rely on regex matching:
java.util.regex.Matcher.match
java.util.regex.Matcher.matches
io.debezium.relational.RelationalTableFilters
io.debezium.connector.mysql.MySqlStreamingChangeEventSource.informAboutUnknownTableIfRequired
When the table list is large or the regex patterns are complex, re-evaluating
the filter for a table whose result is already known can consume significant
CPU and keep TaskManager CPU usage close to 100%.
This PR caches the table filter result by TableId after constructing the
Debezium table filter. The cached filter preserves the existing semantics of
the Debezium include filter and Flink CDC excludeTableList,
while avoiding repeated regex evaluation for the same table.
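To illustrate the idea only, here is a minimal sketch of memoizing a filter result per key. It is not the code in this PR: `CachedFilterSketch`, `cached`, and the `String` table identifiers are hypothetical stand-ins, and the sketch assumes the key type has stable `equals`/`hashCode` and that the filter result for a given key never changes.

```java
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.atomic.AtomicInteger;
import java.util.function.Predicate;
import java.util.regex.Pattern;

/** Hypothetical sketch: remembers a predicate's verdict per key. Not the PR's actual code. */
final class CachedFilterSketch {

    /** Wraps {@code delegate} so that each distinct key is evaluated only once. */
    static <T> Predicate<T> cached(Predicate<T> delegate) {
        // ConcurrentHashMap keeps the wrapper safe if several threads test keys.
        Map<T, Boolean> results = new ConcurrentHashMap<>();
        return key -> results.computeIfAbsent(key, delegate::test);
    }

    public static void main(String[] args) {
        AtomicInteger regexEvaluations = new AtomicInteger();
        Pattern include = Pattern.compile("shop\\.orders_\\d+");

        // Stand-in for the regex-backed filter; counts how often it really runs.
        Predicate<String> regexFilter = id -> {
            regexEvaluations.incrementAndGet();
            return include.matcher(id).matches();
        };

        Predicate<String> cachedFilter = cached(regexFilter);
        for (int i = 0; i < 1000; i++) {
            cachedFilter.test("shop.orders_7");
        }
        System.out.println(regexEvaluations.get()); // prints 1
    }
}
```

The cache in this sketch is unbounded, which is only reasonable under the assumption that the number of distinct table identifiers seen by one source is bounded by the number of tables in the captured databases.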
# Brief change log
- Cache MySQL CDC table filter results by TableId in MySqlSourceConfig
- Preserve existing include/exclude table filter semantics when using the
cached filter
- Add unit tests to verify repeated checks for the same table reuse the
cached result
- Add unit tests to verify excludeTableList behavior is unchanged
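For readers unfamiliar with the include/exclude semantics that the change log says are preserved, the following is a rough sketch under one stated assumption: a table is captured only if it matches the include pattern and does not match the exclude pattern. `IncludeExcludeSketch`, `tableFilter`, and the example patterns are hypothetical and do not come from Debezium or Flink CDC.

```java
import java.util.function.Predicate;
import java.util.regex.Pattern;

/** Hypothetical sketch of include/exclude composition. Not the PR's actual code. */
final class IncludeExcludeSketch {

    /** Assumed semantics: captured = matches include AND does not match exclude. */
    static Predicate<String> tableFilter(String includeRegex, String excludeRegex) {
        Pattern include = Pattern.compile(includeRegex);
        Pattern exclude = Pattern.compile(excludeRegex);
        Predicate<String> included = id -> include.matcher(id).matches();
        Predicate<String> excluded = id -> exclude.matcher(id).matches();
        return included.and(excluded.negate());
    }

    public static void main(String[] args) {
        Predicate<String> filter = tableFilter("shop\\..*", "shop\\.audit_.*");
        System.out.println(filter.test("shop.orders"));    // true
        System.out.println(filter.test("shop.audit_log")); // false
        System.out.println(filter.test("crm.orders"));     // false
    }
}
```

Caching the combined predicate, rather than each part separately, is what would keep such semantics unchanged: the cached value is simply the final verdict for a table.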
# Verifying this change
This change is verified by unit tests:
- MySqlSourceConfigTest#testCachesTableFilterResults
- MySqlSourceConfigTest#testTableFilterWithExcludeTableList
Does this pull request potentially affect one of the following parts:
- Dependencies (does it add or upgrade a dependency): no
- The public API, i.e. is any changed class annotated with
@Public or @PublicEvolving: no
- The serializers: no
- The runtime per-record code paths (performance sensitive): yes
- Anything that affects deployment or recovery: no
# Documentation
Does this pull request introduce a new feature? no
If yes, how is the feature documented? not applicable