ThorneANN opened a new pull request, #4403:
URL: https://github.com/apache/flink-cdc/pull/4403
CustomPostgresSchema#readTableSchema invokes jdbcConnection.readSchema with
the full captured-table filter, so a single call already loads metadata for
every captured table. However, the cache-population loop iterates only the
requested subset and discards the rest. As a result, snapshot startup
performs one full pg_catalog scan per split, scaling as O(N²) with the
number of captured tables and causing severe latency on multi-tenant
Postgres deployments that capture hundreds of tables across schemas.
This change caches every table discovered by readSchema into
schemasByTableId, while the returned tableChanges still contains only the
originally requested subset. Subsequent splits are served entirely from the
cache.
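
The caching behaviour described above can be illustrated with a minimal,
self-contained sketch. It is not the code from this PR: the class
SchemaCacheSketch, the String table ids and schema values, and the
catalogScans counter are all hypothetical stand-ins, and readSchema here
only simulates one catalog scan that returns every captured table.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.concurrent.atomic.AtomicInteger;

// Hypothetical sketch; names mirror the PR description but types are simplified.
class SchemaCacheSketch {
    // Stand-in for schemasByTableId (String replaces TableId and the schema type).
    final Map<String, String> schemasByTableId = new HashMap<>();
    // Counts simulated full catalog scans so the effect of caching is observable.
    final AtomicInteger catalogScans = new AtomicInteger();
    private final List<String> capturedTables;

    SchemaCacheSketch(List<String> capturedTables) {
        this.capturedTables = capturedTables;
    }

    // Assumed behaviour: one call loads metadata for every captured table.
    Map<String, String> readSchema() {
        catalogScans.incrementAndGet();
        Map<String, String> all = new HashMap<>();
        for (String table : capturedTables) {
            all.put(table, "schema-of-" + table);
        }
        return all;
    }

    // Caches every discovered table, but returns only the requested subset.
    List<String> readTableSchema(List<String> requested) {
        Map<String, String> discovered = readSchema();
        schemasByTableId.putAll(discovered);

        List<String> tableChanges = new ArrayList<>();
        for (String tableId : requested) {
            String schema = discovered.get(tableId);
            if (schema != null) {
                tableChanges.add(schema);
            }
        }
        return tableChanges;
    }
}
```

In this sketch, a single readTableSchema call for one table leaves all
captured tables in the cache, so later lookups need no further scan.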
Also fixes a related issue where getTableSchema(List<TableId>) re-fetched
already-cached tables by passing the full tableIds list to readTableSchema
instead of the unmatched subset.
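
The second fix can be sketched the same way. Again, this is an illustration
under assumptions rather than the PR's code: UnmatchedLookupSketch, the
String ids, and the readRequests log are hypothetical, and readTableSchema
is reduced to a stub that records what it was asked to fetch.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Hypothetical sketch of fetching only the tables missing from the cache.
class UnmatchedLookupSketch {
    // Stand-in for schemasByTableId.
    final Map<String, String> schemasByTableId = new HashMap<>();
    // Records each list passed to readTableSchema, for inspection.
    final List<List<String>> readRequests = new ArrayList<>();

    // Stub: logs the request and caches a fake schema for each requested id.
    void readTableSchema(List<String> tableIds) {
        readRequests.add(new ArrayList<>(tableIds));
        for (String tableId : tableIds) {
            schemasByTableId.put(tableId, "schema-of-" + tableId);
        }
    }

    Map<String, String> getTableSchema(List<String> tableIds) {
        // Collect only the ids that are not cached yet.
        List<String> unmatched = new ArrayList<>();
        for (String tableId : tableIds) {
            if (!schemasByTableId.containsKey(tableId)) {
                unmatched.add(tableId);
            }
        }
        // Pass the unmatched subset, not the full tableIds list.
        if (!unmatched.isEmpty()) {
            readTableSchema(unmatched);
        }
        // Serve the full result from the cache, preserving request order.
        Map<String, String> result = new LinkedHashMap<>();
        for (String tableId : tableIds) {
            result.put(tableId, schemasByTableId.get(tableId));
        }
        return result;
    }
}
```

When every requested table is already cached, this version skips
readTableSchema entirely.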