Re: [PR] [FLINK-37371] Postgres CDC incremental source fails to handle upper-case table and column names [flink-cdc]

via GitHub Thu, 13 Mar 2025 10:01:22 -0700


morozov commented on code in PR #3929:
URL: https://github.com/apache/flink-cdc/pull/3929#discussion_r1993805772



##########
flink-cdc-connect/flink-cdc-source-connectors/flink-connector-postgres-cdc/src/main/java/org/apache/flink/cdc/connectors/postgres/source/utils/PostgresQueryUtils.java:
##########
@@ -69,12 +69,17 @@ public static long queryApproximateRowCnt(JdbcConnection 
jdbc, TableId tableId)
         // 
https://stackoverflow.com/questions/7943233/fast-way-to-discover-the-row-count-of-a-table-in-postgresql
         // NOTE: it requires ANALYZE or VACUUM to be run first in PostgreSQL.
         final String query =
-                String.format(
-                        "SELECT reltuples::bigint FROM pg_class WHERE oid = 
to_regclass('%s')",
-                        tableId.toString());
-
-        return jdbc.queryAndMap(
+                "SELECT reltuples::bigint"
+                        + " FROM pg_class c"
+                        + " JOIN pg_namespace n ON n.oid = c.relnamespace"

Review Comment:
   This is part of the bug fix. Let's assume we have a table named `Customers` 
in the schema named `public`.
   
   The previous logic would work like this:
   1. Use `TableId#toString()` to build the qualified name from the schema name 
and table name: `public.Customers`.
   2. Use `to_regclass()` to parse the qualified name and find the table OID.
   3. Fetch the number of rows in the table by OID.
   
   
   The problem is that the value passed to `to_regclass()` is parsed as a 
Postgres SQL expression but isn't formatted as such. It doesn't account for 
case insensitivity of unquoted identifiers (see the 
[documentation](https://www.postgresql.org/docs/current/sql-syntax-lexical.html)):
   
   > Quoting an identifier also makes it case-sensitive, whereas unquoted names 
are always folded to lower case. For example, the identifiers FOO, foo, and 
"foo" are considered the same by PostgreSQL, but "Foo" and "FOO" are different 
from these three and each other.
   
   For the original query to work, it should have looked like:
   ```sql
   SELECT reltuples::bigint FROM pg_class WHERE oid = 
to_regclass('public."Customers"')
   ```
   So it's effectively SQL in SQL.
   
   The new query is much more straightforward and isn't prone to this issue. It 
just selects the number of rows for the table whose schema name is `public` and 
name is `Customers` by avoiding the "SQL in SQL" situation.
   
   It is possible to make the old query work, but it would involve building SQL 
by using `quote()` and then parsing via `to_regclass()` which is redundant (the 
first two steps in the original query effectively eliminate each other).



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscr...@flink.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org

Re: [PR] [FLINK-37371] Postgres CDC incremental source fails to handle upper-case table and column names [flink-cdc]

Reply via email to