DanielCarter-stack commented on PR #10422:
URL: https://github.com/apache/seatunnel/pull/10422#issuecomment-3831152467

   <!-- code-pr-reviewer -->
   <!-- cpr:pr_reply_v2_parts {"group": "apache/seatunnel#10422", "part": 1, 
"total": 1} -->
   ### Issue 1: Improper Case Sensitivity Handling
   
   **Location**: 
   - `JdbcDialectTypeMapper.java:97-110`
   - `JdbcColumnConverter.java:87-98`
   - `SqlServerConnection.java:710-721`
   
   **Issue Description**:
   Using `equalsIgnoreCase` for table name and schema name comparison, but in 
some databases (such as PostgreSQL under certain configurations, Oracle, H2, 
etc.), table names are **case-sensitive**. This could lead to:
   - User queries `UserInfo` (uppercase U)
   - Database returns columns of `userinfo` (lowercase)
   - `equalsIgnoreCase` considers them equal, incorrectly including that column
   
   **Related Context**:
   - Parent/interface: `DatabaseMetaData.getColumns()` JDBC specification 
states "whether identifiers are case-sensitive depends on the database"
   - Caller: `CatalogUtils.getTableSchema()` is called in 23 database connectors
   - Impact scope: All case-sensitive databases (PostgreSQL, Oracle, H2, DB2, 
etc.)
   
   **Potential Risks**:
   - Risk 1: On case-sensitive databases, columns from wrong tables may be 
included (though probability is low)
   - Risk 2: Tests do not cover case sensitivity scenarios
   
   **Impact Scope**:
   - Direct impact: Column discovery for case-sensitive databases like 
PostgreSQL, Oracle, H2, DB2, SAP HANA, etc.
   - Indirect impact: Type inference and data conversion based on column 
information
   - Affected area: Multiple Connectors (approximately 10+ case-sensitive 
databases)
   
   **Severity**: **MAJOR**
   
   **Improvement Suggestions**:
   ```java
   // Suggestion: Consider database case sensitivity
   // Solution 1: Get information from 
DatabaseMetaData.storesLowerCaseIdentifiers()
   boolean storesLowerCase = metadata.storesLowerCaseIdentifiers();
   boolean storesUpperCase = metadata.storesUpperCaseIdentifiers();
   boolean storesMixedCase = metadata.storesMixedCaseQuotedIdentifiers();
   
   // Solution 2: Use DatabaseMetaData.identifierEqualsString() (JDBC 4.3+)
   // But need to consider compatibility
   
   // Solution 3 (current best practice): Keep equalsIgnoreCase, but add 
warning log
   if (actualTableName == null || 
!actualTableName.equalsIgnoreCase(tableNamePattern)) {
       if (actualTableName != null && 
!actualTableName.equals(tableNamePattern)) {
           log.debug("Filtered column from table '{}' while looking for '{}', 
case-sensitive databases may require exact match", 
                     actualTableName, tableNamePattern);
       }
       continue;
   }
   ```
   
   **Rationale**:
   1. JDBC specification explicitly states that case sensitivity varies across 
different databases
   2. Using `equalsIgnoreCase` is correct on case-insensitive databases (such 
as MySQL, SQL Server)
   3. But may cause errors on case-sensitive databases
   4. Adding logs can help users diagnose issues
   
   ---
   
   ### Issue 2: Missing Handling and Warning for Empty Column Lists
   
   **Location**: 
   - `JdbcDialectTypeMapper.java:84-135`
   - `JdbcColumnConverter.java:73-121`
   
   **Issue Description**:
   If all returned rows are filtered out (for example, the JDBC driver returns 
completely mismatched tables), the code will **silently return an empty list**. 
This could lead to:
   1. Upstream callers assuming the table has no columns and continuing 
processing
   2. Failing at a later stage (such as missing required columns)
   3. Unclear error messages, making diagnosis difficult
   
   **Related Context**:
   - Caller 1: `CatalogUtils.getTableSchema()` (lines 223-245)
   - Caller 2: `SqlServerConnection.getTableSchemaFromTable()` (lines 696-739)
   - These callers expect non-empty column lists to be returned
   
   **Potential Risks**:
   - Risk 1: When table doesn't exist or configuration is wrong, errors are 
delayed to later stages
   - Risk 2: Users see "table has no columns" instead of "table doesn't exist 
or columns were filtered"
   - Risk 3: Difficult to debug
   
   **Impact Scope**:
   - Direct impact: All users using JDBC CDC and JDBC Source
   - Indirect impact: Task failures, data loss
   - Affected area: All Connectors
   
   **Severity**: **MINOR**
   
   **Improvement Suggestions**:
   ```java
   // In JdbcDialectTypeMapper.java
   List<Column> columns = new ArrayList<>();
   int filteredRows = 0; // Add counter
   try (ResultSet rs = metadata.getColumns(...)) {
       while (rs.next()) {
           if (tableNamePattern != null) {
               String actualTableName = rs.getString("TABLE_NAME");
               if (actualTableName == null || 
!actualTableName.equalsIgnoreCase(tableNamePattern)) {
                   filteredRows++; // Log filtered rows
                   continue;
               }
           }
           // ... rest of logic
           columns.add(mappingColumn(typeDefine));
       }
   }
   
   // Add: If all rows are filtered, log warning
   if (columns.isEmpty() && filteredRows > 0) {
       log.warn("No columns found for table '{}', filtered {} rows from other 
tables. " +
                "The table may not exist or the driver may be returning 
unexpected results.",
                tableNamePattern, filteredRows);
   }
   return columns;
   ```
   
   **Rationale**:
   1. Helps users quickly diagnose configuration issues
   2. Doesn't affect normal flow (just logging)
   3. Low cost, high value
   
   ---
   
   ### Issue 3: Insufficient Test Coverage
   
   **Location**: `CatalogUtilsTest.java:98-168`
   
   **Issue Description**:
   The newly added test case `testGetTableSchemaFiltersOutOtherMatchedTables` 
only covers one scenario:
   - Table name contains `_` wildcard
   - Returns 2 rows (1 matching + 1 non-matching)
   
   But the following scenarios are missing:
   1. Table name contains `%` wildcard (such as `user%info`)
   2. Schema name contains wildcards (such as `pub_lic`)
   3. Both table name and schema contain wildcards
   4. Returns multiple rows, with only partial matches
   5. Returns 0 rows (table doesn't exist)
   6. Case mismatch (`UserInfo` vs `userinfo`)
   7. SqlServerConnection modifications have no corresponding unit tests
   
   **Potential Risks**:
   - Risk 1: `%` wildcard scenarios not verified
   - Risk 2: Edge cases not covered
   - Risk 3: CDC path modifications have no test protection
   
   **Impact Scope**:
   - Direct impact: Code quality assurance
   - Indirect impact: Future regressions may be introduced
   - Affected area: Single module
   
   **Severity**: **MINOR**
   
   **Improvement Suggestions**:
   ```java
   // Add new test cases
   
   @Test
   void testGetTableSchemaFiltersOutPercentageWildcard() throws SQLException {
       // Test % wildcard
       TestDatabaseMetaData metadata = new TestDatabaseMetaData() {
           @Override
           public ResultSet getColumns(...) {
               List<Map<String, Object>> value = new ArrayList<>();
               value.add(Map.of(
                   "TABLE_NAME", "user%info",
                   "TABLE_SCHEM", "public",
                   "COLUMN_NAME", "id",
                   // ...
               ));
               value.add(Map.of(
                   "TABLE_NAME", "userXYZinfo", // Should be filtered
                   "TABLE_SCHEM", "public",
                   "COLUMN_NAME", "bad",
                   // ...
               ));
               return new TestResultSet(value);
           }
       };
       
       TablePath tablePath = TablePath.of("test_db", "public", "user%info");
       TableSchema tableSchema = CatalogUtils.getTableSchema(metadata, 
tablePath, typeMapper);
       
       Assertions.assertEquals(1, tableSchema.getColumns().size());
   }
   
   @Test
   void testGetTableSchemaFiltersOutSchemaWildcard() throws SQLException {
       // Test schema contains wildcard
       // ...
   }
   
   @Test
   void testGetTableSchemaEmptyWhenAllFiltered() throws SQLException {
       // Test all rows are filtered
       // ...
   }
   ```
   
   **Rationale**:
   1. Improve code coverage
   2. Prevent future modifications from introducing regressions
   3. `%` wildcards are equally common
   
   ---
   
   ### Issue 4: Missing Unit Tests for SqlServerConnection
   
   **Location**: `SqlServerConnection.java:708-721`
   
   **Issue Description**:
   Filtering logic was added in SqlServerConnection, but **there are no 
corresponding unit tests**. This is a CDC path modification, independent from 
the JDBC path, and requires separate verification.
   
   **Potential Risks**:
   - Risk 1: CDC path modifications have no test protection
   - Risk 2: Future modifications may break CDC functionality
   - Risk 3: SQL Server users may be affected
   
   **Impact Scope**:
   - Direct impact: SQL Server CDC Connector
   - Indirect impact: Data synchronization tasks using SQL Server CDC
   - Affected area: Single Connector
   
   **Severity**: **MINOR**
   
   **Improvement Suggestions**:
   ```java
   // Add in connector-cdc-sqlserver test directory
   // File: SqlServerConnectionTest.java
   
   @Test
   void testGetTableSchemaFiltersOutWildcardTables() throws SQLException {
       // Mock DatabaseMetaData to return multiple tables
       TestDatabaseMetaData metadata = new TestDatabaseMetaData() {
           @Override
           public ResultSet getColumns(...) {
               List<Map<String, Object>> value = new ArrayList<>();
               value.add(Map.of(
                   "TABLE_NAME", "user_info",
                   "TABLE_SCHEM", "dbo",
                   "COLUMN_NAME", "id",
                   // ...
               ));
               value.add(Map.of(
                   "TABLE_NAME", "userAinfo", // Should be filtered
                   "TABLE_SCHEM", "dbo",
                   "COLUMN_NAME", "bad",
                   // ...
               ));
               return new TestResultSet(value);
           }
       };
       
       SqlServerConnection connection = new SqlServerConnection(config);
       TableId tableId = TableId.parse("dbo", "user_info");
       SqlServerChangeTable changeTable = mock(SqlServerChangeTable.class);
       when(changeTable.getSourceTableId()).thenReturn(tableId);
       
       Table table = connection.getTableSchemaFromTable("database", 
changeTable);
       
       Assertions.assertEquals(1, table.columns().size());
       Assertions.assertEquals("id", table.columns().get(0).name());
   }
   ```
   
   **Rationale**:
   1. CDC path is independent from JDBC path
   2. SQL Server is an important data source
   3. Testing cost is not high (TestDatabaseMetaData can be reused)
   
   ---
   
   ### Issue 5: Documentation and Changelog Not Updated
   
   **Location**: PR metadata
   
   **Issue Description**:
   The PR description explicitly states "Does this PR introduce any user-facing 
change? Yes", but:
   1. No relevant documentation was updated
   2. No update to `incompatible-changes.md`
   3. No update to connector-specific documentation
   
   Although this is a bug fix, it changes observed behavior, and users need to 
be aware.
   
   **Potential Risks**:
   - Risk 1: Behavior changes after upgrade but no documentation explains it
   - Risk 2: Users may depend on the old incorrect behavior
   - Risk 3: Missing reference during troubleshooting
   
   **Impact Scope**:
   - Direct impact: All JDBC and CDC Connector users
   - Indirect impact: User experience, upgrade path
   - Affected area: All Connectors
   
   **Severity**: **MINOR**
   
   **Improvement Suggestions**:
   ```
   1. 在 docs/en/connector-v2/source/jdbc.md 中添加:
      ## Known Issues
      - Table names containing `_` or `%` may have caused incorrect column 
discovery. 
        This is fixed in version X.Y.Z. If you were affected, your schema 
discovery 
        should now work correctly.
   
   2. 在 incompatible-changes.md 中添加(如果适用):
      - JDBC Column Discovery: Previously, table names with SQL LIKE wildcards 
(`_`, `%`) 
        could return columns from other tables. This is now fixed to only 
return exact 
        matches.
   ```
   
   **Rationale**:
   1. This is a user-visible behavior change
   2. Helps users understand changes after upgrade
   3. Complies with Apache project documentation standards
   
   ---


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]

Reply via email to