DanielCarter-stack commented on PR #10422:
URL: https://github.com/apache/seatunnel/pull/10422#issuecomment-3831152467
<!-- code-pr-reviewer -->
<!-- cpr:pr_reply_v2_parts {"group": "apache/seatunnel#10422", "part": 1,
"total": 1} -->
### Issue 1: Improper Case Sensitivity Handling
**Location**:
- `JdbcDialectTypeMapper.java:97-110`
- `JdbcColumnConverter.java:87-98`
- `SqlServerConnection.java:710-721`
**Issue Description**:
Using `equalsIgnoreCase` for table name and schema name comparison, but in
some databases (such as PostgreSQL under certain configurations, Oracle, H2,
etc.), table names are **case-sensitive**. This could lead to:
- User queries `UserInfo` (uppercase U)
- Database returns columns of `userinfo` (lowercase)
- `equalsIgnoreCase` considers them equal, incorrectly including that column
**Related Context**:
- Parent/interface: `DatabaseMetaData.getColumns()` JDBC specification
states "whether identifiers are case-sensitive depends on the database"
- Caller: `CatalogUtils.getTableSchema()` is called in 23 database connectors
- Impact scope: All case-sensitive databases (PostgreSQL, Oracle, H2, DB2,
etc.)
**Potential Risks**:
- Risk 1: On case-sensitive databases, columns from wrong tables may be
included (though probability is low)
- Risk 2: Tests do not cover case sensitivity scenarios
**Impact Scope**:
- Direct impact: Column discovery for case-sensitive databases like
PostgreSQL, Oracle, H2, DB2, SAP HANA, etc.
- Indirect impact: Type inference and data conversion based on column
information
- Affected area: Multiple Connectors (approximately 10+ case-sensitive
databases)
**Severity**: **MAJOR**
**Improvement Suggestions**:
```java
// Suggestion: Consider database case sensitivity
// Solution 1: Get information from
DatabaseMetaData.storesLowerCaseIdentifiers()
boolean storesLowerCase = metadata.storesLowerCaseIdentifiers();
boolean storesUpperCase = metadata.storesUpperCaseIdentifiers();
boolean storesMixedCase = metadata.storesMixedCaseQuotedIdentifiers();
// Solution 2: Use DatabaseMetaData.identifierEqualsString() (JDBC 4.3+)
// But need to consider compatibility
// Solution 3 (current best practice): Keep equalsIgnoreCase, but add
warning log
if (actualTableName == null ||
!actualTableName.equalsIgnoreCase(tableNamePattern)) {
if (actualTableName != null &&
!actualTableName.equals(tableNamePattern)) {
log.debug("Filtered column from table '{}' while looking for '{}',
case-sensitive databases may require exact match",
actualTableName, tableNamePattern);
}
continue;
}
```
**Rationale**:
1. JDBC specification explicitly states that case sensitivity varies across
different databases
2. Using `equalsIgnoreCase` is correct on case-insensitive databases (such
as MySQL, SQL Server)
3. But may cause errors on case-sensitive databases
4. Adding logs can help users diagnose issues
---
### Issue 2: Missing Handling and Warning for Empty Column Lists
**Location**:
- `JdbcDialectTypeMapper.java:84-135`
- `JdbcColumnConverter.java:73-121`
**Issue Description**:
If all returned rows are filtered out (for example, the JDBC driver returns
completely mismatched tables), the code will **silently return an empty list**.
This could lead to:
1. Upstream callers assuming the table has no columns and continuing
processing
2. Failing at a later stage (such as missing required columns)
3. Unclear error messages, making diagnosis difficult
**Related Context**:
- Caller 1: `CatalogUtils.getTableSchema()` (lines 223-245)
- Caller 2: `SqlServerConnection.getTableSchemaFromTable()` (lines 696-739)
- These callers expect non-empty column lists to be returned
**Potential Risks**:
- Risk 1: When table doesn't exist or configuration is wrong, errors are
delayed to later stages
- Risk 2: Users see "table has no columns" instead of "table doesn't exist
or columns were filtered"
- Risk 3: Difficult to debug
**Impact Scope**:
- Direct impact: All users using JDBC CDC and JDBC Source
- Indirect impact: Task failures, data loss
- Affected area: All Connectors
**Severity**: **MINOR**
**Improvement Suggestions**:
```java
// In JdbcDialectTypeMapper.java
List<Column> columns = new ArrayList<>();
int filteredRows = 0; // Add counter
try (ResultSet rs = metadata.getColumns(...)) {
while (rs.next()) {
if (tableNamePattern != null) {
String actualTableName = rs.getString("TABLE_NAME");
if (actualTableName == null ||
!actualTableName.equalsIgnoreCase(tableNamePattern)) {
filteredRows++; // Log filtered rows
continue;
}
}
// ... rest of logic
columns.add(mappingColumn(typeDefine));
}
}
// Add: If all rows are filtered, log warning
if (columns.isEmpty() && filteredRows > 0) {
log.warn("No columns found for table '{}', filtered {} rows from other
tables. " +
"The table may not exist or the driver may be returning
unexpected results.",
tableNamePattern, filteredRows);
}
return columns;
```
**Rationale**:
1. Helps users quickly diagnose configuration issues
2. Doesn't affect normal flow (just logging)
3. Low cost, high value
---
### Issue 3: Insufficient Test Coverage
**Location**: `CatalogUtilsTest.java:98-168`
**Issue Description**:
The newly added test case `testGetTableSchemaFiltersOutOtherMatchedTables`
only covers one scenario:
- Table name contains `_` wildcard
- Returns 2 rows (1 matching + 1 non-matching)
But the following scenarios are missing:
1. Table name contains `%` wildcard (such as `user%info`)
2. Schema name contains wildcards (such as `pub_lic`)
3. Both table name and schema contain wildcards
4. Returns multiple rows, with only partial matches
5. Returns 0 rows (table doesn't exist)
6. Case mismatch (`UserInfo` vs `userinfo`)
7. SqlServerConnection modifications have no corresponding unit tests
**Potential Risks**:
- Risk 1: `%` wildcard scenarios not verified
- Risk 2: Edge cases not covered
- Risk 3: CDC path modifications have no test protection
**Impact Scope**:
- Direct impact: Code quality assurance
- Indirect impact: Future regressions may be introduced
- Affected area: Single module
**Severity**: **MINOR**
**Improvement Suggestions**:
```java
// Add new test cases
@Test
void testGetTableSchemaFiltersOutPercentageWildcard() throws SQLException {
// Test % wildcard
TestDatabaseMetaData metadata = new TestDatabaseMetaData() {
@Override
public ResultSet getColumns(...) {
List<Map<String, Object>> value = new ArrayList<>();
value.add(Map.of(
"TABLE_NAME", "user%info",
"TABLE_SCHEM", "public",
"COLUMN_NAME", "id",
// ...
));
value.add(Map.of(
"TABLE_NAME", "userXYZinfo", // Should be filtered
"TABLE_SCHEM", "public",
"COLUMN_NAME", "bad",
// ...
));
return new TestResultSet(value);
}
};
TablePath tablePath = TablePath.of("test_db", "public", "user%info");
TableSchema tableSchema = CatalogUtils.getTableSchema(metadata,
tablePath, typeMapper);
Assertions.assertEquals(1, tableSchema.getColumns().size());
}
@Test
void testGetTableSchemaFiltersOutSchemaWildcard() throws SQLException {
// Test schema contains wildcard
// ...
}
@Test
void testGetTableSchemaEmptyWhenAllFiltered() throws SQLException {
// Test all rows are filtered
// ...
}
```
**Rationale**:
1. Improve code coverage
2. Prevent future modifications from introducing regressions
3. `%` wildcards are equally common
---
### Issue 4: Missing Unit Tests for SqlServerConnection
**Location**: `SqlServerConnection.java:708-721`
**Issue Description**:
Filtering logic was added in SqlServerConnection, but **there are no
corresponding unit tests**. This is a CDC path modification, independent from
the JDBC path, and requires separate verification.
**Potential Risks**:
- Risk 1: CDC path modifications have no test protection
- Risk 2: Future modifications may break CDC functionality
- Risk 3: SQL Server users may be affected
**Impact Scope**:
- Direct impact: SQL Server CDC Connector
- Indirect impact: Data synchronization tasks using SQL Server CDC
- Affected area: Single Connector
**Severity**: **MINOR**
**Improvement Suggestions**:
```java
// Add in connector-cdc-sqlserver test directory
// File: SqlServerConnectionTest.java
@Test
void testGetTableSchemaFiltersOutWildcardTables() throws SQLException {
// Mock DatabaseMetaData to return multiple tables
TestDatabaseMetaData metadata = new TestDatabaseMetaData() {
@Override
public ResultSet getColumns(...) {
List<Map<String, Object>> value = new ArrayList<>();
value.add(Map.of(
"TABLE_NAME", "user_info",
"TABLE_SCHEM", "dbo",
"COLUMN_NAME", "id",
// ...
));
value.add(Map.of(
"TABLE_NAME", "userAinfo", // Should be filtered
"TABLE_SCHEM", "dbo",
"COLUMN_NAME", "bad",
// ...
));
return new TestResultSet(value);
}
};
SqlServerConnection connection = new SqlServerConnection(config);
TableId tableId = TableId.parse("dbo", "user_info");
SqlServerChangeTable changeTable = mock(SqlServerChangeTable.class);
when(changeTable.getSourceTableId()).thenReturn(tableId);
Table table = connection.getTableSchemaFromTable("database",
changeTable);
Assertions.assertEquals(1, table.columns().size());
Assertions.assertEquals("id", table.columns().get(0).name());
}
```
**Rationale**:
1. CDC path is independent from JDBC path
2. SQL Server is an important data source
3. Testing cost is not high (TestDatabaseMetaData can be reused)
---
### Issue 5: Documentation and Changelog Not Updated
**Location**: PR metadata
**Issue Description**:
The PR description explicitly states "Does this PR introduce any user-facing
change? Yes", but:
1. No relevant documentation was updated
2. No update to `incompatible-changes.md`
3. No update to connector-specific documentation
Although this is a bug fix, it changes observed behavior, and users need to
be aware.
**Potential Risks**:
- Risk 1: Behavior changes after upgrade but no documentation explains it
- Risk 2: Users may depend on the old incorrect behavior
- Risk 3: Missing reference during troubleshooting
**Impact Scope**:
- Direct impact: All JDBC and CDC Connector users
- Indirect impact: User experience, upgrade path
- Affected area: All Connectors
**Severity**: **MINOR**
**Improvement Suggestions**:
```
1. 在 docs/en/connector-v2/source/jdbc.md 中添加:
## Known Issues
- Table names containing `_` or `%` may have caused incorrect column
discovery.
This is fixed in version X.Y.Z. If you were affected, your schema
discovery
should now work correctly.
2. 在 incompatible-changes.md 中添加(如果适用):
- JDBC Column Discovery: Previously, table names with SQL LIKE wildcards
(`_`, `%`)
could return columns from other tables. This is now fixed to only
return exact
matches.
```
**Rationale**:
1. This is a user-visible behavior change
2. Helps users understand changes after upgrade
3. Complies with Apache project documentation standards
---
--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
To unsubscribe, e-mail: [email protected]
For queries about this service, please contact Infrastructure at:
[email protected]