DanielCarter-stack commented on PR #10488:
URL: https://github.com/apache/seatunnel/pull/10488#issuecomment-3895112022
<!-- code-pr-reviewer -->
<!-- cpr:pr_reply_v2_parts {"group": "apache/seatunnel#10488", "part": 1,
"total": 1} -->
### Issue 1: Code Duplication - DynamicChunkSplitter and FixedChunkSplitter
**Location**:
- `DynamicChunkSplitter.java:676-682`
- `FixedChunkSplitter.java:228-240`
**Issue**:
Both subclasses use identical logic when `table.getQuery()` is empty and
`whereConditionClause` is not empty:
```java
if (StringUtils.isNotBlank(config.getWhereConditionClause())) {
String userQuery = String.format(
"SELECT * FROM %s",
jdbcDialect.tableIdentifier(split.getTablePath()));
splitQuery = String.format("SELECT * FROM (%s) tmp",
applyUserWhereCondition(userQuery));
} else {
splitQuery = String.format(
"SELECT * FROM %s",
jdbcDialect.tableIdentifier(split.getTablePath()));
}
```
**Potential Risks**:
- High maintenance cost: future changes require synchronization in two places
- Easy to miss: may lead to inconsistent behavior
**Impact Scope**:
- Direct impact: `DynamicChunkSplitter`, `FixedChunkSplitter`
- Indirect impact: all users using JDBC Connector with `where_condition`
configured but `query` not configured
- Affected area: single Connector (JDBC)
**Severity**: **MINOR**
**Improvement Suggestions**:
Add a common method in the `ChunkSplitter` base class:
```java
// ChunkSplitter.java
protected String buildSimpleQueryWithWhere(TablePath tablePath) {
String baseQuery = String.format(
"SELECT * FROM %s", jdbcDialect.tableIdentifier(tablePath));
return applyUserWhereCondition(baseQuery);
}
```
Simplify subclasses to:
```java
// DynamicChunkSplitter.java & FixedChunkSplitter.java
} else {
splitQuery = buildSimpleQueryWithWhere(split.getTablePath());
splitQuery = String.format("SELECT * FROM (%s) tmp", splitQuery);
}
```
---
### Issue 2: Insufficient Exception Handling - Users Unaware When Field
Extraction Fails
**Location**: `SqlWhereConditionHelper.java:115-135`
**Issue**:
When WHERE conditions cannot be parsed (e.g., containing unsupported
syntax), the method silently returns an empty collection, and users cannot
perceive the problem.
**Potential Risks**:
- Risk 1: Users configure incorrect WHERE conditions, system fails silently,
leading to query results that don't meet expectations
- Risk 2: Difficult to troubleshoot in production environments due to lack
of error logs
**Impact Scope**:
- Direct impact: all configurations using the `where_condition` parameter
- Indirect impact: may cause data loss or return incorrect data
- Affected area: single Connector (JDBC)
**Severity**: **MAJOR**
**Improvement Suggestions**:
See the code example in the "Error Handling" section above, add:
1. try-catch to wrap parsing logic
2. Output WARN log when parsing fails
3. Return safe value (empty collection or original SQL)
---
### Issue 3: hasTopLevelSelectWildcard Method Too Long and Complex
**Location**: `SqlWhereConditionHelper.java:235-301`
**Issue**:
The method is 68 lines long and contains multi-level nested logic:
- String cleaning
- SELECT finding
- FROM finding
- DISTINCT/ALL handling
- Column splitting
- Wildcard checking
**Potential Risks**:
- Risk 1: Difficult to understand and maintain
- Risk 2: Increases bug risk (e.g., boundary condition handling)
**Impact Scope**:
- Direct impact: code readability and maintainability
- Indirect impact: new bugs may be introduced during future modifications
- Affected area: single Connector (JDBC)
**Severity**: **MINOR**
**Improvement Suggestions**:
Split into 3 private methods:
```java
private static boolean hasTopLevelSelectWildcard(String sql) {
String cleaned = sql.replaceAll("'[^']*'|\"[^\"]*\"", "");
String columnList = extractSelectColumnList(cleaned);
return containsWildcardInColumns(columnList);
}
private static String extractSelectColumnList(String sql) { /* ... */ }
private static List<String> splitColumnList(String columnList) { /* ... */ }
private static boolean containsWildcardInColumns(List<String> columns) { /*
... */ }
```
---
### Issue 4: Lack of Support Validation for Chinese/Multi-byte Field Names
**Location**: `SqlWhereConditionHelper.java:39-43`
**Issue**:
`FIELD_PATTERN` regular expression:
```java
Pattern.compile(
"([a-zA-Z_][a-zA-Z0-9_]*|`[^`]+`|\"[^\"]+\"|\\[[^\\]]+\\])\\s*"
+ "(?:=|!=|<>|<|>|<=|>=|\\s+IN\\s|\\s+BETWEEN\\s|\\s+LIKE\\s|\\s+IS\\s)",
Pattern.CASE_INSENSITIVE)
```
Only matches ASCII field names. For databases that allow Chinese, Japanese
field names (e.g., MySQL's utf8mb4 character set), extraction may not be
possible.
**Potential Risks**:
- Risk 1: Chinese WHERE condition fields cannot be automatically added
- Risk 2: Leads to SQL execution failure (field does not exist)
**Impact Scope**:
- Direct impact: databases using non-English field names (Chinese, Japanese,
Korean, etc.)
- Indirect impact: SQL execution failure, job failure
- Affected area: single Connector (JDBC)
**Severity**: **MAJOR**
**Improvement Suggestions**:
Extend regular expression to support Unicode letters:
```java
private static final Pattern FIELD_PATTERN =
Pattern.compile(
"([\\p{L}_][\\p{L}\\p{N}_]*|`[^`]+`|\"[^\"]+\"|\\[[^\\]]+\\])\\s*"
+
"(?:=|!=|<>|<|>|<=|>=|\\s+IN\\s|\\s+BETWEEN\\s|\\s+LIKE\\s|\\s+IS\\s)",
Pattern.CASE_INSENSITIVE | Pattern.UNICODE_CHARACTER_CLASS);
```
- `\p{L}` matches letters in any language
- `\p{N}` matches numbers in any language
---
### Issue 5: Missing Test Coverage - Edge Cases and Performance Testing
**Location**: `SqlWhereConditionHelperTest.java`
**Issue**:
Test cases mainly cover normal scenarios, lacking:
1. Edge cases:
- Extra-long WHERE conditions (> 10KB)
- Deeply nested subqueries (> 5 levels)
- Large number of fields (100+ columns)
2. Performance testing:
- Parsing performance benchmarks
- Memory usage tests
3. Database dialects:
- Oracle-specific syntax (e.g., `:param` bind variables)
- MySQL-specific syntax (e.g., `@variable`)
- PostgreSQL-specific syntax (e.g., `::type` cast)
**Potential Risks**:
- Risk 1: Unpredictable behavior in production environments under extreme
conditions
- Risk 2: Performance degradation cannot be detected in time
**Impact Scope**:
- Direct impact: test coverage
- Indirect impact: production environment stability
- Affected area: single Connector (JDBC)
**Severity**: **MINOR**
**Improvement Suggestions**:
Add test cases:
```java
@Test
public void testExtractFieldNamesFromWhere_VeryLongCondition() {
StringBuilder sb = new StringBuilder();
for (int i = 0; i < 1000; i++) {
sb.append("field").append(i).append(" = ").append(i).append(" AND ");
}
sb.append("last_field = 1");
long start = System.currentTimeMillis();
Set<String> fields =
SqlWhereConditionHelper.extractFieldNamesFromWhere(sb.toString());
long duration = System.currentTimeMillis() - start;
Assertions.assertEquals(1001, fields.size());
Assertions.assertTrue(duration < 100, "Parsing should complete in <
100ms, took: " + duration);
}
@Test
public void testExtractFieldNamesFromWhere_ChineseFieldName() {
String whereCondition = "WHERE 姓名 = '张三' AND 年龄 > 18";
Set<String> fields =
SqlWhereConditionHelper.extractFieldNamesFromWhere(whereCondition);
Assertions.assertTrue(fields.contains("姓名"));
Assertions.assertTrue(fields.contains("年龄"));
}
```
---
### Issue 6: Missing Documentation - User Documentation and Migration Guide
Not Updated
**Location**: project root directory `docs/en/connectors/source/Jdbc.md`
**Issue**:
The PR modifies user-visible behavior (automatically adding fields to
SELECT), but has not updated:
1. User documentation: explaining how `where_condition` works
2. Migration guide: considerations when upgrading from older versions
3. Known limitations: restrictions on UNION/INTERSECT/EXCEPT queries
**Potential Risks**:
- Risk 1: Users confused by behavior changes
- Risk 2: Unexpected errors during upgrades
**Impact Scope**:
- Direct impact: documentation completeness
- Indirect impact: user experience
- Affected area: single Connector (JDBC)
**Severity**: **MINOR**
**Improvement Suggestions**:
Add in `docs/en/connectors/source/Jdbc.md`:
```markdown
### where_condition
The user-defined WHERE condition. The connector will automatically add
fields referenced
in the WHERE condition to the SELECT clause if they are not present.
**Example**:
```yaml
source:
Jdbc:
query: "SELECT id, name FROM users"
where_condition: "WHERE partition_date = '2023-01-01'"
```
The generated SQL will be:
```sql
SELECT * FROM (SELECT id, name, partition_date FROM users) tmp WHERE
partition_date = '2023-01-01'
```
**Limitations**:
- For UNION/INTERSECT/EXCEPT queries, fields will NOT be auto-added to avoid
column mismatch.
- Please manually include all required fields in each SELECT branch.
```
---
--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
To unsubscribe, e-mail: [email protected]
For queries about this service, please contact Infrastructure at:
[email protected]