DanielCarter-stack commented on PR #10629:
URL: https://github.com/apache/seatunnel/pull/10629#issuecomment-4095445044
<!-- code-pr-reviewer -->
<!-- cpr:pr_reply_v2_parts {"group": "apache/seatunnel#10629", "part": 1,
"total": 1} -->
### Issue 1: Lack of resource cleanup verification in exception scenarios
**Location**: `StarRocksCatalog.java` Global (all modified methods)
**Context**:
- Modified class: `StarRocksCatalog.java`
- Callers: `DefaultSaveModeHandler`, `SchemaEvolutionHandler`
- Related interfaces: `org.apache.seatunnel.api.table.catalog.Catalog`
**Problem Description**:
Although the code changes correctly use try-with-resources, there is no
verification of exception scenarios. If an exception is thrown after
`PreparedStatement` is created but before `ResultSet` is obtained (for example,
if `getMetaData()` throws `SQLException`), can we guarantee that the Statement
is closed?
According to the Java Language Specification, try-with-resources guarantees
that the resource's `close()` method is called when the try block exits
(whether normally or exceptionally). Therefore, it is theoretically safe, but
lacks test verification.
**Potential Risks**:
- Risk 1: In certain extreme JVM implementations or JDBC drivers, there may
be edge cases where resources are not properly closed
- Risk 2: If someone refactors the code in the future, they may break the
try-with-resources structure (for example, by returning an object within the
try block)
**Impact Scope**:
- Direct impact: All scenarios using Catalog (Save Mode, Schema Evolution)
- Indirect impact: Long-running jobs may accumulate resource leaks (if the
fix is incomplete)
- Affected area: Single Connector (StarRocks)
**Severity**: MINOR
**Improvement Suggestions**:
Add unit tests to verify resource cleanup in exception scenarios:
```java
@Test
public void testStatementClosedOnException() throws SQLException {
// Mock Connection that throws exception on prepareStatement
Connection mockConn = mock(Connection.class);
PreparedStatement mockStmt = mock(PreparedStatement.class);
when(mockConn.prepareStatement(anyString()))
.thenThrow(new SQLException("Test exception"));
StarRocksCatalog catalog = new StarRocksCatalog(...);
// Inject mock connection (need to add package-visible or protected
setter)
catalog.setConnection(mockConn);
try {
catalog.getTable(TablePath.of("db", "table"));
fail("Expected exception");
} catch (CatalogException e) {
// Expected
}
// Verify that PreparedStatement.close() was called even though
exception was thrown
verify(mockStmt).close();
}
```
Or use resource leak detection tools in integration tests (such as Java
Flight Recorder or Valgrind).
**Rationale**: Although try-with-resources theoretically guarantees resource
cleanup, test verification can prevent future regressions, especially for
"invisible" bugs like resource leaks.
---
### Issue 2: SQL concatenation in getPrimaryKey() can be optimized to
parameterized query
**Location**: `StarRocksCatalog.java:464-466`
**Related Context**:
- Method: `getPrimaryKey(String schema, String table)`
- Caller: `getTable()` (lines 153-154)
- Query table: `information_schema.columns`
**Problem Description**:
The current implementation uses `String.format` to concatenate SQL:
```java
String.format(
"SELECT COLUMN_NAME FROM information_schema.columns " +
"where TABLE_SCHEMA = '%s' AND TABLE_NAME = '%s' AND ...",
schema, table)
```
Although the parameters come from internal calls (`tablePath` in
`getTable`), making the risk theoretically controllable, using parameterized
queries is a safer practice and aligns with the patterns of `listTables()` and
`tableExists()` (both methods use `?` placeholders).
**Potential Risks**:
- Risk 1: If there are other call paths in the future that directly call
`getPrimaryKey()`, it may introduce injection risks
- Risk 2: Single quote characters in schema or table names may cause SQL
syntax errors (although `tablePath` should be validated)
**Impact Scope**:
- Direct impact: `getPrimaryKey()` method
- Indirect impact: All scenarios requiring primary key information (Schema
Evolution, CDC synchronization)
- Affected area: Single Connector (StarRocks)
**Severity**: MINOR
**Improvement Suggestions**:
```java
protected Optional<PrimaryKey> getPrimaryKey(String schema, String table)
throws SQLException {
List<String> pkFields = new ArrayList<>();
try (Statement stmt = conn.createStatement();
ResultSet rs =
stmt.executeQuery(
String.format(
"SELECT COLUMN_NAME FROM
information_schema.columns where TABLE_SCHEMA = '%s' AND TABLE_NAME = '%s' AND
COLUMN_KEY = 'PRI' ORDER BY ORDINAL_POSITION",
schema, table))) {
while (rs.next()) {
String columnName = rs.getString("COLUMN_NAME");
pkFields.add(columnName);
}
}
if (!pkFields.isEmpty()) {
String pkName = "pk_" + String.join("_", pkFields);
return Optional.of(PrimaryKey.of(pkName, pkFields));
}
return Optional.empty();
}
```
Change to:
```java
protected Optional<PrimaryKey> getPrimaryKey(String schema, String table)
throws SQLException {
List<String> pkFields = new ArrayList<>();
String sql = "SELECT COLUMN_NAME FROM information_schema.columns "
+ "WHERE TABLE_SCHEMA = ? AND TABLE_NAME = ? AND COLUMN_KEY =
'PRI' "
+ "ORDER BY ORDINAL_POSITION";
try (PreparedStatement ps = conn.prepareStatement(sql)) {
ps.setString(1, schema);
ps.setString(2, table);
try (ResultSet rs = ps.executeQuery()) {
while (rs.next()) {
String columnName = rs.getString("COLUMN_NAME");
pkFields.add(columnName);
}
}
}
if (!pkFields.isEmpty()) {
String pkName = "pk_" + String.join("_", pkFields);
return Optional.of(PrimaryKey.of(pkName, pkFields));
}
return Optional.empty();
}
```
**Rationale**:
1. **Consistency**: Aligns with methods such as `listTables()` and
`tableExists()` (both use PreparedStatement)
2. **Security**: Prevents potential future injection risks
3. **Performance**: PreparedStatement may be cached by the database,
offering better performance for repeated execution (though this scenario is
infrequent)
4. **Code Style**: Follows JDBC best practices
---
### Issue 3: Lack of Connection lifecycle verification
**Location**: `StarRocksCatalog.java:82` (conn member variable)
**Related Context**:
- Member variable: `private Connection conn;`
- Initialization: `open()` method (lines 430-438)
- Cleanup: `close()` method (lines 444-450)
**Problem Description**:
In the current implementation, `conn` is a member variable, initialized
during `open()` and released during `close()`. However, in the various methods
that use Statement, there is no check whether `conn` is null or already closed.
If the caller continues to call methods such as `getTable()` after
`close()`, it may throw `NullPointerException` or `SQLException: Connection
closed`.
**Potential Risks**:
- Risk 1: Using after Catalog is closed leads to undefined behavior
- Risk 2: If Catalog is used by multiple threads, there may be race
conditions (open/close concurrent with other methods)
**Impact Scope**:
- Direct impact: All methods using `conn`
- Indirect impact: Framework layer if Catalog lifecycle is incorrectly
managed
- Affected area: Single Connector (StarRocks), but other Catalog
implementations may also have this issue
**Severity**: MINOR
**Improvement Suggestions**:
Option 1 (Defensive Programming):
```java
@Override
public CatalogTable getTable(TablePath tablePath)
throws CatalogException, TableNotExistException {
if (conn == null || conn.isClosed()) {
throw new CatalogException("Catalog is not open or has been closed");
}
// Existing logic
}
```
Option 2 (Document Contract):
In JavaDoc, specify:
```java
/**
* Gets the table metadata.
*
* @param tablePath the table path
* @return the catalog table
* @throws CatalogException if catalog is not open
* ...
*/
```
**Rationale**:
- The current SeaTunnel framework already correctly manages Catalog
lifecycle (open first, use, finally close)
- Adding runtime checks incurs performance overhead (checking on every call)
- Recommend documenting the contract explicitly rather than adding defensive
code
---
### Issue 4: Inconsistent error messages
**Location**: `StarRocksCatalog.java:227, 241, 281, 296`
**Related Context**:
- `dropTable()` line 227: `"Failed listing database in catalog %s"` (Error:
should be "Failed dropping table")
- `truncateTable()` line 241: `"Failed TRUNCATE TABLE in catalog %s"`
(Correct)
- `createDatabase()` line 281: `"Failed listing database in catalog %s"`
(Error: should be "Failed creating database")
- `dropDatabase()` line 296: `"Failed listing database in catalog %s"`
(Error: should be "Failed dropping database")
**Problem Description**:
Multiple methods use the same error message template `"Failed listing
database in catalog %s"`, even though these methods perform
DROP/TRUNCATE/CREATE operations rather than LISTING. This leads to debugging
difficulties because error messages do not match the actual operations.
**Potential Risks**:
- Risk 1: Misleading information will appear in logs and error reports
- Risk 2: Wastes time during troubleshooting (thinking list operation failed
when it was actually a drop operation)
**Impact Scope**:
- Direct impact: All exception logs
- Indirect impact: Operations debugging, issue troubleshooting
- Affected area: Single Connector (StarRocks)
**Severity**: MINOR
**Improvement Suggestions**:
Correct error messages:
```java
// dropTable() - line 227
throw new CatalogException(
String.format("Failed dropping table %s in catalog %s",
tablePath.getFullName(), catalogName), e);
// createDatabase() - line 281
throw new CatalogException(
String.format("Failed creating database %s in catalog %s",
tablePath.getDatabaseName(), catalogName), e);
// dropDatabase() - line 296
throw new CatalogException(
String.format("Failed dropping database %s in catalog %s",
tablePath.getDatabaseName(), catalogName), e);
```
**Rationale**:
- This is an error introduced when copying code (possibly copied from
`listDatabases`)
- Correcting error messages does not affect business logic but significantly
improves debuggability
- Performance overhead: Negligible
---
### Issue 5: Conditional logic in truncateTable() may be problematic
**Location**: `StarRocksCatalog.java:231-244`
**Problem Description**:
```java
public void truncateTable(TablePath tablePath, boolean ignoreIfNotExists)
throws TableNotExistException, CatalogException {
try {
if (ignoreIfNotExists) {
try (Statement stmt = conn.createStatement()) {
stmt.execute(StarRocksSaveModeUtil.INSTANCE.getTruncateTableSql(tablePath));
}
}
} catch (Exception e) {
throw new CatalogException(...);
}
}
```
If `ignoreIfNotExists` is `false`, the method does nothing (no exception
thrown, no SQL executed). This appears to be incorrect logic:
- If `ignoreIfNotExists = false`, TRUNCATE should be executed, and an
exception should be thrown even if the table does not exist
- If `ignoreIfNotExists = true`, errors should be ignored when the table
does not exist
**Potential Risks**:
- Risk 1: When `ignoreIfNotExists = false`, the caller expects the table to
be truncated, but nothing actually happens
- Risk 2: Data consistency issues (table not truncated, but job considers it
successful)
**Impact Scope**:
- Direct impact: Jobs using TRUNCATE TABLE Save Mode
- Indirect impact: Data consistency of data synchronization tasks
- Affected area: Single Connector (StarRocks)
**Severity**: MAJOR
**Improvement Suggestions**:
Check the implementation of `StarRocksSaveModeUtil.getTruncateTableSql()` to
see if it already contains `IF EXISTS` logic:
- If the SQL contains `TRUNCATE TABLE IF EXISTS`, the `if
(ignoreIfNotExists)` condition is not needed
- If the SQL does not contain it, the logic needs to be corrected
Expected logic:
```java
public void truncateTable(TablePath tablePath, boolean ignoreIfNotExists)
throws TableNotExistException, CatalogException {
try {
try (Statement stmt = conn.createStatement()) {
stmt.execute(StarRocksSaveModeUtil.INSTANCE.getTruncateTableSql(
tablePath, ignoreIfNotExists)); // Pass parameters
}
} catch (Exception e) {
throw new CatalogException(
String.format("Failed truncating table %s in catalog %s",
tablePath.getFullName(), catalogName), e);
}
}
```
**Rationale**:
- This is a potential logic error that may affect data consistency
- Need to verify the implementation of `StarRocksSaveModeUtil` to determine
the correct fix approach
- If this is indeed a bug, it should be tracked as a separate issue (not
within the scope of this PR)
---
--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
To unsubscribe, e-mail: [email protected]
For queries about this service, please contact Infrastructure at:
[email protected]