DanielCarter-stack commented on PR #10432:
URL: https://github.com/apache/seatunnel/pull/10432#issuecomment-3832881422
<!-- code-pr-reviewer -->
<!-- cpr:pr_reply_v2_parts {"group": "apache/seatunnel#10432", "part": 1,
"total": 1} -->
## Issue 1: SocketSinkWriter does not implement SupportMultiTableSinkWriter
interface
**Location**:
`seatunnel-connectors-v2/connector-socket/src/main/java/org/apache/seatunnel/connectors/seatunnel/socket/sink/SocketSinkWriter.java:28`
**Current code**:
```java
public class SocketSinkWriter extends AbstractSinkWriter<SeaTunnelRow, Void>
{
// ...
}
```
**Related context**:
- Framework checkpoint:
`seatunnel-api/src/main/java/org/apache/seatunnel/api/sink/multitablesink/MultiTableSinkWriter.java:113-133`
- Type cast point: `MultiTableSinkWriter.java:117` and `:127-128`
- Reference implementation 1:
`seatunnel-connectors-v2/connector-jdbc/src/main/java/org/apache/seatunnel/connectors/seatunnel/jdbc/sink/AbstractJdbcSinkWriter.java:45-48`
- Reference implementation 2:
`seatunnel-connectors-v2/connector-elasticsearch/src/main/java/org/apache/seatunnel/connectors/seatunnel/elasticsearch/sink/ElasticsearchSinkWriter.java:65-68`
**Problem description**:
When multiple tables route to Socket Sink, the framework creates a
`MultiTableSinkWriter` wrapper. In its `initResourceManager` method (lines 117
and 127-128), the framework **forcibly** casts all SinkWriters to
`SupportMultiTableSinkWriter` type. However, `SocketSinkWriter` only inherits
from `AbstractSinkWriter` and does not implement the
`SupportMultiTableSinkWriter` interface, which will cause a
`ClassCastException` to be thrown at runtime.
**Potential risks**:
- **Risk 1**: Multi-table CDC tasks will crash during the initialization
phase and fail to start
- **Risk 2**: Users will be misled into thinking that Socket connector
supports multiple tables, but it actually cannot be used
- **Risk 3**: Error messages are obscure (ClassCastException), making it
difficult to troubleshoot problems
**Scope of impact**:
- **Direct impact**: All users attempting to use Socket Sink in multi-table
scenarios
- **Indirect impact**: Upper-layer applications and CI/CD pipelines built
based on this functionality
- **Impact surface**: Socket Connector (single Connector)
**Severity**: **BLOCKER**
**Improvement suggestions**:
Option A: Make SocketSinkWriter implement SupportMultiTableSinkWriter
```java
// Modify SocketSinkWriter.java
public class SocketSinkWriter extends AbstractSinkWriter<SeaTunnelRow, Void>
implements SupportMultiTableSinkWriter<Void> {
private final SocketClient socketClient;
SocketSinkWriter(SocketConfig socketConfig, SeaTunnelRowType
seaTunnelRowType)
throws IOException {
this.socketClient =
new SocketClient(socketConfig, new
JsonSerializationSchema(seaTunnelRowType));
socketClient.open();
}
@Override
public void write(SeaTunnelRow element) throws IOException {
socketClient.write(element);
}
@Override
public void close() throws IOException {
socketClient.close();
}
// SupportMultiTableSinkWriter interface methods can use default
implementation
// If custom resource management is needed, you can override the
following methods:
// @Override
// public MultiTableResourceManager<Void>
initMultiTableResourceManager(int tableSize, int queueSize) {
// return null; // Use default implementation
// }
}
```
Option B: If Socket connector is not suitable for supporting multiple tables
(e.g., each Socket connection can only handle one schema), then:
1. Revert this PR
2. Document in the documentation that Socket connector does not support
multi-table scenarios
**Rationale**:
- Option A follows the mature patterns of other connectors (Jdbc,
Elasticsearch, Mongodb)
- `SupportMultiTableSinkWriter` inherits from `SupportResourceShare`, and
its methods have default implementations, which will not break existing code
- This issue must be fixed to safely use multi-table functionality
---
## Issue 2: Missing integration tests for multi-table scenarios
**Location**: Entire `seatunnel-connectors-v2/connector-socket` module
**Current status**:
- Existing tests: `SocketFactoryTest.java` (only tests factory configuration
rules)
- Missing tests: Integration tests for multi-table scenarios
**Related context**:
- Test file location:
`seatunnel-connectors-v2/connector-socket/src/test/java/org/apache/seatunnel/connectors/seatunnel/socket/SocketFactoryTest.java`
- Test scenarios mentioned in PR: `MULTI_TABLE_IMPLEMENTATION.md:62-116`
(but not actually implemented)
**Problem description**:
The PR adds core functionality code, but does not have corresponding
integration tests to verify:
1. Whether SocketSinkWriter can be created normally in multi-table scenarios
2. Whether data from different tables can be correctly written to the same
Socket
3. Whether data serialization of different schemas is correct
4. Whether data shuffling is truly avoided
**Potential risks**:
- **Risk 1**: Problems in the code (such as the ClassCastException in Issue
1) cannot be discovered before merging
- **Risk 2**: Future refactoring may inadvertently break multi-table
functionality
- **Risk 3**: Users lack reference test cases when encountering problems
**Scope of impact**:
- **Direct impact**: Code quality assurance
- **Indirect impact**: User confidence and project reliability
- **Impact surface**: Socket Connector
**Severity**: **MAJOR**
**Improvement suggestions**:
Add integration tests (example):
```java
// New file:
seatunnel-connectors-v2/connector-socket/src/test/java/org/apache/seatunnel/connectors/seatunnel/socket/MultiTableSocketSinkIT.java
package org.apache.seatunnel.connectors.seatunnel.socket;
import org.apache.seatunnel.api.common.JobContext;
import org.apache.seatunnel.api.sink.SinkWriter;
import org.apache.seatunnel.api.table.catalog.CatalogTable;
import org.apache.seatunnel.api.table.catalog.TablePath;
import org.apache.seatunnel.api.table.type.SeaTunnelRow;
import org.apache.seatunnel.api.table.type.SeaTunnelRowType;
import org.apache.seatunnel.connectors.seatunnel.socket.sink.SocketSink;
import
org.apache.seatunnel.connectors.seatunnel.socket.sink.SocketSinkWriter;
import org.junit.jupiter.api.Test;
import java.util.HashMap;
import java.util.Map;
import static org.junit.jupiter.api.Assertions.*;
class MultiTableSocketSinkIT {
@Test
void testMultiTableSinkCreation() throws Exception {
// Test the creation of SocketSink in multi-table scenarios
// Need to simulate a multi-table environment here to verify that no
ClassCastException is thrown
}
@Test
void testDifferentSchemas() throws Exception {
// Test whether data with different schemas can be handled correctly
}
}
```
Or, following the strategy mentioned in the PR documentation, add tests to
the `seatunnel-e2e` module.
**Rationale**:
- Testing is key to ensuring code quality
- Test scenarios mentioned in the documentation should be actually
implemented
- Avoid passing problems on to users
---
## Issue 3: Documentation content does not match actual implementation
**Location**:
`seatunnel-connectors-v2/connector-socket/MULTI_TABLE_IMPLEMENTATION.md:49-53`
**Problematic documentation content**:
```markdown
## Backward Compatibility
✅ **100% Backward Compatible**
- Existing single-table jobs continue to work unchanged
- No breaking changes to public APIs
- No changes to configuration options
```
**Related context**:
- Documentation location: `MULTI_TABLE_IMPLEMENTATION.md:49-53`
- Documentation location: `MULTI_TABLE_IMPLEMENTATION.md:118-123` (test
checklist)
**Problem description**:
The documentation claims "100% backward compatible" and multi-table
functionality is implemented, but in reality:
1. The current implementation will crash in multi-table scenarios (see Issue
1)
2. The test scenarios mentioned in the documentation (Test 2: Multi-Table
CDC, Test 3: Multiple Tables with Different Schemas) cannot pass
3. The documentation does not explain that the Writer needs to implement the
`SupportMultiTableSinkWriter` interface
**Potential risks**:
- **Risk 1**: Users will try to use multi-table functionality after reading
the documentation, then encounter runtime errors
- **Risk 2**: Waste users' time troubleshooting problems
- **Risk 3**: Reduce project credibility
**Scope of impact**:
- **Direct impact**: User documentation reading experience
- **Indirect impact**: User trust and support
- **Impact surface**: Socket Connector documentation
**Severity**: **MAJOR**
**Improvement suggestions**:
**Option A**: Fix the code first, then keep the documentation unchanged
**Option B**: If the code is not fixed temporarily, update the documentation:
```markdown
## Backward Compatibility
⚠️ **Current Status**
- Single-table jobs: ✅ Fully compatible
- Multi-table jobs: ❌ Not yet supported
- **Note**: The marker interface `SupportMultiTableSink` has been added, but
the Writer
implementation requires additional work to support multi-table scenarios.
Please
use single-table configurations for now.
```
And add a "Known Issues" section to the documentation.
**Rationale**:
- Documentation should accurately reflect the actual status
- Avoid misleading users
- If it is a phased implementation, the current progress should be clearly
stated
---
## Issue 4: Missing JavaDoc documentation
**Location**:
`seatunnel-connectors-v2/connector-socket/src/main/java/org/apache/seatunnel/connectors/seatunnel/socket/sink/SocketSink.java:33-34`
**Current code**:
```java
public class SocketSink extends AbstractSimpleSink<SeaTunnelRow, Void>
implements SupportMultiTableSink {
```
**Related context**:
- SocketSink class definition: `SocketSink.java:33-59`
- Interface definition:
`seatunnel-api/src/main/java/org/apache/seatunnel/api/sink/SupportMultiTableSink.java:20-21`
**Problem description**:
New interface implementations have been added, but the class's JavaDoc has
not been updated to explain:
1. When multi-table support started
2. Limitations of multi-table support (if any)
3. Prerequisites for using multi-table functionality
**Potential risks**:
- **Risk 1**: Other developers are unclear about this class's multi-table
support capability
- **Risk 2**: Generated API documentation lacks this information
- **Risk 3**: Maintenance is difficult, and multi-table functionality may be
inadvertently broken in future versions
**Scope of impact**:
- **Direct impact**: Code readability and maintainability
- **Indirect impact**: API documentation completeness
- **Impact surface**: Socket Connector
**Severity**: **MINOR**
**Improvement suggestions**:
```java
/**
* Socket Sink for writing data to a network socket.
*
* <p>This sink supports both single-table and multi-table scenarios. When
used in multi-table
* mode, multiple source tables can write to the same socket without data
shuffling.
*
* <p>Multi-table support is available since SeaTunnel 2.x.x
*
* @see org.apache.seatunnel.api.sink.SupportMultiTableSink
* @since 2.x.x
*/
public class SocketSink extends AbstractSimpleSink<SeaTunnelRow, Void>
implements SupportMultiTableSink {
// ...
}
```
**Rationale**:
- Clear documentation helps other developers understand the functionality
- `@since` tags can track the version when the feature was introduced
- Explaining use cases and limitations can prevent misuse
---
### IV. Overall Assessment
#### Can this be merged
**Conclusion: ❌ Cannot merge**
**Rationale**:
1. **BLOCKER level issue**: The current implementation will throw
`ClassCastException` in multi-table scenarios, and the functionality is
completely unusable
2. **Missing tests**: There are no integration tests to verify multi-table
scenarios
3. **Misleading documentation**: The documentation claims the functionality
is completed and 100% compatible, but it is actually unusable
#### Improvement suggestions
**Short-term solution** (if merge is urgently needed):
1. **At least fix Issue 1**: Make `SocketSinkWriter` implement the
`SupportMultiTableSinkWriter<Void>` interface
2. **Fix Issue 3**: Update documentation to explain the current
implementation status and limitations
3. **Add basic tests**: Verify that multi-table scenarios do not throw
ClassCastException
**Recommended solution** (complete implementation):
1. Fix Issue 1: Implement `SupportMultiTableSinkWriter` interface
2. Fix Issue 2: Add complete integration tests
3. Fix Issue 3: Ensure documentation is accurate
4. Fix Issue 4: Add JavaDoc
5. Run end-to-end tests of multi-table scenarios locally to ensure
functionality works normally
6. Update PR description to explain the actual completed scope
**Alternative solution** (if Socket connector is not suitable for multiple
tables):
1. Revert this PR
2. Document in the documentation that Socket connector does not support
multi-table scenarios
3. If support is needed in the future, redesign and implement completely
--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
To unsubscribe, e-mail: [email protected]
For queries about this service, please contact Infrastructure at:
[email protected]