DanielCarter-stack commented on PR #10548:
URL: https://github.com/apache/seatunnel/pull/10548#issuecomment-3976718490
### Issue 1: Version coexistence risk caused by connector-typesense hardcoding Jackson version
**Location**: `seatunnel-connectors-v2/connector-typesense/pom.xml:36-39`
```xml
<dependency>
    <groupId>com.fasterxml.jackson.core</groupId>
    <artifactId>jackson-databind</artifactId>
    <version>2.14.1</version> <!-- hard-coded; does not use ${jackson.version} -->
</dependency>
```
**Related Context**:
- Main POM: `pom.xml:91` defines `jackson.version` as `2.18.6`
- Shade module: `seatunnel-shade/seatunnel-jackson/pom.xml:80-84` shades
Jackson to `org.apache.seatunnel.shade.com.fasterxml.jackson.*`
- Consumers: Multiple transforms and connectors use the shaded version
**Problem Description**:
`connector-typesense` directly depends on
`com.fasterxml.jackson.core:jackson-databind:2.14.1`, while other parts of the
project use Jackson 2.18.6 (shaded package path) through the
`seatunnel-jackson` shade module. This leads to **two different versions of
Jackson existing in the classpath**:
1. `org.apache.seatunnel.shade.com.fasterxml.jackson.*` (2.18.6)
2. `com.fasterxml.jackson.*` (2.14.1)
Although shading avoids package conflicts, the shaded and unshaded classes are
distinct types that cannot be cast to each other. If user code or third-party
libraries depend on `com.fasterxml.jackson.*`, the following may occur:
- Type conversion exceptions (`ClassCastException`)
- Inconsistent behavior (different versions of Jackson may produce different
results when processing the same JSON)
**Potential Risks**:
1. **Class loading conflicts**: If the `typesense-java` library uses Jackson
internally and passes Jackson objects to SeaTunnel code, type incompatibility
may occur
2. **Unresolved security vulnerabilities**: Jackson 2.14.1, as used by
connector-typesense, still has known CVEs
3. **Inconsistent behavior**: Within the same job, connectors that use
different Jackson versions may produce inconsistent serialization results
**Impact Scope**:
- **Direct impact**: `connector-typesense` and its consumers
- **Indirect impact**: Any code that interacts with `connector-typesense`
(such as data transformation, Format conversion)
- **Affected surface**: Single Connector
**Severity**: MAJOR
**Improvement Suggestion**:
```xml
<!-- seatunnel-connectors-v2/connector-typesense/pom.xml -->
<dependency>
    <groupId>com.fasterxml.jackson.core</groupId>
    <artifactId>jackson-databind</artifactId>
    <version>${jackson.version}</version> <!-- use the shared version property instead -->
</dependency>
```
**Rationale**: Uniformly using `${jackson.version}` ensures consistent
Jackson version across the entire project, avoiding class conflicts and
security risks.
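
To check whether both Jackson copies are really visible at runtime, a small probe along these lines could be run on the connector's classpath. This is a minimal sketch rather than part of the PR: `JacksonOriginProbe` and `origin` are hypothetical names, and the shaded class name is an assumption derived from the relocation prefix quoted above.

```java
import java.security.CodeSource;

public class JacksonOriginProbe {
    // Class names taken from the review; whether each one is present depends on the classpath.
    static final String PLAIN = "com.fasterxml.jackson.databind.ObjectMapper";
    // Assumption: the shade module relocates databind under this prefix.
    static final String SHADED = "org.apache.seatunnel.shade." + PLAIN;

    /** Returns the location a class was loaded from, or a marker if it cannot be loaded. */
    static String origin(String className) {
        try {
            // initialize=false: locate the class without running its static initializers
            Class<?> type =
                    Class.forName(className, false, JacksonOriginProbe.class.getClassLoader());
            CodeSource source = type.getProtectionDomain().getCodeSource();
            // JDK classes typically have no code source, so report that case explicitly
            if (source == null || source.getLocation() == null) {
                return "<no code source>";
            }
            return source.getLocation().toString();
        } catch (ClassNotFoundException | LinkageError e) {
            return "<not on classpath>";
        }
    }

    public static void main(String[] args) {
        System.out.println("plain : " + origin(PLAIN));
        System.out.println("shaded: " + origin(SHADED));
    }
}
```

If both lines print a jar location, two Jackson copies coexist for that classloader, which is the situation described in this issue.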
---
### Issue 2: Missing test coverage for Jackson 2.18.6 new constraints
**Location**: Global (missing tests)
**Related Context**:
- Jackson 2.15.0 introduced `StreamReadConstraints`, which remains in effect
in 2.18.6. Default limits include:
  - `maxNumberLength`: 1000 characters
  - `maxNestingDepth`: 1000 levels
  - `maxStringLength`: 20,000,000 characters
  - `maxDocumentLength`: unlimited by default (configurable since 2.16)
- `seatunnel-formats/seatunnel-format-json/src/main/java/org/apache/seatunnel/format/json/JsonDeserializationSchema.java:67-68`
creates an `ObjectMapper` but does not configure constraints
**Problem Description**:
As a data integration tool, SeaTunnel may need to process **very large JSON
documents** (including very long string values) or **very large numbers**
(such as high-precision values in financial data). The default
`StreamReadConstraints` in Jackson 2.18.6 may reject such legitimate data,
causing job failures.
**Potential Risks**:
1. **Large document processing failures**: Users encounter a
`StreamConstraintsException` (a `JsonProcessingException` subclass) when
synchronizing large JSON files (such as CDC log batch files)
2. **High-precision number processing failures**: High-precision numbers in
financial and scientific computing scenarios are rejected
3. **Unhelpful error messages**: The current code does not catch the new
constraint exceptions, so the errors shown to users are not user-friendly
**Impact Scope**:
- **Direct impact**: All connectors and transforms that use the JSON format
- **Indirect impact**: Jobs in users' production environments may fail suddenly
- **Affected surface**: Multiple Connectors + global Format modules
**Severity**: MAJOR
**Improvement Suggestion**:
1. Add test cases to verify default constraints:
```java
// seatunnel-formats/seatunnel-format-json/src/test/java/...
@Test
public void testLargeJsonDocumentWithDefaultConstraints() {
    // Create a JSON document that exceeds one of the default limits
    // and verify whether the current default behavior meets expectations
}
```
2. Document how to customize constraints:
```java
// JsonDeserializationSchema.java (example)
public JsonDeserializationSchema(...) {
    this.objectMapper = new ObjectMapper();
    // If you need to handle very large documents, you can relax the constraints:
    // StreamReadConstraints constraints = StreamReadConstraints.builder()
    //         .maxDocumentLength(-1L) // non-positive means no limit; or use a specific value
    //         .maxNumberLength(2000)
    //         .build();
    // objectMapper.getFactory().setStreamReadConstraints(constraints);
    objectMapper.configure(FAIL_ON_UNKNOWN_PROPERTIES, false);
    // ...
}
```
3. Document the default constraint limits of Jackson 2.18.6 in documentation
or configuration
**Rationale**: Proactively verifying the constraint behavior of the new
version avoids unexpected failures in production environments.
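
The test case in suggestion 1 needs inputs that cross the limits. The following is a minimal sketch of payload builders that uses only the JDK. `ConstraintPayloads` and its methods are hypothetical names, and the value 1001 assumes the default limit of 1000 cited above for number length and nesting depth.

```java
public class ConstraintPayloads {
    /** Builds a JSON object holding one integer literal with the given number of digits. */
    static String numberWithDigits(int digits) {
        StringBuilder sb = new StringBuilder(digits + 8);
        sb.append("{\"v\":");
        sb.append('1'); // leading digit is non-zero so the literal is valid JSON
        for (int i = 1; i < digits; i++) {
            sb.append('0');
        }
        return sb.append('}').toString();
    }

    /** Builds nested empty JSON arrays; depth 3 gives [[[]]]. */
    static String nestedArrays(int depth) {
        StringBuilder sb = new StringBuilder(depth * 2);
        for (int i = 0; i < depth; i++) {
            sb.append('[');
        }
        for (int i = 0; i < depth; i++) {
            sb.append(']');
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        // One step past the assumed default limits of 1000
        String longNumber = numberWithDigits(1001);
        String deepNesting = nestedArrays(1001);
        System.out.println("number payload length: " + longNumber.length());
        System.out.println("nesting payload length: " + deepNesting.length());
    }
}
```

In a real test these strings would be passed to the mapper under test, for example through `ObjectMapper.readTree`. With default constraints the read is expected to fail with a `StreamConstraintsException`, but that expectation should be confirmed against the shaded Jackson version that SeaTunnel actually bundles.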
---
### Issue 3: Missing Jackson version information logging, hindering problem troubleshooting
**Location**:
`seatunnel-common/src/main/java/org/apache/seatunnel/common/utils/JsonUtils.java:54-66`
```java
public class JsonUtils {
    private static final ObjectMapper OBJECT_MAPPER =
            new ObjectMapper()
                    .configure(FAIL_ON_UNKNOWN_PROPERTIES, false)
                    // ... other configuration
                    .registerModule(new JavaTimeModule());
    // ... no log output of version information
}
```
**Related Context**:
- `JsonUtils` is a global utility class used widely across the codebase
- The Jackson version affects JSON parsing behavior
- When troubleshooting failures in production, operators need to confirm the
specific Jackson version in use
**Problem Description**:
The current code does not log Jackson version information. When compatibility
issues arise after an upgrade, operations personnel cannot quickly confirm
which Jackson version is in use at runtime (especially the shaded version).
**Potential Risks**:
1. **Difficult troubleshooting**: When users report JSON parsing exceptions,
it is hard to quickly confirm whether the version upgrade caused them
2. **Uncertainty**: Shaded class names do not include version information, so
versions are difficult to distinguish at runtime
**Impact Scope**:
- **Direct impact**: Global
- **Indirect impact**: All features depending on JSON processing
- **Affected surface**: Global
**Severity**: MINOR
**Improvement Suggestion**:
```java
import com.fasterxml.jackson.databind.ObjectMapper;
import com.fasterxml.jackson.databind.cfg.PackageVersion;
import org.slf4j.Logger;
import org.slf4j.LoggerFactory;

public class JsonUtils {
    private static final Logger LOG = LoggerFactory.getLogger(JsonUtils.class);

    static {
        // Log the Jackson version once, when the class is initialized
        LOG.info("Initializing Jackson ObjectMapper with version: {}", PackageVersion.VERSION);
    }

    private static final ObjectMapper OBJECT_MAPPER = new ObjectMapper()
            // ... existing configuration
            ;
}
```
**Rationale**: Adding version logging facilitates quick troubleshooting of
version-related issues at very low cost.
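
Because the shaded and unshaded copies may report different versions, it can help to read both without a compile-time dependency on either. The following is a minimal sketch using reflection. `JacksonVersionProbe` and `readVersion` are hypothetical names, and the shaded location of `PackageVersion` is an assumption based on the relocation prefix mentioned in Issue 1.

```java
import java.util.Optional;

public class JacksonVersionProbe {
    // Real Jackson class that exposes a public static VERSION field.
    static final String VERSION_CLASS = "com.fasterxml.jackson.databind.cfg.PackageVersion";
    // Assumption: the shade module relocates databind under this prefix.
    static final String SHADE_PREFIX = "org.apache.seatunnel.shade.";

    /** Reads the public static VERSION field of the named class, if that class is loadable. */
    static Optional<String> readVersion(String className) {
        try {
            Class<?> type = Class.forName(className);
            Object version = type.getField("VERSION").get(null);
            return Optional.ofNullable(version).map(Object::toString);
        } catch (ReflectiveOperationException | LinkageError e) {
            // Class or field missing: this Jackson copy is not visible to the classloader
            return Optional.empty();
        }
    }

    public static void main(String[] args) {
        System.out.println("plain Jackson : "
                + readVersion(VERSION_CLASS).orElse("absent"));
        System.out.println("shaded Jackson: "
                + readVersion(SHADE_PREFIX + VERSION_CLASS).orElse("absent"));
    }
}
```

The reflective lookup trades compile-time safety for the ability to report on whichever copies are present, so it suits a diagnostic log line more than core logic.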