DanielCarter-stack commented on PR #10361:
URL: https://github.com/apache/seatunnel/pull/10361#issuecomment-3847584214
<!-- code-pr-reviewer -->
<!-- cpr:pr_reply_v2_parts {"group": "apache/seatunnel#10361", "part": 1,
"total": 1} -->
### Issue 1: Inconsistent exception handling in buildAesKey
**Location**:
`seatunnel-transforms-v2/src/main/java/org/apache/seatunnel/transform/encrypt/encryptor/AesCbcEncryptor.java:110`
```java
if (!(keyBytes.length == 16 || keyBytes.length == 24 || keyBytes.length ==
32)) {
throw new IllegalArgumentException("Invalid AES key length: " +
keyBytes.length);
}
```
**Related context**:
- Parent class/interface: `Encryptor.java:20-25`
- Caller: `AesCbcEncryptor.init()` → `FieldEncryptTransform.transformRow()`
**Problem description**:
Throws raw `IllegalArgumentException` instead of using SeaTunnel's unified
error handling mechanism, leading to inconsistent error message format and
inability to be properly caught and wrapped by upper layers.
**Potential risks**:
- Risk 1: Error messages do not comply with SeaTunnel error code
specifications, affecting error tracking and log analysis
- Risk 2: May cause some error handling logic to fail (if dependent on
`TransformException`)
**Scope of impact**:
- Direct impact: `AesCbcEncryptor.buildAesKey()`
- Indirect impact: `FieldEncryptTransform` initialization phase
- Affected area: Single Transform (only FieldEncrypt)
**Severity**: MAJOR
**Improvement suggestion**:
```java
// Suggested change to
if (!(keyBytes.length == 16 || keyBytes.length == 24 || keyBytes.length ==
32)) {
throw CommonError.illegalArgument(
key,
"Invalid AES key length: " + keyBytes.length + ". Expected 16, 24,
or 32 bytes"
);
}
```
**Rationale**: Uniformly use `CommonError` utility method to ensure
consistent error message format, facilitating error tracking and user
understanding.
---
### Issue 2: Inconsistent exception handling in initializeFieldIndexes
**Location**:
`seatunnel-transforms-v2/src/main/java/org/apache/seatunnel/transform/encrypt/FieldEncryptTransform.java:131`
```java
throw new IllegalArgumentException("Field not found: " + fieldName);
```
**Related context**:
- Parent class: `AbstractCatalogSupportMapTransform.java:33-110`
- Similar implementation: `TransformCommonError.cannotFindInputFieldError()`
**Problem description**:
When field validation fails in the constructor, it throws
`IllegalArgumentException`, while other Transforms use `TransformCommonError`
for unified handling, leading to inconsistent error handling.
**Potential risks**:
- Risk 1: Error messages lack the Transform plugin name, making it difficult
to locate the problem source
- Risk 2: Missing error codes, affecting automated error analysis
**Scope of impact**:
- Direct impact: `FieldEncryptTransform` constructor
- Indirect impact: Error reporting when Transform initialization fails
- Affected area: Single Transform
**Severity**: MAJOR
**Improvement suggestion**:
```java
// Suggested change to
throw TransformCommonError.cannotFindInputFieldError(PLUGIN_NAME, fieldName);
```
**Rationale**: Consistent with
`TransformCommonError.cannotFindInputFieldError()`, this utility method
properly formats error messages and includes the plugin name.
---
### Issue 3: Inconsistent encryption/decryption of empty strings
**Location**:
`seatunnel-transforms-v2/src/main/java/org/apache/seatunnel/transform/encrypt/FieldEncryptTransform.java:90-94`
```java
String value = field.toString();
if (StringUtils.isNotBlank(value)) {
inputRow.setField(index, encryptor.encrypt(value));
}
```
**Related context**:
- Encryption path: Line 85-94
- Decryption path: Line 96-106
- Test cases: `FieldEncryptTransformTest.java:95-139`
**Problem description**:
Using `StringUtils.isNotBlank(value)` skips empty strings, but the original
data may be an empty string, causing:
1. Empty strings are skipped during encryption, the field remains `""`
2. If the field is stored and decrypted again, it will be skipped and remain
`""`
3. However, if an empty string is encrypted with another tool first
(producing a non-empty result), decryption will skip it due to `isNotBlank()`,
causing decryption failure
**Potential risks**:
- Risk 1: Empty strings cannot be properly encrypted (though encrypting
empty strings has little practical significance)
- Risk 2: May fail when interoperating with external encryption tools
- Risk 3: Test cases do not cover empty string scenarios, boundary condition
behavior is unclear
**Scope of impact**:
- Direct impact: Encryption/decryption logic
- Indirect impact: Data consistency and interoperability
- Affected area: Single Transform
**Severity**: MINOR
**Improvement suggestion**:
```java
// Option 1: Explicitly skip null but process empty strings
if (field != null) {
String value = field.toString();
// Empty strings will also be encrypted (will get a fixed-length
ciphertext)
inputRow.setField(index, encryptor.encrypt(value));
}
// Option 2: Keep current logic, but document the behavior
// Or add configuration option skip_empty_values
```
**Rationale**: Current behavior is reasonable in most scenarios, but should
be clearly documented, or covered in tests to ensure predictable behavior.
---
### Issue 4: Missing encryption performance optimization
**Location**:
`seatunnel-transforms-v2/src/main/java/org/apache/seatunnel/transform/encrypt/encryptor/AesCbcEncryptor.java:62-63`
```java
Cipher cipher = Cipher.getInstance(ALGORITHM);
cipher.init(Cipher.ENCRYPT_MODE, keySpec, ivSpec);
```
**Related context**:
- Call path: `FieldEncryptTransform.transformRow()` → `encryptor.encrypt()`
→ creating Cipher each time
- Performance impact: Each record's each field creates a new Cipher instance
**Problem description**:
Each call to `encrypt()` or `decrypt()` creates a new `Cipher` instance,
causing frequent object creation and GC pressure during large-batch data
processing.
**Potential risks**:
- Risk 1: Significant performance degradation in throughput-sensitive
scenarios (e.g., million-level TPS)
- Risk 2: Frequent GC may cause latency jitter
- Risk 3: Cannot reuse Cipher in multi-threaded environments (Cipher is not
thread-safe)
**Scope of impact**:
- Direct impact: Encryption/decryption performance
- Indirect impact: Overall job throughput
- Affected area: Single Transform (but performance impact may be significant)
**Severity**: MAJOR (for high-performance scenarios)
**Improvement suggestion**:
```java
// Option 1: Use ThreadLocal to cache Cipher (but need to pay attention to
IV randomness)
private static final ThreadLocal<Cipher> ENCRYPT_CIPHER =
ThreadLocal.withInitial(() -> {
try {
return Cipher.getInstance(ALGORITHM);
} catch (Exception e) {
throw new RuntimeException(e);
}
});
// Option 2: Only document performance impact, recommend users choose based
on scenario
// Option 3: Provide batch encryption API (encrypt multiple fields at once)
```
**Rationale**: Although optimization can improve performance, trade-offs are
needed:
- Cipher is not thread-safe, cannot use singleton
- Each encryption needs a different IV, cannot simply reuse
- ThreadLocal increases memory footprint
Recommend at least documenting the performance impact, or providing a
configuration option for users to choose whether to optimize.
---
### Issue 5: Error messages may leak sensitive information
**Location**:
`seatunnel-transforms-v2/src/main/java/org/apache/seatunnel/transform/encrypt/encryptor/AesCbcEncryptor.java:66`
```java
throw TransformCommonError.encryptionError(plainText, e);
```
**Related context**:
- Error handling: `TransformCommonError.java:95-98`
- Logging: May be recorded to log files
**Problem description**:
When encryption fails, the complete plaintext `plainText` is passed into the
error message. If this error is logged, it may cause sensitive information
leakage.
**Potential risks**:
- Risk 1: Log files may contain sensitive information such as user phone
numbers, ID numbers, etc.
- Risk 2: Log files usually have long retention periods, increasing leakage
risk
- Risk 3: Logs may be accessed by multiple systems, expanding the leakage
surface
**Scope of impact**:
- Direct impact: Error log content
- Indirect impact: Data security and compliance (e.g., GDPR, Personal
Information Protection Law)
- Affected area: Single Transform (but high security risk)
**Severity**: MAJOR (security-related)
**Improvement suggestion**:
```java
// Option 1: Only log the first few characters
throw TransformCommonError.encryptionError(
plainText.length() > 10 ? plainText.substring(0, 10) + "..." :
plainText,
e
);
// Option 2: Log hash value
throw TransformCommonError.encryptionError(
"SHA256:" + DigestUtils.sha256Hex(plainText),
e
);
// Option 3: Don't log plaintext, only log length
throw TransformCommonError.encryptionError(
"plaintext_with_length_" + plainText.length(),
e
);
```
**Rationale**: Security best practices require avoiding logging sensitive
information, even error logs should be desensitized.
---
### Issue 6: Missing input length limit
**Location**:
`seatunnel-transforms-v2/src/main/java/org/apache/seatunnel/transform/encrypt/encryptor/AesCbcEncryptor.java:54-68`
```java
@Override
public String encrypt(String plainText) {
byte[] iv = new byte[IV_SIZE];
SECURE_RANDOM.nextBytes(iv);
// ...
encrypted = cipher.doFinal(plainText.getBytes(StandardCharsets.UTF_8));
// ...
}
```
**Related context**:
- Input source: `SeaTunnelRow` field, may contain ultra-long strings
- Memory impact: `plainText.getBytes()` creates a byte array equal in size
to the input
**Problem description**:
No limit on input string length, malicious or anomalous data may cause
ultra-long string encryption, exhausting memory.
**Potential risks**:
- Risk 1: Malicious users construct ultra-long strings (e.g., 1GB), causing
OOM
- Risk 2: Normal data unexpectedly ultra-large (e.g., BLOB field mistakenly
treated as String)
- Risk 3: Base64 encoding increases volume by ~33%, aggravating memory
pressure
**Scope of impact**:
- Direct impact: Memory usage
- Indirect impact: Job stability (may cause TaskManager OOM)
- Affected area: Single Transform (but may cause entire job to fail)
**Severity**: MAJOR
**Improvement suggestion**:
```java
// Add configuration in FieldEncryptTransformConfig
public static final Option<Integer> MAX_FIELD_LENGTH =
Options.key("max_field_length")
.intType()
.defaultValue(10 * 1024 * 1024) // 10MB
.withDescription("Maximum field length to encrypt");
// Check before encrypt()
@Override
public String encrypt(String plainText) {
if (plainText.length() > maxLength) {
throw CommonError.illegalArgument(
null,
"Field too long to encrypt: " + plainText.length() + " > " +
maxLength
);
}
// ...
}
```
**Rationale**: Input validation is a basic security principle, especially
for cryptographic operations, input size should be limited to prevent DoS
attacks.
---
### Issue 7: Incomplete key format validation
**Location**:
`seatunnel-transforms-v2/src/main/java/org/apache/seatunnel/transform/encrypt/encryptor/AesCbcEncryptor.java:102-114`
```java
private SecretKeySpec buildAesKey(String key) {
String base64 = key;
if (key.startsWith("base64:")) {
base64 = key.substring("base64:".length());
}
byte[] keyBytes = Base64.getDecoder().decode(base64);
// ...
}
```
**Related context**:
- Documentation: Supports two formats `base64:xxx` or direct Base64 string
- Error handling: `Base64.getDecoder().decode()` may throw
`IllegalArgumentException`
**Problem description**:
1. Does not validate whether Base64 decoding succeeded (may throw unhandled
exceptions)
2. Does not validate whether the key contains illegal characters (e.g.,
control characters)
3. Exception message for key length check does not explain expected length
**Potential risks**:
- Risk 1: When user inputs invalid Base64, exception message is unclear
(`Illegal base64 character`)
- Risk 2: Key contains leading/trailing spaces causing decoding failure
- Risk 3: Error message does not explain 16/24/32 bytes corresponding to
AES-128/192/256
**Scope of impact**:
- Direct impact: User experience when configuration is wrong
- Indirect impact: Debugging efficiency
- Affected area: Single Transform
**Severity**: MINOR
**Improvement suggestion**:
```java
private SecretKeySpec buildAesKey(String key) {
if (key == null || key.trim().isEmpty()) {
throw CommonError.illegalArgument(key, "Encryption key cannot be
null or empty");
}
String base64 = key.trim(); // Trim leading and trailing whitespace
if (base64.startsWith("base64:")) {
base64 = base64.substring("base64:".length()).trim();
}
byte[] keyBytes;
try {
keyBytes = Base64.getDecoder().decode(base64);
} catch (IllegalArgumentException e) {
throw CommonError.illegalArgument(
key,
"Invalid Base64 encoding in encryption key"
);
}
if (!(keyBytes.length == 16 || keyBytes.length == 24 || keyBytes.length
== 32)) {
throw CommonError.illegalArgument(
key,
"Invalid AES key length: " + keyBytes.length +
" bytes. Expected 16 (AES-128), 24 (AES-192), or 32 (AES-256)
bytes"
);
}
return new SecretKeySpec(keyBytes, "AES");
}
```
**Rationale**: More complete input validation and error messages can
significantly improve user experience and reduce configuration error debugging
time.
---
### Issue 8: Insufficient test coverage
**Location**:
`seatunnel-transforms-v2/src/test/java/org/apache/seatunnel/transform/encrypt/FieldEncryptTransformTest.java`
**Related context**:
- Existing tests: Only cover basic encryption/decryption scenarios
- Compared to other Transforms: e.g., `FieldRenameTransformTest` includes
more boundary tests
**Problem description**:
Unit tests lack the following key scenarios:
1. Empty string encryption/decryption
2. Field does not exist
3. Non-String type fields
4. Invalid key length
5. Unsupported encryption algorithm
6. Invalid mode parameter
7. Null field handling
8. Ultra-long string encryption
**Potential risks**:
- Risk 1: Behavior under boundary conditions is unclear
- Risk 2: Refactoring may introduce regression bugs
- Risk 3: Users cannot predict system behavior when encountering anomalous
scenarios
**Scope of impact**:
- Direct impact: Code reliability
- Indirect impact: Subsequent maintenance costs
- Affected area: Single Transform
**Severity**: MAJOR (code quality)
**Improvement suggestion**:
```java
@Test
void testNullField() {
// Test null field not encrypted
}
@Test
void testEmptyString() {
// Test empty string encryption/decryption
}
@Test
void testFieldNotFound() {
// Test exception when field doesn't exist
Assertions.assertThrows(TransformException.class, () -> {
new FieldEncryptTransform(config, catalogTable);
});
}
@Test
void testInvalidKeyLength() {
// Test invalid key length
}
@Test
void testUnsupportedAlgorithm() {
// Test unsupported encryption algorithm
}
@Test
void testInvalidMode() {
// Test invalid mode parameter
}
@Test
void testNonStringField() {
// Test non-String type fields
}
```
**Rationale**: Comprehensive unit testing is the foundation of code quality,
especially for features involving security and data integrity.
---
### Issue 9: Case confusion in documentation mode values
**Location**: `docs/en/transforms/encrypt.md:45` and
`docs/en/transforms/encrypt.md:56`
```markdown
mode = "ENCRYPT" # 使用大写
```
**Related context**:
- Code implementation: `FieldEncryptTransform.java:84` uses
`equalsIgnoreCase(mode)`
- Configuration example: Test cases use lowercase `"encrypt"` and `"decrypt"`
**Problem description**:
Documentation uses uppercase `ENCRYPT`/`DECRYPT`, but test cases use
lowercase, and the code implementation is case-insensitive. This may cause user
confusion.
**Potential risks**:
- Risk 1: Users may think uppercase must be used
- Risk 2: Copying documentation configuration may lead to inconsistent
configuration style
**Scope of impact**:
- Direct impact: User experience
- Indirect impact: Configuration consistency
- Affected area: Documentation
**Severity**: MINOR
**Improvement suggestion**:
```markdown
## Example
### Encrypt fields
```
transform {
FieldEncrypt {
fields = ["name"]
key = "base64:AAAAAAAAAAAAAAAAAAAAAA=="
algorithm = "AES_CBC"
mode = "encrypt" # Case-insensitive, can also be "ENCRYPT"
}
}
```
```
** Reason**: Documentation should clearly explain configuration item
characteristics to avoid user confusion.
---
# ## Issue 10: Missing performance impact description
** Location**: `docs/en/transforms/encrypt.md`
** Related context**:
- 加密开销: 每条记录每个字段执行 AES-CBC + Base64 编解码
- 对比: 其他 Transform(如 `FieldRename`)开销极小
** Problem description**:
文档未说明使用该 Transform 对作业性能的影响,用户可能低估性能开销。
** Potential risks**:
- 风险 1: 用户在生产环境启用后才发现性能下降
- 风险 2: 无法做出合理的性能优化决策(如只加密必要字段)
** Scope of impact**:
- 直接影响: 用户决策
- 间接影响: 生产环境性能
- 影响面: 文档
** Severity**: MINOR
** Improvement suggestions**:
```markdown
## Performance Considerations
- Encryption and decryption operations add overhead to data processing
- Each field is encrypted/decrypted individually using AES-CBC
- Performance impact depends on:
- Number of fields to encrypt/decrypt
- Data volume (rows per second)
- Algorithm used (currently only AES-CBC)
- Benchmark: On typical hardware, AES-128 encryption can process ~100MB/s
per core
- Recommendation: Only encrypt fields that truly contain sensitive
information
```
**Rationale**: Transparent performance information helps users make
reasonable architectural decisions.
---
--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
To unsubscribe, e-mail: [email protected]
For queries about this service, please contact Infrastructure at:
[email protected]