davidzollo commented on PR #10489:
URL: https://github.com/apache/seatunnel/pull/10489#issuecomment-3900314011
Good job.
This enables more complex UDFs that need to maintain state or adapt behavior
based on the input row context.
And I found some issues need to be addressed:
### Finding 1 — Opening all discovered UDFs at init time
- Severity: **High**
- Description:
- Engine calls `open()` for every UDF discovered via SPI, regardless of
whether it is referenced in the SQL.
- References:
- ZetaSQLEngine.init -> openUDFs():
seatunnel-transforms-v2/.../ZetaSQLEngine.java#L91-L99
- openUDFs implementation:
seatunnel-transforms-v2/.../ZetaSQLEngine.java#L108-L125
- Risk:
- Startup latency grows with number/complexity of UDFs
- Job can fail due to unrelated UDF initialization failures
- In cluster mode, this amplifies by parallelism (multiple JVMs)
- Best improvement:
- Only `open()` UDFs that are actually referenced by the parsed SQL AST
(collect function names from the statement).
- Alternative: lazy-open the UDF on first use, with a per-UDF open state
(and thread-safety / idempotency).
### Finding 2 — Open failure drops exception cause
- Severity: **Medium**
- Description:
- On `udf.open()` failure, the thrown exception includes only
`e.getMessage()`.
- Reference:
- seatunnel-transforms-v2/.../ZetaSQLEngine.java#L111-L121
- Risk:
- Production troubleshooting becomes difficult (no stack trace / root
cause)
- Best improvement:
- Preserve the cause when throwing `TransformException` (or log the full
exception with stack trace before throwing).
### Finding 3 — Mutable fields array exposed via context
- Severity: **Low to Medium**
- Description:
- `getAllFields()` returns the original `Object[]`.
- References:
- seatunnel-transforms-v2/.../ZetaUDFContext.java#L44-L50
- seatunnel-transforms-v2/.../ZetaUDFContext.java#L108-L110
- Risk:
- A buggy UDF could mutate the array and affect later expressions /
outputs.
- Best improvement:
- Document `allFields` as read-only, or provide safer APIs
(`getField(int)`), or return a defensive copy (performance tradeoff).
### Finding 4 — Test coverage relies on E2E only, Please add UT for this PR
- Severity: **Medium**
- Description:
- E2E exists, but unit tests are missing for key branching and lifecycle
semantics.
- Reference: seatunnel-e2e/.../ExampleUdfIT.java#L47-L52
- Risk:
- Slower feedback loops; regressions harder to pinpoint.
- Best improvement:
- Add unit tests in `seatunnel-transforms-v2` to validate routing and
lifecycle behavior.
--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
To unsubscribe, e-mail: [email protected]
For queries about this service, please contact Infrastructure at:
[email protected]