raghusrhyme opened a new issue, #18915:
URL: https://github.com/apache/hudi/issues/18915
### Bug Description
**What happened:**
Custom JAR transformers that do column projection (e.g. `ColumnFilter` with
`mode=include`) drop `_corrupt_record` since they are unaware of the
error-table contract. `ErrorTableAwareChainedTransformer` calls `validate()`
after every transformer in the chain, throwing `HoodieValidationException:
Invalid condition, columnName=_corrupt_record is not present in transformer
output schema`.
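
The failure mode can be illustrated with a small stand-alone model. This is a sketch under stated assumptions, not Hudi code: a schema is reduced to a list of column names, and `ChainValidationSketch`, `include`, and `applyChain` are hypothetical names invented for illustration. It only mirrors the behavior described above, where a check runs after each step and fails once `_corrupt_record` is missing.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.function.UnaryOperator;

// Hypothetical model of the behavior described in this report; not Hudi code.
// A "schema" is just a list of column names.
class ChainValidationSketch {
    static final String CORRUPT_RECORD = "_corrupt_record";

    // Models an include-mode projection: keep only the listed columns, in the listed order.
    static UnaryOperator<List<String>> include(List<String> keep) {
        return schema -> {
            List<String> out = new ArrayList<>();
            for (String column : keep) {
                if (schema.contains(column)) {
                    out.add(column);
                }
            }
            return out;
        };
    }

    // Models the per-step check: fail as soon as any step's output lacks the column.
    static List<String> applyChain(List<String> schema,
                                   List<UnaryOperator<List<String>>> chain) {
        List<String> current = schema;
        for (UnaryOperator<List<String>> step : chain) {
            current = step.apply(current);
            if (!current.contains(CORRUPT_RECORD)) {
                throw new IllegalStateException(
                    "columnName=" + CORRUPT_RECORD
                        + " is not present in transformer output schema");
            }
        }
        return current;
    }
}
```

In this model, a projection that lists only business columns fails at the first step, while one that also lists `_corrupt_record` passes.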
**What you expected:**
The pipeline should complete successfully, with `_corrupt_record`
re-injected if a transformer drops it.
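
One way to picture the requested behavior is a wrapper that restores the column after a step drops it. The sketch below is only a schema-level model with hypothetical names (`ReinjectSketch`, `preserving`); it is not a proposed Hudi patch. On a real Spark `Dataset`, restoring the column's values would also need the original data to be carried through the step, which this model does not attempt.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.function.UnaryOperator;

// Hypothetical schema-level model of "re-inject if dropped"; not Hudi code.
class ReinjectSketch {
    static final String CORRUPT_RECORD = "_corrupt_record";

    // Wraps a step so that the column is appended back when the input had it
    // and the step's output does not.
    static UnaryOperator<List<String>> preserving(UnaryOperator<List<String>> step) {
        return schema -> {
            List<String> out = new ArrayList<>(step.apply(schema));
            if (schema.contains(CORRUPT_RECORD) && !out.contains(CORRUPT_RECORD)) {
                out.add(CORRUPT_RECORD);
            }
            return out;
        };
    }
}
```

The wrapper leaves the output unchanged when the input never had the column, so it would not add `_corrupt_record` to pipelines that run without it.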
**Steps to reproduce:**
1. Enable error table (`hoodie.errortable.enable=true`)
2. Configure a custom transformer that does
`dataset.select(explicitColumns)` (not including `_corrupt_record`)
3. Run HoodieStreamer
### Environment
**Hudi version:** master (0.16.0-SNAPSHOT)
**Query engine:** Spark 3.5
**Relevant configs:** `hoodie.errortable.enable=true`,
`hoodie.errortable.write.class=<any concrete ErrorTableWriter>`
### Logs and Stack Trace
```
org.apache.hudi.exception.HoodieValidationException: Invalid condition, columnName=_corrupt_record is not present in transformer output schema
    at org.apache.hudi.utilities.streamer.ErrorTableUtils.validate(ErrorTableUtils.java:88)
    at org.apache.hudi.utilities.transform.ErrorTableAwareChainedTransformer.apply(ErrorTableAwareChainedTransformer.java:59)
```