voonhous commented on code in PR #18837:
URL: https://github.com/apache/hudi/pull/18837#discussion_r3340311202
##########
hudi-trino-plugin/src/main/java/io/trino/plugin/hudi/reader/HudiTrinoReaderContext.java:
##########
@@ -130,98 +149,16 @@ public IndexedRecord next()
}
@Override
- public IndexedRecord convertAvroRecord(IndexedRecord record)
+ protected Option<HoodieRecordMerger> getRecordMerger(RecordMergeMode
mergeMode, String mergeStrategyId, String mergeImplClasses)
{
- return record;
+ return Option.of(new HoodiePreCombineAvroRecordMerger());
Review Comment:
Addressed in 915410e (option A). `getRecordMerger` now dispatches on
`RecordMergeMode`, mirroring `HoodieAvroReaderContext`:
- `EVENT_TIME_ORDERING` -> `HoodieAvroRecordMerger` (read-time
`combineAndGetUpdateValue`)
- `COMMIT_TIME_ORDERING` -> `OverwriteWithLatestMerger`
- `CUSTOM` -> `HoodieRecordUtils.createValidRecordMerger(EngineType.JAVA,
mergeImplClasses, mergeStrategyId)`
The previous fixed `HoodiePreCombineAvroRecordMerger` applies write-side
`preCombine` semantics regardless of merge mode (its javadoc notes it is "only
for deduplication among incoming records"), so `COMMIT_TIME_ORDERING` and
custom-payload tables merged incorrectly on MoR reads. Existing MoR tests use
the default payload, where `preCombine` and `combineAndGetUpdateValue`
coincide, so they stay green.
Follow-up apache/hudi#18898 tracks adding MoR delete-marker / custom-payload
test coverage (the paths where the merger choice actually diverges); it is
referenced via an inline `TODO` on the method.
--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
To unsubscribe, e-mail: [email protected]
For queries about this service, please contact Infrastructure at:
[email protected]