Yixue (Andrew) Zhu created HUDI-898:
---------------------------------------
Summary: Need to add Schema parameter to HoodieRecordPayload::preCombine
Key: HUDI-898
URL: https://issues.apache.org/jira/browse/HUDI-898
Project: Apache Hudi (incubating)
Issue Type: Improvement
Components: Common Core
Reporter: Yixue (Andrew) Zhu
We are working on integrating the Mongo oplog with Hudi, to stream Mongo updates
to Hudi tables.
There are four Mongo oplog operations we need to handle: CRUD (create, read,
update, delete).
Currently Hudi handles create/read and delete, but does not handle update well
with the existing preCombine API in the HoodieRecordPayload class. In
particular, an update operation contains a "patch" field, which is extended JSON
describing the update for dot-separated field paths.
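To illustrate why an update is harder than an insert or delete, here is a minimal, stdlib-only sketch of applying a Mongo-style $set patch keyed by dot-separated field paths. It assumes the record is modelled as nested Maps rather than an Avro GenericRecord, and it covers only $set semantics. DotPathPatch, setPath and applySetPatch are hypothetical names, not Hudi or MongoDB APIs. Resolving a path like "address.city" requires knowing the record's nested structure, which for an Avro record is what the schema provides.

```java
import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical helper: applies a "$set"-style patch whose keys are
// dot-separated field paths to a record modelled as nested maps.
// A real Hudi record would be an Avro GenericRecord; that is simplified away.
public class DotPathPatch {

  // Sets a value at a path such as "address.city", creating intermediate
  // maps when missing. With an Avro schema we would instead look up the
  // nested record type of each path segment.
  @SuppressWarnings("unchecked")
  public static void setPath(Map<String, Object> record, String dottedPath, Object value) {
    String[] parts = dottedPath.split("\\.");
    Map<String, Object> current = record;
    for (int i = 0; i < parts.length - 1; i++) {
      Object child = current.get(parts[i]);
      if (!(child instanceof Map)) {
        child = new LinkedHashMap<String, Object>();
        current.put(parts[i], child);
      }
      current = (Map<String, Object>) child;
    }
    current.put(parts[parts.length - 1], value);
  }

  // Returns a patched deep copy; the input record is left unchanged.
  public static Map<String, Object> applySetPatch(Map<String, Object> record,
                                                  Map<String, Object> setPatch) {
    Map<String, Object> copy = deepCopy(record);
    setPatch.forEach((path, value) -> setPath(copy, path, value));
    return copy;
  }

  // Recursively copies nested maps so patching never mutates the input.
  @SuppressWarnings("unchecked")
  public static Map<String, Object> deepCopy(Map<String, Object> src) {
    Map<String, Object> out = new LinkedHashMap<>();
    src.forEach((k, v) -> out.put(k, v instanceof Map ? deepCopy((Map<String, Object>) v) : v));
    return out;
  }
}
```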
We need to pass the Avro schema to the preCombine API for this to work.
Even though the BaseAvroPayload constructor accepts a GenericRecord, which holds
a reference to its Avro schema, it materializes the GenericRecord to bytes to
support serialization/deserialization by ExternalSpillableMap, so the schema is
not available inside preCombine.
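A minimal, stdlib-only sketch of the proposal, under stated assumptions: the "schema" is modelled as an ordered list of field names instead of an Avro Schema, and the toy encoding writes values positionally without field names, loosely mimicking how Avro binary data can only be decoded with its schema. SchemaAwarePreCombine, BytesPayload, encode and decode are hypothetical names, and the merge rule (non-empty fields of the newer payload win) is an assumption for illustration, not Hudi's behaviour.

```java
import java.nio.charset.StandardCharsets;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.StringJoiner;

public class SchemaAwarePreCombine {

  // Field separator for the toy encoding; assumes values never contain NUL.
  private static final String SEP = "\u0000";

  // Writes values positionally, without field names, so the bytes cannot be
  // interpreted without the schema.
  public static byte[] encode(List<String> schema, Map<String, String> rec) {
    StringJoiner joiner = new StringJoiner(SEP);
    for (String field : schema) {
      joiner.add(rec.getOrDefault(field, ""));
    }
    return joiner.toString().getBytes(StandardCharsets.UTF_8);
  }

  // Rebuilds the record by pairing positional values with schema field names.
  public static Map<String, String> decode(List<String> schema, byte[] bytes) {
    String[] values = new String(bytes, StandardCharsets.UTF_8).split(SEP, -1);
    Map<String, String> rec = new LinkedHashMap<>();
    for (int i = 0; i < schema.size(); i++) {
      rec.put(schema.get(i), values[i]);
    }
    return rec;
  }

  // Keeps only bytes, mirroring how BaseAvroPayload stores its record.
  public static final class BytesPayload {
    public final byte[] recordBytes;

    public BytesPayload(byte[] recordBytes) {
      this.recordBytes = recordBytes;
    }

    // Proposed two-argument shape: with the schema, both sides can be decoded
    // and merged per field. Non-empty values from 'newer' overwrite ours.
    public BytesPayload preCombine(BytesPayload newer, List<String> schema) {
      Map<String, String> merged = decode(schema, recordBytes);
      decode(schema, newer.recordBytes).forEach((field, value) -> {
        if (!value.isEmpty()) {
          merged.put(field, value);
        }
      });
      return new BytesPayload(encode(schema, merged));
    }
  }
}
```

In an actual change, the schema argument would presumably be an org.apache.avro.Schema used to deserialize recordBytes; keeping the existing one-argument preCombine alongside the new overload would be one way to leave current payload implementations unaffected.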
Is there any concern or objection to this? In other words, have I overlooked
something?