Yixue (Andrew) Zhu created HUDI-898:
---------------------------------------

             Summary: Need to add Schema parameter to HoodieRecordPayload::preCombine
                 Key: HUDI-898
                 URL: https://issues.apache.org/jira/browse/HUDI-898
             Project: Apache Hudi (incubating)
          Issue Type: Improvement
          Components: Common Core
            Reporter: Yixue (Andrew) Zhu


We are working on Mongo Oplog integration with Hudi, to stream Mongo updates to 
Hudi tables.

There are four Mongo oplog operations we need to handle: CRUD (create, read, 
update, delete).

Currently, Hudi handles create/read and delete, but it does not handle update 
well with the existing preCombine API in the HoodieRecordPayload class. In 
particular, an update operation contains a "patch" field, which is extended 
JSON describing the update in terms of dot-separated field paths.
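
To make the patch shape concrete, here is a minimal sketch of applying
dot-separated field updates to a nested document. It assumes the patch has
already been parsed into a flat map from dotted path to new value; real oplog
update entries wrap these paths in operators such as $set and $unset, which
are not modeled here. The DottedPatch class and its apply method are
hypothetical names for illustration and are not part of Hudi.

```java
import java.util.LinkedHashMap;
import java.util.Map;

/** Hypothetical helper (not Hudi code): applies dot-separated field updates to a nested document. */
final class DottedPatch {

    /**
     * Applies each patch entry (dot-separated path mapped to a new value) to doc in place.
     * Missing intermediate objects are created; a non-object found in the middle of a
     * path is replaced by a new object. Both choices are assumptions of this sketch.
     */
    @SuppressWarnings("unchecked")
    static Map<String, Object> apply(Map<String, Object> doc, Map<String, Object> patch) {
        for (Map.Entry<String, Object> entry : patch.entrySet()) {
            String[] parts = entry.getKey().split("\\.");
            Map<String, Object> node = doc;
            // Walk down to the parent of the leaf field.
            for (int i = 0; i < parts.length - 1; i++) {
                Object child = node.get(parts[i]);
                if (!(child instanceof Map)) {
                    child = new LinkedHashMap<String, Object>();
                    node.put(parts[i], child);
                }
                node = (Map<String, Object>) child;
            }
            // Set the leaf field to the patched value.
            node.put(parts[parts.length - 1], entry.getValue());
        }
        return doc;
    }

    public static void main(String[] args) {
        Map<String, Object> address = new LinkedHashMap<>();
        address.put("city", "Oslo");
        address.put("zip", "0150");
        Map<String, Object> doc = new LinkedHashMap<>();
        doc.put("name", "a");
        doc.put("address", address);

        Map<String, Object> patch = new LinkedHashMap<>();
        patch.put("address.city", "Bergen"); // update an existing nested field
        patch.put("stats.visits", 1);        // create a new nested field

        System.out.println(apply(doc, patch));
    }
}
```

Resolving a path such as "address.city" requires knowing the structure of the
stored record, which is why the merge step needs more than the raw payload.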

For this to work, we need to pass the Avro schema to the preCombine API.

Although the BaseAvroPayload constructor accepts a GenericRecord, which 
carries a reference to its Avro schema, it materializes the GenericRecord to 
bytes so that ExternalSpillableMap can serialize and deserialize the payload. 
The payload therefore no longer holds the schema when preCombine is called.
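
The constraint can be illustrated with a toy model that uses only the JDK. It
is not Hudi or Avro code: BytesPayload, encode, decode, and the two-argument
preCombine are hypothetical names, the "schema" is just an ordered list of
field names, and an empty string stands for "field not set in this update".
Like Avro binary encoding, the toy encoding writes values in schema order
without field names, so the bytes cannot be decoded or merged field by field
unless the schema is supplied from outside.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.DataInputStream;
import java.io.DataOutputStream;
import java.io.IOException;
import java.io.UncheckedIOException;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

/** Toy stand-in (not Hudi or Avro code): a payload that keeps only schemaless bytes. */
final class BytesPayload {
    final byte[] bytes;     // field values only, written in schema order
    final long orderingVal; // larger value means newer

    BytesPayload(byte[] bytes, long orderingVal) {
        this.bytes = bytes;
        this.orderingVal = orderingVal;
    }

    /** Writes values in schema order; field names are not part of the output. */
    static byte[] encode(List<String> schema, Map<String, String> record) {
        try {
            ByteArrayOutputStream out = new ByteArrayOutputStream();
            DataOutputStream data = new DataOutputStream(out);
            for (String field : schema) {
                data.writeUTF(record.getOrDefault(field, ""));
            }
            data.flush();
            return out.toByteArray();
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    /** Reads values back; this is impossible without knowing the field order. */
    static Map<String, String> decode(List<String> schema, byte[] bytes) {
        try {
            DataInputStream in = new DataInputStream(new ByteArrayInputStream(bytes));
            Map<String, String> record = new LinkedHashMap<>();
            for (String field : schema) {
                record.put(field, in.readUTF());
            }
            return record;
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    /** Hypothetical schema-aware preCombine: non-empty fields of the newer payload win. */
    BytesPayload preCombine(BytesPayload other, List<String> schema) {
        BytesPayload older = this.orderingVal <= other.orderingVal ? this : other;
        BytesPayload newer = (older == this) ? other : this;
        Map<String, String> merged = decode(schema, older.bytes);
        for (Map.Entry<String, String> e : decode(schema, newer.bytes).entrySet()) {
            if (!e.getValue().isEmpty()) {
                merged.put(e.getKey(), e.getValue());
            }
        }
        return new BytesPayload(encode(schema, merged), newer.orderingVal);
    }
}
```

A preCombine that receives only the other payload can do little more than
pick one side whole; with the schema passed in, it can decode both sides and
merge individual fields, which is what a partial update needs.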

 

Are there any concerns or objections to this? In other words, have I 
overlooked something?

 



--
This message was sent by Atlassian Jira
(v8.3.4#803005)
