We are working on Mongo Oplog integration with Hudi, to stream Mongo
updates to Hudi tables.

There are four Mongo oplog operations we need to handle: CRUD (create,
read, update, delete).

Currently Hudi handles create/read and delete, but it does not handle
update well with the existing preCombine API in the HoodieRecordPayload
class. In particular, an update operation contains a "patch" field,
which is extended JSON describing the update in terms of dot-separated
field paths.
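To make the "patch" semantics concrete, here is a minimal sketch of
applying such a patch to a nested document. It assumes the patch uses
Mongo-style "$set" and "$unset" operators keyed by dot-separated paths,
and it models documents as plain java.util.Map values rather than Avro
records. The class and method names (DotPathPatch, applyPatch, set,
unset) are hypothetical and not part of Hudi. Array indices inside
paths are not handled.

```java
import java.util.LinkedHashMap;
import java.util.Map;

public class DotPathPatch {

    /** Sets the value at a dot-separated path, creating intermediate maps as needed. */
    @SuppressWarnings("unchecked")
    static void set(Map<String, Object> doc, String path, Object value) {
        String[] parts = path.split("\\.");
        Map<String, Object> node = doc;
        for (int i = 0; i < parts.length - 1; i++) {
            Object child = node.get(parts[i]);
            if (!(child instanceof Map)) {
                // Missing or non-map intermediate: replace it with an empty map.
                child = new LinkedHashMap<String, Object>();
                node.put(parts[i], child);
            }
            node = (Map<String, Object>) child;
        }
        node.put(parts[parts.length - 1], value);
    }

    /** Removes the field at a dot-separated path; does nothing if the path is absent. */
    @SuppressWarnings("unchecked")
    static void unset(Map<String, Object> doc, String path) {
        String[] parts = path.split("\\.");
        Map<String, Object> node = doc;
        for (int i = 0; i < parts.length - 1; i++) {
            Object child = node.get(parts[i]);
            if (!(child instanceof Map)) {
                return;
            }
            node = (Map<String, Object>) child;
        }
        node.remove(parts[parts.length - 1]);
    }

    /** Applies an assumed {"$set": {...}, "$unset": {...}} patch to doc in place. */
    @SuppressWarnings("unchecked")
    static void applyPatch(Map<String, Object> doc, Map<String, Object> patch) {
        Object sets = patch.get("$set");
        if (sets instanceof Map) {
            for (Map.Entry<String, Object> e : ((Map<String, Object>) sets).entrySet()) {
                set(doc, e.getKey(), e.getValue());
            }
        }
        Object unsets = patch.get("$unset");
        if (unsets instanceof Map) {
            for (String path : ((Map<String, Object>) unsets).keySet()) {
                unset(doc, path);
            }
        }
    }
}
```

The point of the sketch is that a patch only names the paths that
changed, so merging it requires the previous version of the record in
decoded form, not just the newer record on its own.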

We need to pass the Avro schema to the preCombine API for this to work.

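Although the BaseAvroPayload constructor accepts a GenericRecord, which
holds a reference to its Avro schema, it materializes the GenericRecord
to bytes in order to support serialization/deserialization by
ExternalSpillableMap. As a result, the payload no longer has the schema
it would need to decode those bytes inside preCombine.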
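The following is a toy sketch of why the schema has to be passed in. It
does not use Hudi or Avro: ToySchema, BytesPayload and the two-argument
preCombine are hypothetical stand-ins, and the encoding is a simple
separator-joined string rather than Avro binary. What it shares with
Avro binary encoding is that the bytes carry only field values, not
field names, so they cannot be decoded without a schema. The "empty
string means unchanged" rule is also just an assumption for the demo.

```java
import java.nio.charset.StandardCharsets;
import java.util.Arrays;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

public class SchemaAwarePayloadSketch {

    private static final String SEP = "\u001F"; // values must not contain this character

    /** Toy stand-in for an Avro schema: an ordered list of string field names. */
    static final class ToySchema {
        final List<String> fields;
        ToySchema(String... fields) { this.fields = Arrays.asList(fields); }
    }

    /** Toy stand-in for a payload that keeps only bytes so it can be spilled to disk. */
    static final class BytesPayload {
        final byte[] recordBytes; // field values in schema order, no field names
        final long orderingVal;

        BytesPayload(Map<String, String> record, ToySchema schema, long orderingVal) {
            String[] values = new String[schema.fields.size()];
            for (int i = 0; i < values.length; i++) {
                values[i] = record.getOrDefault(schema.fields.get(i), "");
            }
            this.recordBytes = String.join(SEP, values).getBytes(StandardCharsets.UTF_8);
            this.orderingVal = orderingVal;
        }

        /** Decoding is only possible when the caller supplies the schema. */
        Map<String, String> decode(ToySchema schema) {
            String[] values = new String(recordBytes, StandardCharsets.UTF_8).split(SEP, -1);
            Map<String, String> record = new LinkedHashMap<>();
            for (int i = 0; i < schema.fields.size(); i++) {
                record.put(schema.fields.get(i), values[i]);
            }
            return record;
        }

        /** Hypothetical schema-aware preCombine: overlay the newer record's non-empty fields. */
        BytesPayload preCombine(BytesPayload other, ToySchema schema) {
            BytesPayload older = this.orderingVal <= other.orderingVal ? this : other;
            BytesPayload newer = (older == this) ? other : this;
            Map<String, String> merged = older.decode(schema);
            for (Map.Entry<String, String> e : newer.decode(schema).entrySet()) {
                if (!e.getValue().isEmpty()) {
                    merged.put(e.getKey(), e.getValue());
                }
            }
            return new BytesPayload(merged, schema, newer.orderingVal);
        }
    }
}
```

Without the schema argument, preCombine could only compare ordering
values and pick one payload wholesale, which is not enough for a
field-level update.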


Are there any concerns or objections to this? In other words, have I
overlooked something?

I have created https://issues.apache.org/jira/browse/HUDI-898 to track it.

Best,
Yixue

-- 
Best Regards,
yixue
