Hey, hey, Fengjian!

With the landing of RFC-46 we'll be kick-starting a process of phasing out
HoodieRecordPayload as an abstraction and migrating to the
HoodieRecordMerger interface instead.
I'd recommend basing your design considerations on the new
HoodieRecordMerger interface rather than the legacy HoodieRecordPayload to
make sure it's future-proof.
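
For reference, the merger abstraction introduced by RFC-46 is roughly shaped
as below. This is a simplified sketch for the discussion, not the exact
source; please check the Hudi codebase for the authoritative signatures.

import java.io.IOException;
import java.io.Serializable;
import org.apache.avro.Schema;
import org.apache.hudi.common.config.TypedProperties;
import org.apache.hudi.common.model.HoodieRecord;
import org.apache.hudi.common.util.Option;
import org.apache.hudi.common.util.collection.Pair;

// Sketch of the RFC-46 record merger: instead of a payload combining itself
// with the stored value, a standalone merger receives both the old and the
// new record (with their schemas) and decides the merge outcome.
public interface HoodieRecordMerger extends Serializable {

  // Merge the stored (older) record with the incoming (newer) one.
  // Returning Option.empty() signals that the record should be deleted.
  Option<Pair<HoodieRecord, Schema>> merge(HoodieRecord older, Schema oldSchema,
                                           HoodieRecord newer, Schema newSchema,
                                           TypedProperties props) throws IOException;

  // Engine-native record representation this merger operates on (Avro, Spark row, etc.).
  HoodieRecord.HoodieRecordType getRecordType();

  // Identifier used to select this merger at read/write time.
  String getMergingStrategy();
}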

On Thu, Oct 20, 2022 at 10:08 AM 冯健 <fengjian...@gmail.com> wrote:

> Hi guys,
>     After reading this article with respect to how to implement SCD-2 with
> Hudi Build Slowly Changing Dimensions Type 2 (SCD2) with Apache Spark and
> Apache Hudi on Amazon EMR
> <
> https://aws.amazon.com/blogs/big-data/build-slowly-changing-dimensions-type-2-scd2-with-apache-spark-and-apache-hudi-on-amazon-emr/
> >
>     I have an idea about implementing embedded SCD-2 support in hudi by
> using a new Payload. Users don't need to manually join the data, then
> update end_data and status.
>    For example, the record key is 'id,end_date',  Let's say the current
> data's id is 1 and the end_date is 2099-12-31,  when a new record with id=1
> arrives, it will update the current record's end_date to 2022-10-21, and
> also insert this new record with end_data ' 2099-12-31'.  so this Payload
> will generate two records in combineAndGetUpdateValue . there will be no
> join cost, and the whole process is transparent to users.
>
>    Any thoughts?
>
