Eagerly looking forward to the RFC, Xinyao. Definitely see a lot of folks benefitting from this.
On Sun, 7 Aug 2022 at 20:00, 田昕峣 (Xinyao Tian) <xinyaot...@yeah.net> wrote:
> Hi Shiyan,
>
> Thanks so much for your feedback as well as your kind encouragement! It's
> always our honor to contribute our effort to everyone and make Hudi even
> more awesome :)
>
> We are now carefully preparing materials for the new RFC. Once we have
> finished, we will strictly follow the RFC process described in the official
> Hudi documentation to propose the new RFC and share all details of the
> new feature, as well as the related code, with everyone. Since we benefit
> from the Hudi community, we would like to give our effort back to the
> community and help Hudi benefit more people!
>
> As always, please stay healthy and keep safe.
>
> Kind regards,
> Xinyao Tian
>
> On 08/6/2022 10:11, Shiyan Xu <xu.shiyan.raym...@gmail.com> wrote:
> Hi Xinyao, awesome achievement! And really appreciate your keenness in
> contributing to Hudi. Certainly we'd love to see an RFC for this.
>
> On Fri, Aug 5, 2022 at 4:21 AM 田昕峣 (Xinyao Tian) <xinyaot...@yeah.net>
> wrote:
>
> Greetings everyone,
>
> My name is Xinyao and I'm currently working for an insurance company. We
> found that Apache Hudi is an extremely awesome utility, and when it
> cooperates with Apache Flink it can be even more powerful. Thus, we have
> been using it for months and still keep benefiting from it.
>
> However, there is one feature that we really desire but Hudi doesn't
> currently have. It is called "Multiple event_time fields verification".
> In the insurance industry, data is often distributed across dozens of
> tables that are conceptually connected by the same primary keys. When the
> data is used, we often need to associate several or even dozens of
> tables through Join operations and stitch all the partial columns into a
> complete record with dozens or even hundreds of columns for downstream
> services to use.
>
> Here is the problem.
> If we want to guarantee that every part of the
> data being joined is up to date, Hudi must be able to filter on
> multiple event_time timestamps in a table and keep the most recent records.
> So, in this scenario, the single event_time filtering field provided by
> Hudi (i.e. option 'write.precombine.field' in Hudi 0.10.0) is a bit
> inadequate. To cope with use cases involving complex Join operations
> like the one above, and to give Hudi the potential to support more
> application scenarios and reach more industries, Hudi needs to support
> filtering on multiple event_time timestamps in a single table.
>
> The good news is that, after more than two months of development, my
> colleagues and I have made some changes in the hudi-flink and hudi-common
> modules based on hudi-0.10.0 and have basically achieved this feature.
> Currently, my team is using the enhanced source code together with Kafka
> and Flink 1.13.2 to conduct end-to-end testing on a dataset of more
> than 140 million real-world insurance records and to verify the accuracy
> of the data. The result is quite good: every part of the extremely wide
> records has been updated to its latest status, based on our continuous
> observations over these weeks. We're very keen to make this new feature
> available to everyone. We benefit from the Hudi community, so we really
> want to give back to the community with our efforts.
>
> The only problem is that we are not sure whether we need to create an RFC
> to illustrate our design and implementation in detail. According to the
> "RFC Process" in the official Hudi documentation, we have to confirm that
> this feature does not already exist before we create a new RFC to share
> the concept and code as well as explain them in detail.
> Thus, we would really
> like to create a new RFC that explains our implementation in detail,
> with theory and code, and makes it easier for everyone to understand
> and improve on our work.
>
> We look forward to receiving your feedback on whether we should create a
> new RFC and make Hudi better and better to benefit everyone.
>
> Kind regards,
> Xinyao Tian
>
> --
> Best,
> Shiyan
>

--
Regards,
-Sivabalan
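
For readers following the thread, the "multiple event_time fields" idea described above can be illustrated with a rough sketch. This is not the implementation from Xinyao's team (their design is left to the RFC), and it does not use any Hudi API. It assumes, purely for illustration, that each event-time field guards a fixed group of columns and that a newer timestamp in one group should update only that group's columns. All names here (`COLUMN_GROUPS`, `merge_record`, `policy_ts`, `claim_ts` and the column names) are hypothetical.

```python
from typing import Any, Dict, List

# Hypothetical layout: each event-time field guards one group of columns.
# These names are illustrative only; they come from neither Hudi nor the thread.
COLUMN_GROUPS: Dict[str, List[str]] = {
    "policy_ts": ["policy_status", "premium"],
    "claim_ts": ["claim_status", "claim_amount"],
}


def merge_record(current: Dict[str, Any], incoming: Dict[str, Any]) -> Dict[str, Any]:
    """Merge two versions of a wide record, comparing each group's event time separately."""
    merged = dict(current)
    for ts_field, columns in COLUMN_GROUPS.items():
        incoming_ts = incoming.get(ts_field)
        if incoming_ts is None:
            # The incoming record carries no update for this group.
            continue
        current_ts = current.get(ts_field)
        # Incoming wins when it is at least as recent (ties go to incoming).
        if current_ts is None or incoming_ts >= current_ts:
            merged[ts_field] = incoming_ts
            for col in columns:
                # Columns missing from the incoming record become None.
                merged[col] = incoming.get(col)
    return merged


stored = {
    "id": 1,
    "policy_ts": 10, "policy_status": "active", "premium": 100,
    "claim_ts": 5, "claim_status": "open", "claim_amount": 50,
}
late_arrival = {
    "id": 1,
    "policy_ts": 8, "policy_status": "lapsed", "premium": 90,
    "claim_ts": 7, "claim_status": "paid", "claim_amount": 60,
}
result = merge_record(stored, late_arrival)
```

In this sketch the policy columns keep their stored values because the incoming `policy_ts` is older, while the claim columns are replaced because the incoming `claim_ts` is newer. A single precombine field would have to accept or reject the incoming record as a whole, which is the limitation the original message describes.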