Sure. Approved and landed!

On Tue, 9 Aug 2022 at 18:55, 田昕峣 (Xinyao Tian) <xinyaot...@yeah.net> wrote:
> Hi Sivabalan,
>
> Thanks for your kind words! We have been working very hard this week to prepare the materials for the RFC since we received your feedback on our idea, and I promise that very soon (within a few days) everyone will be able to read our RFC and see every detail of this feature. It's our pleasure to make Hudi even more powerful by making this feature available to everyone.
>
> However, there's one thing we really need your help with. According to the RFC Process in the Hudi docs, we first have to raise a PR that adds an entry to rfc/README.md. Since this is our first PR to Hudi, a maintainer with write permission has to approve it. We have been waiting for days, but the PR is still pending.
>
> Could you please help us approve our first PR so that we can submit our further materials to Hudi? The URL of our pending PR is https://github.com/apache/hudi/pull/6328 and the corresponding Jira is https://issues.apache.org/jira/browse/HUDI-4569
>
> Thank you so much for your help :)
>
> Kind regards,
> Xinyao Tian
>
> On 08/9/2022 21:46, Sivabalan <n.siv...@gmail.com> wrote:
> Eagerly looking forward to the RFC, Xinyao. I can definitely see a lot of folks benefiting from this.
>
> On Sun, 7 Aug 2022 at 20:00, 田昕峣 (Xinyao Tian) <xinyaot...@yeah.net> wrote:
>
> Hi Shiyan,
>
> Thanks so much for your feedback as well as your kind encouragement! It's always an honor for us to contribute to the community and make Hudi even more awesome :)
>
> We are now carefully preparing materials for the new RFC. Once we have finished, we will strictly follow the RFC process described in the official Hudi documentation to propose the new RFC and share all details of the new feature, as well as the related code, with everyone.
> Since we benefit from the Hudi community, we would like to give back to it and help Hudi benefit even more people!
>
> As always, please stay healthy and keep safe.
>
> Kind regards,
> Xinyao Tian
>
> On 08/6/2022 10:11, Shiyan Xu <xu.shiyan.raym...@gmail.com> wrote:
> Hi Xinyao, awesome achievement! And I really appreciate your keenness to contribute to Hudi. Certainly we'd love to see an RFC for this.
>
> On Fri, Aug 5, 2022 at 4:21 AM 田昕峣 (Xinyao Tian) <xinyaot...@yeah.net> wrote:
>
> Greetings everyone,
>
> My name is Xinyao and I'm currently working for an insurance company. We have found Apache Hudi to be an extremely useful tool, and when it works together with Apache Flink it is even more powerful. We have been using it for months and keep benefiting from it.
>
> However, there is one feature we really want that Hudi doesn't currently have, which we call "multiple event_time fields verification". In the insurance industry, data is often spread across dozens of tables that are conceptually connected by the same primary keys. When the data is consumed, we often need to join several or even dozens of these tables and stitch all the partial columns into a complete record with dozens or even hundreds of columns for downstream services.
>
> Here is the problem. To guarantee that every part of the joined data is up to date, Hudi must be able to compare multiple event_time timestamps within a single table and keep the most recent records. In this scenario, the single event_time ordering field that Hudi provides (i.e. the option 'write.precombine.field' in Hudi 0.10.0) falls short.
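To make the gap concrete, here is a minimal Python sketch of one possible reading of the idea. It is not Hudi code and not the design from the eventual RFC: records are assumed to be plain dicts, and the function names, the `groups` layout (each event-time column mapped to the data columns it governs) and the sample columns are all hypothetical. It contrasts keeping the whole record with the larger single ordering value (roughly what one precombine field gives you) with deciding each column group by its own event time.

```python
from typing import Any, Dict, List, Optional

# Hypothetical record representation: a flat dict of column name -> value.
Record = Dict[str, Any]


def merge_single_ordering_field(old: Record, new: Record, ordering_field: str) -> Record:
    """Keep the whole record whose single ordering value is larger (ties go to `new`)."""
    return new if new[ordering_field] >= old[ordering_field] else old


def merge_per_event_time(old: Record, new: Record, groups: Dict[str, List[str]]) -> Record:
    """Merge column group by column group (hypothetical semantics).

    `groups` maps each event-time column to the data columns it governs.
    A group is taken from `new` only if `new` carries that event time and it is
    not older than the stored one; otherwise the stored values are kept.
    """
    merged = dict(old)  # copy, so the stored record is not mutated
    for ts_field, columns in groups.items():
        new_ts: Optional[Any] = new.get(ts_field)
        if new_ts is None:
            continue  # the incoming (partial) record has no data for this group
        old_ts = old.get(ts_field)
        if old_ts is None or new_ts >= old_ts:
            merged[ts_field] = new_ts
            for col in columns:
                if col in new:
                    merged[col] = new[col]
    return merged


if __name__ == "__main__":
    groups = {"policy_ts": ["premium"], "claim_ts": ["claim_amount"]}
    stored = {"id": 1, "policy_ts": 10, "premium": 100, "claim_ts": 20, "claim_amount": 5}
    incoming = {"id": 1, "policy_ts": 15, "premium": 120, "claim_ts": 18, "claim_amount": 3}

    # A single ordering field lets the newer policy_ts drag in the older claim data.
    print(merge_single_ordering_field(stored, incoming, "policy_ts"))
    # Per-group event times keep the newer premium and the newer claim separately.
    print(merge_per_event_time(stored, incoming, groups))
```

With the sample data, the single-field merge returns the incoming record as a whole, so `claim_amount` becomes 3 even though its `claim_ts` (18) is older than the stored 20. The per-group merge takes `premium` 120 from the newer policy update but keeps `claim_amount` 5, which is the "every part up to date" property the proposal asks for.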
> Clearly, to handle use cases with complex joins like the one above, and to open Hudi up to more application scenarios and industries, Hudi needs to support filtering on multiple event_time timestamps in a single table.
>
> The good news is that, after more than two months of development, my colleagues and I have made some changes to the hudi-flink and hudi-common modules based on hudi-0.10.0 and have largely implemented this feature. My team is currently using the enhanced source code with Kafka and Flink 1.13.2 to run end-to-end tests on a dataset of more than 140 million real-world insurance records and to verify the accuracy of the data. The results are quite good: based on our continuous observations over these weeks, every part of the extremely wide records has been updated to its latest state. We're very keen to make this new feature available to everyone. We benefit from the Hudi community, so we would really like to give back to it.
>
> The only open question is whether we need to create an RFC to illustrate our design and implementation in detail. According to the "RFC Process" in the official Hudi documentation, we have to confirm that this feature does not already exist before creating a new RFC to share the concept and code and explain them in detail. We would really like to create a new RFC that explains our implementation in detail, with both theory and code, so that everyone can understand it more easily and build improvements on top of it.
>
> We look forward to your feedback on whether we should create a new RFC, so that we can keep making Hudi better for everyone.
>
> Kind regards,
> Xinyao Tian
>
> --
> Best,
> Shiyan
>
> --
> Regards,
> -Sivabalan

--
Regards,
-Sivabalan