Sure. Approved and landed!

On Tue, 9 Aug 2022 at 18:55, 田昕峣 (Xinyao Tian) <xinyaot...@yeah.net> wrote:

> Hi Sivabalan,
>
> Thanks for your kind words! We have been working very hard to prepare
> materials for the RFC this week since we got your feedback about our idea,
> and I promise that very soon (within a few days) everyone will be able to
> read our RFC and see every detail of this feature. It’s our pleasure to
> make Hudi even more powerful by making this feature available to everyone.
>
> However, there’s one thing that we really need your help with. According
> to the RFC Process shown in the Hudi docs, we first have to raise a PR
> that adds an entry to rfc/README.md. But since this is the first time we
> have raised a PR to Hudi, a maintainer with write permission has to
> approve it. We have been waiting for days, but the PR is still pending.
>
> Therefore, could you please help approve our first PR so that we can
> submit our further materials to Hudi? The URL of our pending PR is
> https://github.com/apache/hudi/pull/6328 and the corresponding Jira is
> https://issues.apache.org/jira/browse/HUDI-4569
>
> We appreciate your help so much :)
>
> Kind regards,
> Xinyao Tian
>
> On 08/9/2022 21:46,Sivabalan<n.siv...@gmail.com> wrote:
> Eagerly looking forward to the RFC, Xinyao. Definitely see a lot of folks
> benefitting from this.
>
> On Sun, 7 Aug 2022 at 20:00, 田昕峣 (Xinyao Tian) <xinyaot...@yeah.net>
> wrote:
>
> Hi Shiyan,
>
> Thanks so much for your feedback as well as your kind encouragement! It’s
> always our honor to contribute our effort to everyone and make Hudi even
> more awesome :)
>
> We are now carefully preparing materials for the new RFC. Once we have
> finished, we will strictly follow the RFC process shown in the official
> Hudi documentation to propose the new RFC and share all details of the
> new feature, as well as the related code, with everyone. Since we have
> benefited from the Hudi community, we would like to give back to the
> community and help Hudi benefit more people!
>
>
> As always, please stay healthy and keep safe.
>
>
> Kind regards,
> Xinyao Tian
> On 08/6/2022 10:11,Shiyan Xu<xu.shiyan.raym...@gmail.com> wrote:
> Hi Xinyao, awesome achievement! And really appreciate your keenness in
> contributing to Hudi. Certainly we'd love to see an RFC for this.
>
> On Fri, Aug 5, 2022 at 4:21 AM 田昕峣 (Xinyao Tian) <xinyaot...@yeah.net>
> wrote:
>
> Greetings everyone,
>
>
> My name is Xinyao and I'm currently working for an insurance company. We
> found that Apache Hudi is an extremely awesome utility, and when it
> cooperates with Apache Flink it can be even more powerful. Thus, we have
> been using it for months and keep benefiting from it.
>
>
> However, there is one feature that we really desire but Hudi doesn't
> currently have. It is called "Multiple event_time fields verification".
> In the insurance industry, data is often distributed across dozens of
> tables that are conceptually connected by the same primary keys. When the
> data is used, we often need to associate several or even dozens of
> tables through Join operations, and stitch all the partial columns into a
> complete record with dozens or even hundreds of columns for downstream
> services to use.
>
>
> Here comes the problem. If we want to guarantee that every part of the
> data being joined is up to date, Hudi must be able to filter on multiple
> event_time timestamps in a table and keep the most recent records. So, in
> this scenario, the single event_time filtering field provided by Hudi
> (i.e. option 'write.precombine.field' in Hudi 0.10.0) is a bit
> inadequate. To cope with use cases involving complex Join operations like
> the one above, as well as to give Hudi the potential to support more
> application scenarios and reach more industries, Hudi needs to support
> filtering on multiple event_time timestamps in a single table.
>
>
> The good news is that, after more than two months of development, my
> colleagues and I have made some changes in the hudi-flink and hudi-common
> modules based on hudi-0.10.0 and have basically achieved this feature.
> Currently, my team is using the enhanced source code with Kafka and
> Flink 1.13.2 to conduct end-to-end testing on a dataset of more than
> 140 million real-world insurance records and to verify the accuracy of
> the data. The result is quite good: every part of the extremely wide
> records has been updated to the latest status, based on our continuous
> observations during these weeks. We're very keen to make this new feature
> available to everyone. We benefit from the Hudi community, so we really
> want to give back to the community with our efforts.
>
>
> The only problem is that we are not sure whether we need to create an RFC
> to illustrate our design and implementation in detail. According to the
> "RFC Process" in the official Hudi documentation, we have to confirm that
> this feature does not already exist before we create a new RFC to share
> the concept and code and explain them in detail. Thus, we would really
> like to create a new RFC that explains our implementation in detail with
> theory and code, and makes it easier for everyone to understand and
> improve on our work.
>
>
> We look forward to receiving your feedback on whether we should create a
> new RFC and make Hudi better and better to benefit everyone.
>
>
> Kind regards,
> Xinyao Tian
>
>
>
> --
> Best,
> Shiyan
>
>
>
> --
> Regards,
> -Sivabalan
>


-- 
Regards,
-Sivabalan
