Thanks vino yang ~

IMO, we should not put much put too much energy for current Flink writer,
it is not production-ready in the long run. There are so many features need
to add/support for the Flink write/read(MOR write, COR read, MOR read, the
new index), we should focus on one version first, make it robust.

I really hope that we can work together to make the writer production-ready
as soon as possible, it is competitive that we have competitors like Apache
Iceberg and Delta lake, so from this perspective, there is no benefit to be
compatible with the current version writer.

My idea is that i propose the new infrastructure first as quickly as
possible(the basic pipeline, the test framework.), and then we can work
together for the new version (MOR write, COR read, MOR read, the new
index), we better not distract from promote the old writer.

What do you think?

vino yang <yanghua1...@gmail.com> 于2021年1月6日周三 下午2:14写道:

> Hi Danny,
>
> As we discussed in the doc, we should agree on if we should be compatible
> with the version less than Flink 1.11/1.12.
>
> We all know that there are some bottlenecks in the current plan. You
> proposed some improvements, yes it is great, but it radically uses the
> newer features provided by Flink. It is a pity that some users of old
> versions of Flink have no way to benefit from these features.
>
> The information I can provide is that some users have already used the
> current Flink write client or its improved version in a production
> environment. For example, SF Technology, and the Flink versions they use
> are 1.8.x and 1.10.x.
>
> Therefore, I personally suggest that there are two options:
>
> 1) The new design takes into account users of lower versions as much as
> possible and maintains a client version;
> 2) The new design is based on the features of the new version and evolves
> separately from the old version(we also have a plan to optimize the current
> implementation), but the public abstraction can be reused. I think it is
> not impossible to maintain multiple versions. Flink used to support 4+
> versions (0.8.2, 0.9, 0.10, 0.11, universal connector) for Kafka Connector,
> but they share the same code base.
>
> Any thoughts and opinions are welcome and appreciated.
>
> Best,
> Vino
>
> vino yang <yanghua1...@gmail.com> 于2021年1月6日周三 下午1:37写道:
>
> > Hi Danny,
> >
> > You should have cwiki edit permission now.
> > Any problems let me know.
> >
> > Best,
> > Vino
> >
> > Danny Chan <danny0...@apache.org> 于2021年1月6日周三 下午12:05写道:
> >
> >> Sorry ~
> >>
> >> Forget to say that my Confluence ID is danny0405.
> >>
> >> It would be nice if any of you can help on this.
> >>
> >> Best,
> >> Danny Chan
> >>
> >> Danny Chan <danny0...@apache.org> 于2021年1月6日周三 下午12:00写道:
> >>
> >> > Hi, can someone give me the CWIKI permission so that i can update the
> >> > design details to that (maybe as a new RFC though ~).
> >> >
> >> > wangxianghu <wxhj...@126.com> 于2021年1月5日周二 下午2:43写道:
> >> >
> >> >> + 1, Thanks Danny!
> >> >> I believe this new feature OperatorConrdinator in flink-1.11 will
> help
> >> >> improve the current implementation
> >> >>
> >> >> Best,
> >> >>
> >> >> XianghuWang
> >> >>
> >> >> At 2021-01-05 14:17:37, "vino yang" <yanghua1...@gmail.com> wrote:
> >> >> >Hi,
> >> >> >
> >> >> >Sharing more details, the OperatorConrdinator is the part of the new
> >> Data
> >> >> >Source API(Beta) involved in the Flink 1.11's release note[1].
> >> >> >
> >> >> >Flink 1.11 was released only about half a year ago. The design of
> >> RFC-13
> >> >> >began at the end of 2019, and most of the implementation was
> completed
> >> >> when
> >> >> >Flink 1.11 was released.
> >> >> >
> >> >> >I believe that the production environment of many large companies
> has
> >> not
> >> >> >been upgraded so quickly (As far as our company is concerned, we
> still
> >> >> have
> >> >> >some jobs running on flink release packages below 1.9).
> >> >> >
> >> >> >So, maybe we need to find a mechanism to benefit both new and old
> >> users.
> >> >> >
> >> >> >[1]:
> >> >> >
> >> >>
> >>
> https://flink.apache.org/news/2020/07/06/release-1.11.0.html#new-data-source-api-beta
> >> >> >
> >> >> >Best,
> >> >> >Vino
> >> >> >
> >> >> >vino yang <yanghua1...@gmail.com> 于2021年1月5日周二 下午12:30写道:
> >> >> >
> >> >> >> Hi,
> >> >> >>
> >> >> >> +1, thank you Danny for introducing this new feature
> >> >> >> (OperatorCoordinator)[1] of Flink in the recently latest version.
> >> >> >> This feature is very helpful for improving the implementation
> >> >> mechanism of
> >> >> >> Flink write-client.
> >> >> >>
> >> >> >> But this feature is only available after Flink 1.11. Before that,
> >> there
> >> >> >> was no good way to realize the mechanism of task upstream and
> >> >> downstream
> >> >> >> coordination through the public API provided by Flink.
> >> >> >> I just have a concern, whether we need to take into account the
> >> users
> >> >> of
> >> >> >> earlier versions (less than Flink 1.11).
> >> >> >>
> >> >> >> [1]: https://issues.apache.org/jira/browse/FLINK-15099
> >> >> >>
> >> >> >> Best,
> >> >> >> Vino
> >> >> >>
> >> >> >> Gary Li <garyli1...@outlook.com> 于2021年1月5日周二 上午10:40写道:
> >> >> >>
> >> >> >>> Hi Danny,
> >> >> >>>
> >> >> >>> Thanks for the proposal. I'd recommend starting a new RFC. RFC-13
> >> was
> >> >> >>> done and including some work about the refactoring so we should
> >> mark
> >> >> it as
> >> >> >>> completed. Looking forward to having further discussion on the
> RFC.
> >> >> >>>
> >> >> >>> Best,
> >> >> >>> Gary Li
> >> >> >>> ________________________________
> >> >> >>> From: Danny Chan <danny0...@apache.org>
> >> >> >>> Sent: Tuesday, January 5, 2021 10:22 AM
> >> >> >>> To: dev@hudi.apache.org <dev@hudi.apache.org>
> >> >> >>> Subject: Re: [DISCUSS] New Flink Writer Proposal
> >> >> >>>
> >> >> >>> Sure, i can update the RFC-13 cwiki if you agree with that.
> >> >> >>>
> >> >> >>> Vinoth Chandar <vin...@apache.org> 于2021年1月5日周二 上午2:58写道:
> >> >> >>>
> >> >> >>> > Overall +1 on the idea.
> >> >> >>> >
> >> >> >>> > Danny, could we move this to the apache cwiki if you don't
> mind?
> >> >> >>> > That's what we have been using for other RFC discussions.
> >> >> >>> >
> >> >> >>> > On Mon, Jan 4, 2021 at 1:22 AM Danny Chan <
> danny0...@apache.org>
> >> >> wrote:
> >> >> >>> >
> >> >> >>> > > The RFC-13 Flink writer has some bottlenecks that make it
> hard
> >> to
> >> >> >>> adapter
> >> >> >>> > > to production:
> >> >> >>> > >
> >> >> >>> > > - The InstantGeneratorOperator is parallelism 1, which is a
> >> limit
> >> >> for
> >> >> >>> > > high-throughput consumption; because all the split inputs
> drain
> >> >> to a
> >> >> >>> > single
> >> >> >>> > > thread, the network IO would gains pressure too
> >> >> >>> > > - The WriteProcessOperator handles inputs by partition, that
> >> >> means,
> >> >> >>> > within
> >> >> >>> > > each partition write process, the BUCKETs are written one by
> >> one,
> >> >> the
> >> >> >>> > FILE
> >> >> >>> > > IO is limit to adapter to high-throughput inputs
> >> >> >>> > > - It buffers the data by checkpoints, which is too hard to be
> >> >> robust
> >> >> >>> for
> >> >> >>> > > production, the checkpoint function is blocking and should
> not
> >> >> have IO
> >> >> >>> > > operations.
> >> >> >>> > > - The FlinkHoodieIndex is only valid for a per-job scope, it
> >> does
> >> >> not
> >> >> >>> > work
> >> >> >>> > > for existing bootstrap data or for different Flink jobs
> >> >> >>> > >
> >> >> >>> > > Thus, here I propose a new design for the Flink writer to
> solve
> >> >> these
> >> >> >>> > > problems[1]. Overall, the new design tries to remove the
> single
> >> >> >>> > parallelism
> >> >> >>> > > operators and make the index more powerful and scalable.
> >> >> >>> > >
> >> >> >>> > > I plan to solve these bottlenecks incrementally (4 steps),
> >> there
> >> >> are
> >> >> >>> > > already some local POCs for these proposals.
> >> >> >>> > >
> >> >> >>> > > I'm looking forward to your feedback. Any suggestions are
> >> >> appreciated
> >> >> >>> ~
> >> >> >>> > >
> >> >> >>> > > [1]
> >> >> >>> > >
> >> >> >>> > >
> >> >> >>> >
> >> >> >>>
> >> >>
> >>
> https://apac01.safelinks.protection.outlook.com/?url=https%3A%2F%2Fdocs.google.com%2Fdocument%2Fd%2F1oOcU0VNwtEtZfTRt3v9z4xNQWY-Hy5beu7a1t5B-75I%2Fedit%3Fusp%3Dsharing&amp;data=04%7C01%7C%7Cd256cf75a4f14db4c7f608d8b120d69c%7C84df9e7fe9f640afb435aaaaaaaaaaaa%7C1%7C0%7C637454101880191121%7CUnknown%7CTWFpbGZsb3d8eyJWIjoiMC4wLjAwMDAiLCJQIjoiV2luMzIiLCJBTiI6Ik1haWwiLCJXVCI6Mn0%3D%7C1000&amp;sdata=Ecw3TcwsVPFFG74scaE7KhMsIryhVRn9g40B0yMQvfc%3D&amp;reserved=0
> >> >> >>> > >
> >> >> >>> >
> >> >> >>>
> >> >> >>
> >> >>
> >> >
> >>
> >
>

Reply via email to