That's great. Mind starting a HIP around this?

https://cwiki.apache.org/confluence/display/HUDI/Hudi+Improvement+Plan+Details+and+Process


On Tue, Mar 19, 2019 at 11:18 AM Taher Koitawala <[email protected]>
wrote:

> Hi @Semantic Beeng,  I am keen on getting Flink into Hudi, as I
> personally see a lot of good things we can do with it. I am currently
> preparing a small Google doc with a sample use case that will help us
> understand why we think Flink should be included.
>
>           As for the effort, I'm willing to work on this, and we can
> proceed as you suggest, since I have fairly good knowledge of Flink.
>
>
> Regards,
> Taher Koitawala
>
> On Tue, 19 Mar, 2019, 11:27 PM Vinoth Chandar, <[email protected]> wrote:
>
> > Yeah, this has come up within our company as well ...
> >
> >
> > On Mon, Mar 18, 2019 at 7:28 PM Pingle Wang <[email protected]> wrote:
> >
> > > I am very happy to see everyone discussing this topic. There are still
> > > many companies using Flink; if Hudi can support Flink, it will attract
> > > those users and further grow our Hudi project.
> > >
> > >
> > > thanks.
> > > ------------------ Original Message ------------------
> > > From: "Vinoth Chandar"<[email protected]>;
> > > Sent: Tuesday, March 19, 2019, 3:45 AM
> > > To: "Semantic Beeng"<[email protected]>;
> > > Cc: "dev"<[email protected]>;
> > > Subject: Re: Flink With Hudi
> > >
> > >
> > >
> > > Sorry for the late reply. Busy Sunday :)
> > >
> > > First off, this is a very interesting topic.
> > >
> > > @Semantic Beeng <[email protected]> , +1. It would be good to lay
> > > out the current use-cases not met by Spark execution, specifically
> > > related to Hudi's write path.
> > >
> > > @taher,  Flink definitely has its advantages, as you mention.
> > > Two cases where I thought direct Flink support for writing datasets
> > > would be good are:
> > >
> > > 1) Capture the results of Flink jobs and write them out as a Hudi
> > > dataset. But one can always write a Flink job to compute the results
> > > and then store them in Hudi, using it as a sink? Something like:
> > > Kafka => Flink => Kafka => DeltaStreamer => Hudi on dfs
> > > 2) If someone is not using Spark at all, then Hudi brings it in and
> > > potentially increases ops costs?
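> > > (Aside: the last hop of option 1 could be a plain DeltaStreamer
> > > invocation pulling the Flink job's output topic into a Hudi dataset.
> > > A minimal sketch; the topic name, paths, and properties file below are
> > > illustrative assumptions, and the class and flag names follow the
> > > hudi-utilities bundle, so they may differ across releases:)

```shell
# Ingest the Flink job's output Kafka topic into a Hudi dataset on DFS.
# One run pulls new records and writes them as a Hudi commit; scheduling
# repeated runs gives the micro-batch pipeline sketched above.
spark-submit \
  --class org.apache.hudi.utilities.deltastreamer.HoodieDeltaStreamer \
  hudi-utilities-bundle.jar \
  --source-class org.apache.hudi.utilities.sources.JsonKafkaSource \
  --source-ordering-field ts \
  --target-base-path hdfs:///data/hudi/flink_results \
  --target-table flink_results \
  --props kafka-source.properties
```

> > > (kafka-source.properties would carry the broker list, the topic the
> > > Flink job writes to, and the record key / partition path field
> > > configs.)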
> > >
> > > On Beam, while it's admittedly new, if we were to abstract away Spark
> > > and Flink from Hudi code (once again, a non-trivial amount of work ;)),
> > > then Beam is very attractive.
> > > We will end up inventing a Beam-- otherwise anyway :)
> > >
> > > Thanks
> > > Vinoth
> > >
> > > On Sun, Mar 17, 2019 at 10:22 PM Semantic Beeng <
> [email protected]>
> > > wrote:
> > >
> > > > Hello Taher,
> > > >
> > > > Vinoth, was looking for such an assessment from you - thanks. :-)
> > > >
> > > > Taher - at a high level, it sounds interesting to explore some
> > > > Flink-specific or common use cases, I think.
> > > >
> > > > We are having discussions about integrating Hudi with Beam, so if
> > > > you can relate your use cases to Flink + Beam, it would be
> > > > interesting.
> > > >
> > > > I can imagine useful scenarios where Spark-based analytics would be
> > > > combined with Flink-based analytics going through Parquet.
> > > >
> > > > But I do not know Flink well enough to see where Hudi-like
> > > > functionality would fit.
> > > >
> > > > Could you provide such use cases (all kinds, the above and others)?
> > > > Ideally with code references.
> > > >
> > > > Depending on how serious your interest is, we can go deeper in the
> > > > wiki.
> > > >
> > > > Thanks
> > > >
> > > > Nick
> > > >
> > > >
> > > >
> > > >
> > > > On March 17, 2019 at 3:34 AM Vinoth Chandar < [email protected]>
> > wrote:
> > > >
> > > >
> > > > Hi Taher,
> > > >
> > > > Thanks for kicking off this thread. We can use this thread itself to
> > > > discuss Flink. Hudi uses Spark today on the writing side, and the
> > > > micro-batch model actually fits very well. Given cloud stores don't
> > > > support appends anyway, we would end up micro-batching nonetheless,
> > > > even with Flink. Abstracting out Spark would be a large effort (gets
> > > > me to think, if we should then just rewrite on top of Beam ;)), and
> > > > I have not thought of any unique advantages we get for Hudi by
> > > > adding Flink. Do you have something in mind?
> > > >
> > > > If you can expand on where the gaps are with only having Spark/Hudi,
> > > > that'd be really educative.
> > > >
> > > >
> > > > Thanks
> > > > Vinoth
> > > >
> > > > On Sat, Mar 16, 2019 at 11:17 PM Taher Koitawala <
> > > > [email protected]>
> > > > wrote:
> > > >
> > > > Hi Prasanna,
> > > > Thank you for your reply. Should we start a discussion or open a
> > > > JIRA in this regard, then?
> > > >
> > > > On Sun, 17 Mar, 2019, 11:36 AM Prasanna, < [email protected]>
> > > wrote:
> > > >
> > > > Hello,
> > > >
> > > > I don't know of any effort to write Hudi with Flink.
> > > >
> > > >    - Prasanna
> > > >
> > > > On Sat, Mar 16, 2019 at 10:44 PM Taher Koitawala <
> > > > [email protected]>
> > > > wrote:
> > > >
> > > > Hey Guys, Any inputs on this?
> > > >
> > > > On Sat, 16 Mar, 2019, 12:35 PM Taher Koitawala, <
> > > >
> > > > [email protected]
> > > >
> > > > wrote:
> > > >
> > > > Hi guys, I have recently been exploring Hudi. It manages to solve a
> > > > lot of our current use cases; however, my question is: can I use
> > > > Flink with Hudi? So far I have only seen Spark integration with
> > > > Hudi.
> > > >
> > > > Flink is more of a real-time processing engine than a near-real-time
> > > > one, and with its rich features, like checkpointing for fault
> > > > tolerance, state for in-stream computations, better windowing
> > > > capabilities, very high stream throughput, and exactly-once
> > > > semantics from source to sink, Flink is capable of being a part of
> > > > Hudi to solve our in-stream use cases.
> > > >
> > > > Regards,
> > > > Taher Koitawala
> > > > GS Lab Pune
> > > > +91 8407979163
> > > >
> > > >
> > > >
> > > >
> >
>