Re: Flink With Hudi

Taher Koitawala Tue, 19 Mar 2019 11:18:58 -0700

Hi @Semantic Beeng, I am keen on getting Flink into Hudi, as I
personally see a lot of good things we can do with it. I am currently
preparing a small Google Doc with a sample use case, which will help
explain why we think Flink should be included.


As for the effort, I'm willing to work on this, and we can proceed as you
suggest, since I have fairly good knowledge of Flink.


Regards,
Taher Koitawala

On Tue, 19 Mar, 2019, 11:27 PM Vinoth Chandar, <[email protected]> wrote:

> Yeah. This has come up in our company as well ...
>
>
> On Mon, Mar 18, 2019 at 7:28 PM Pingle Wang <[email protected]> wrote:
>
> > I am very happy to see everyone discussing this topic. Many companies
> > still use Flink; if Hudi can support Flink, it will attract those users
> > and further grow our Hudi project.
> >
> >
> > thanks.
> > ------------------ Original Message ------------------
> > From: "Vinoth Chandar"<[email protected]>;
> > Sent: Tuesday, March 19, 2019, 3:45 AM
> > To: "Semantic Beeng"<[email protected]>;
> > Cc: "dev"<[email protected]>;
> > Subject: Re: Flink With Hudi
> >
> >
> >
> > Sorry for the late reply. Busy Sunday :)
> >
> > First off, this is a very interesting topic.
> >
> > @Semantic Beeng <[email protected]>, +1. It would be good to lay out
> > the current use cases not met by Spark execution, specifically related to
> > Hudi's write path.
> >
> > @taher, Flink definitely has its advantages, as you mention.
> > Two cases where I thought direct Flink support for writing datasets would
> > be good are:
> >
> > 1) Capture the results of Flink jobs and write them out as a Hudi dataset.
> > But one can always write a Flink job to compute the results and then
> > store them in Hudi, using it as a sink?
> > Something like: Kafka => Flink => Kafka => DeltaStreamer => Hudi on DFS
> > 2) If someone is not using Spark at all, then Hudi brings it in and
> > potentially increases ops costs?
> >
> > On Beam, while it's admittedly new, if we were to abstract away Spark and
> > Flink from Hudi code (once again, a non-trivial amount of work ;)), then
> > Beam is very attractive.
> > Otherwise, we will end up inventing a Beam-- anyway :)
> >
> > Thanks
> > Vinoth
> >
> > On Sun, Mar 17, 2019 at 10:22 PM Semantic Beeng <[email protected]>
> > wrote:
> >
> > > Hello Taher,
> > >
> > > Vinoth, was looking for such an assessment from you - thanks. :-)
> > >
> > > Taher - at the high level it sounds interesting to explore some Flink
> > > specific or common use cases, I think.
> > >
> > > We are having discussions about integrating Hudi with Beam, so if you
> > > can relate your use cases to Flink + Beam it would be interesting.
> > >
> > > I can imagine useful scenarios where Spark based analytics would be
> > > combined with Flink based analytics going through parquet.
> > >
> > > But I do not know Flink well enough to see where Hudi-like
> > > functionality would fit.
> > >
> > > Could you provide such use cases (all kinds above and others) ? Ideally
> > > with code references.
> > >
> > > Depending on how serious your interest is, we can go deeper on the wiki.
> > >
> > > Thanks
> > >
> > > Nick
> > >
> > >
> > >
> > >
> > > On March 17, 2019 at 3:34 AM Vinoth Chandar <[email protected]> wrote:
> > >
> > >
> > > Hi Taher,
> > >
> > > Thanks for kicking off this thread. We can use this thread itself to
> > > discuss Flink. Hudi uses Spark today on the writing side, and the
> > > micro-batch model actually fits very well. Given that cloud stores don't
> > > support appends anyway, we would end up micro-batching nonetheless, even
> > > with Flink. Abstracting out Spark would be a large effort (which gets me
> > > thinking whether we should then just rewrite on top of Beam ;)), and I
> > > have not thought of any unique advantages we get for Hudi by adding
> > > Flink. Do you have something in mind?
> > >
> > > If you can expand on where the gaps are with only having Spark/Hudi,
> > > that'd be really educational.
> > >
> > >
> > > Thanks
> > > Vinoth
> > >
> > > On Sat, Mar 16, 2019 at 11:17 PM Taher Koitawala <[email protected]> wrote:
> > >
> > > Hi Prasanna,
> > > Thank you for your reply. Should we start a discussion or open a JIRA
> > > in this regard then?
> > >
> > > On Sun, 17 Mar, 2019, 11:36 AM Prasanna, <[email protected]> wrote:
> > >
> > > Hello,
> > >
> > > I don't know of any effort to write Hudi with Flink.
> > >
> > >    - Prasanna
> > >
> > > On Sat, Mar 16, 2019 at 10:44 PM Taher Koitawala <[email protected]> wrote:
> > >
> > > Hey Guys, Any inputs on this?
> > >
> > > On Sat, 16 Mar, 2019, 12:35 PM Taher Koitawala, <[email protected]> wrote:
> > >
> > > Hi Guys, I have recently been exploring Hudi. It manages to solve a
> > > lot of our current use cases; however, my question is: can I use Flink
> > > with Hudi? So far I have only seen Spark integration with Hudi.
> > >
> > > Flink is more of a real-time processing engine than a near-real-time
> > > one. With its rich features, such as checkpointing for fault tolerance,
> > > state for in-stream computations, better windowing capabilities, very
> > > high stream throughput, and exactly-once semantics from source to sink,
> > > Flink is capable of being a part of Hudi to solve our in-stream use
> > > cases.
> > >
> > > Regards,
> > > Taher Koitawala
> > > GS Lab Pune
> > > +91 8407979163
> > >
> > >
> > >
> > >
>
