Re: Flink With Hudi

Taher Koitawala Tue, 19 Mar 2019 11:51:45 -0700

Sure I can. I have just posted a small example in the trail above. Take a
look and let me know what you think. If it seems good to you, we may use
the same one in the HIP.


On Tue, 19 Mar, 2019, 11:55 PM Vinoth Chandar, <[email protected]> wrote:

> That's great. Mind starting a HIP around this?
>
>
> https://cwiki.apache.org/confluence/display/HUDI/Hudi+Improvement+Plan+Details+and+Process
>
>
> On Tue, Mar 19, 2019 at 11:18 AM Taher Koitawala <
> [email protected]>
> wrote:
>
> > Hi @Semantic Beeng,  I am keen on getting Flink into Hudi as I
> > personally see a lot of good things we can do with it. I am currently
> > preparing a small Google doc with a sample use case which will help us
> > understand why we think Flink should be included.
> >
> >           As for the effort, I'm willing to work on this, and we can proceed as
> > you suggest, since I have fairly good knowledge of Flink.
> >
> >
> > Regards,
> > Taher Koitawala
> >
> > On Tue, 19 Mar, 2019, 11:27 PM Vinoth Chandar, <[email protected]>
> wrote:
> >
> > > yeah. This has come up in our company as well ...
> > >
> > >
> > > On Mon, Mar 18, 2019 at 7:28 PM Pingle Wang <[email protected]> wrote:
> > >
> > > > > I am very happy to see everyone discussing this topic. There are still
> > > > > many companies using Flink; if Hudi can support Flink, it will attract
> > > > > those users and further develop our Hudi project.
> > > >
> > > >
> > > > thanks.
> > > > ------------------ 原始邮件 ------------------
> > > > 发件人: "Vinoth Chandar"<[email protected]>;
> > > > 发送时间: 2019年3月19日(星期二) 凌晨3:45
> > > > 收件人: "Semantic Beeng"<[email protected]>;
> > > > 抄送: "dev"<[email protected]>;
> > > > 主题: Re: Flink With Hudi
> > > >
> > > >
> > > >
> > > > Sorry for the late reply. Busy Sunday :)
> > > >
> > > > First off, this is a very interesting topic.
> > > >
> > > > > @Semantic Beeng <[email protected]> , +1 It would be good to lay out
> > > > > the current use-cases not met by Spark execution, specifically related to
> > > > > Hudi's write path.
> > > >
> > > > > @taher,  Definitely Flink has its advantages like you mention.
> > > > > Two cases where I thought direct Flink support for writing datasets would
> > > > > be good are:
> > > > >
> > > > > 1) Capture result of Flink jobs and write it out as Hudi dataset. But one
> > > > > can always write a Flink job to compute the results and then store it in
> > > > > Hudi, using it as a Sink?
> > > > > Something like :   Kafka => Flink => Kafka => DeltaStreamer => Hudi on dfs
> > > > > 2) If someone is not using Spark at all, then Hudi brings it in and
> > > > > potentially increases ops costs?
> > > >
> > > > > On Beam, while it's admittedly new, if we were to abstract away Spark and
> > > > > Flink from Hudi code (once again non-trivial amount of work ;)), then Beam
> > > > > is very attractive.
> > > > > We will end up inventing a Beam-- otherwise anyway :)
> > > > Thanks
> > > > Vinoth
> > > >
> > > > On Sun, Mar 17, 2019 at 10:22 PM Semantic Beeng <
> > [email protected]>
> > > > wrote:
> > > >
> > > > > Hello Taher,
> > > > >
> > > > > Vinoth, was looking for such an assessment from you - thanks. :-)
> > > > >
> > > > > Taher - at the high level it sounds interesting to explore some
> Flink
> > > > > specific or common use cases, I think.
> > > > >
> > > > > > We are having discussions about integrating Hudi with Beam so if you can
> > > > > > relate your use cases to Flink + Beam it would be interesting.
> > > > >
> > > > > I can imagine useful scenarios where Spark based analytics would be
> > > > > combined with Flink based analytics going through parquet.
> > > > >
> > > > > > But I do not know Flink well enough to see where Hudi-like functionality
> > > > > > would fit.
> > > > >
> > > > > Could you provide such use cases (all kinds above and others) ?
> > Ideally
> > > > > with code references.
> > > > >
> > > > > Depending on how serious your interest is we can go deeper in wiki.
> > > > >
> > > > > Thanks
> > > > >
> > > > > Nick
> > > > >
> > > > >
> > > > >
> > > > >
> > > > > On March 17, 2019 at 3:34 AM Vinoth Chandar < [email protected]>
> > > wrote:
> > > > >
> > > > >
> > > > > Hi Taher,
> > > > >
> > > > > > Thanks for kicking off this thread. We can use this itself to discuss
> > > > > > Flink. Hudi uses Spark today on the writing side and the micro-batch model
> > > > > > actually fits very well. Given cloud stores don't support appends anyway,
> > > > > > we would end up micro-batching nonetheless even with Flink. Abstracting out
> > > > > > Spark would be a large effort (gets me to think, if we should then just
> > > > > > rewrite on top of Beam ;)) and I have not thought of any unique advantages
> > > > > > we get for Hudi by adding Flink. Do you have something in mind?
> > > > >
> > > > > > If you can expand on where the gaps are with only having Spark/Hudi,
> > > > > > that'd be really educative.
> > > > >
> > > > >
> > > > > Thanks
> > > > > Vinoth
> > > > >
> > > > > On Sat, Mar 16, 2019 at 11:17 PM Taher Koitawala <
> > > > > [email protected]>
> > > > > wrote:
> > > > >
> > > > > Hi Prasanna,
> > > > > > Thank you for your reply. Should we start a discussion or open a JIRA
> > > > > > in this regard then?
> > > > >
> > > > > On Sun, 17 Mar, 2019, 11:36 AM Prasanna, < [email protected]>
> > > > wrote:
> > > > >
> > > > > Hello,
> > > > >
> > > > > > I don't know of any effort to write Hudi with Flink.
> > > > >
> > > > >    - Prasanna
> > > > >
> > > > > On Sat, Mar 16, 2019 at 10:44 PM Taher Koitawala <
> > > > > [email protected]>
> > > > > wrote:
> > > > >
> > > > > Hey Guys, Any inputs on this?
> > > > >
> > > > > On Sat, 16 Mar, 2019, 12:35 PM Taher Koitawala, <
> > > > >
> > > > > [email protected]
> > > > >
> > > > > wrote:
> > > > >
> > > > > > Hi Guys, I have recently been exploring Hudi. It manages to solve a
> > > > > > lot of our current use cases; however, my question is: can I use Flink
> > > > > > with Hudi? So far I have only seen Spark integration with Hudi.
> > > > > >
> > > > > > Flink is more of a real-time processing engine than a near-real-time
> > > > > > one, with rich features like checkpointing for fault tolerance,
> > > > > > state for in-stream computations, better windowing capabilities,
> > > > > > very high stream throughput, and exactly-once semantics from source to
> > > > > > sink.
> > > > > >
> > > > > > Flink is capable of being a part of Hudi to solve our in-stream use
> > > > > > cases.
> > > > >
> > > > > Regards,
> > > > > Taher Koitawala
> > > > > GS Lab Pune
> > > > > +91 8407979163
> > > > >
> > > > >
> > > > >
> > > > >
