Re: Flink With Hudi

Vinoth Chandar Tue, 19 Mar 2019 10:57:59 -0700

Yeah. This has come up in our company as well ...


On Mon, Mar 18, 2019 at 7:28 PM Pingle Wang <[email protected]> wrote:

> I am very happy to see everyone discussing this topic. Many companies
> still use Flink; if Hudi can support Flink, it will attract those users
> and further grow our Hudi project.
>
>
> thanks.
> ------------------ Original Message ------------------
> From: "Vinoth Chandar"<[email protected]>;
> Sent: Tuesday, March 19, 2019, 3:45 AM
> To: "Semantic Beeng"<[email protected]>;
> Cc: "dev"<[email protected]>;
> Subject: Re: Flink With Hudi
>
>
>
> Sorry for the late reply. Busy Sunday :)
>
> First off, this is a very interesting topic.
>
> @Semantic Beeng <[email protected]> , +1 It would be good to lay out
> the current use-cases not met by Spark execution specifically related to
> Hudi's write path..
>
> @taher,  Definitely Flink has its advantages like you mention.
> Two cases where I thought direct Flink support for writing datasets would
> be good are :
>
> 1) Capture the result of Flink jobs and write it out as a Hudi dataset. But
> one can always write a Flink job to compute the results and then store them
> in Hudi, using it as a sink?
> Something like:   Kafka => Flink => Kafka => DeltaStreamer => Hudi on dfs
> 2) If someone is not using Spark at all, then Hudi brings it in and
> potentially increases ops costs?
>
> On Beam, while it's admittedly new, if we were to abstract away Spark and
> Flink from Hudi code (once again non-trivial amount of work ;)), then Beam
> is very attractive.
> We will end up inventing a Beam-- otherwise anyway :)
>
> Thanks
> Vinoth
>
> On Sun, Mar 17, 2019 at 10:22 PM Semantic Beeng <[email protected]>
> wrote:
>
> > Hello Taher,
> >
> > Vinoth, was looking for such an assessment from you - thanks. :-)
> >
> > Taher - at the high level it sounds interesting to explore some Flink
> > specific or common use cases, I think.
> >
> > We are having discussions about integrating Hudi with Beam so if you can
> > relate your use cases to Flink + Beam it would be interesting.
> >
> > I can imagine useful scenarios where Spark based analytics would be
> > combined with Flink based analytics going through parquet.
> >
> > But do not know Flink enough to see where Hudi like functionality would
> > fit.
> >
> > Could you provide such use cases (all kinds above and others) ? Ideally
> > with code references.
> >
> > Depending on how serious your interest is we can go deeper in wiki.
> >
> > Thanks
> >
> > Nick
> >
> >
> >
> >
> > On March 17, 2019 at 3:34 AM Vinoth Chandar < [email protected]> wrote:
> >
> >
> > Hi Taher,
> >
> > Thanks for kicking off this thread. We can use this itself to discuss
> > Flink. Hudi uses Spark today on the writing side and the micro-batch
> > model actually fits very well. Given cloud stores don't support appends
> > anyway, we would end up micro-batching nonetheless even with Flink.
> > Abstracting out Spark would be a large effort (gets me to think, if we
> > should then just rewrite on top of Beam ;)) and I have not thought of any
> > unique advantages we get for Hudi by adding Flink. Do you have something
> > in mind?
> >
> > If you can expand on where the gaps are with only having Spark/Hudi,
> > that'd be really educative..
> >
> >
> > Thanks
> > Vinoth
> >
> > On Sat, Mar 16, 2019 at 11:17 PM Taher Koitawala <[email protected]> wrote:
> >
> > Hi Prasanna,
> > Thank you for your reply. Should we start a discussion or open a JIRA
> > in this regard then?
> >
> > On Sun, 17 Mar, 2019, 11:36 AM Prasanna, <[email protected]> wrote:
> >
> > Hello,
> >
> > I don't know of any effort to write Hudi with Flink.
> >
> >    - Prasanna
> >
> > On Sat, Mar 16, 2019 at 10:44 PM Taher Koitawala <[email protected]> wrote:
> >
> > Hey Guys, Any inputs on this?
> >
> > On Sat, 16 Mar, 2019, 12:35 PM Taher Koitawala, <[email protected]> wrote:
> >
> > Hi Guys, I have recently been exploring Hudi. It manages to solve a lot
> > of our current use cases; however, my question is: can I use Flink with
> > Hudi? So far I have only seen Spark integration with Hudi.
> >
> > Flink is more of a real-time processing engine than a near-real-time one,
> > with rich features such as checkpointing for fault tolerance, state for
> > in-stream computations, better windowing capabilities, very high stream
> > throughput, and exactly-once semantics from source to sink. Flink is
> > capable of being a part of Hudi to solve our in-stream use cases.
> >
> > Regards,
> > Taher Koitawala
> > GS Lab Pune
> > +91 8407979163
> >
> >
> >
> >
