Yeah. This has come up in our company as well ...

On Mon, Mar 18, 2019 at 7:28 PM Pingle Wang <[email protected]> wrote:

> I am very happy to see everyone discussing this topic. Many companies
> still use Flink; if Hudi can support Flink, it will attract those users
> and further develop our Hudi project.
>
> Thanks.
>
> ------------------ Original Message ------------------
> From: "Vinoth Chandar" <[email protected]>
> Sent: Tuesday, March 19, 2019, 3:45 AM
> To: "Semantic Beeng" <[email protected]>
> Cc: "dev" <[email protected]>
> Subject: Re: Flink With Hudi
>
> Sorry for the late reply. Busy Sunday :)
>
> First off, this is a very interesting topic.
>
> @Semantic Beeng <[email protected]>, +1. It would be good to lay out
> the current use cases not met by Spark execution, specifically those
> related to Hudi's write path.
>
> @taher, Flink definitely has its advantages, as you mention.
> Two cases where I thought direct Flink support for writing datasets would
> be good are:
>
> 1) Capture the result of Flink jobs and write it out as a Hudi dataset.
> But one can always write a Flink job to compute the results and then store
> them in Hudi, using it as a sink?
> Something like: Kafka => Flink => Kafka => DeltaStreamer => Hudi on DFS
> 2) If someone is not using Spark at all, then Hudi brings it in and
> potentially increases ops costs?
>
> On Beam: while it's admittedly new, if we were to abstract away Spark and
> Flink from Hudi code (once again, a non-trivial amount of work ;)), then
> Beam is very attractive.
> We will end up inventing a Beam-- otherwise anyway :)
>
> Thanks
> Vinoth
>
> On Sun, Mar 17, 2019 at 10:22 PM Semantic Beeng <[email protected]> wrote:
>
> > Hello Taher,
> >
> > Vinoth, I was looking for such an assessment from you - thanks. :-)
> >
> > Taher - at a high level it sounds interesting to explore some
> > Flink-specific or common use cases, I think.
> >
> > We are having discussions about integrating Hudi with Beam, so if you
> > can relate your use cases to Flink + Beam it would be interesting.
> >
> > I can imagine useful scenarios where Spark-based analytics would be
> > combined with Flink-based analytics going through Parquet.
> >
> > But I do not know Flink well enough to see where Hudi-like functionality
> > would fit.
> >
> > Could you provide such use cases (all the kinds above and others)?
> > Ideally with code references.
> >
> > Depending on how serious your interest is, we can go deeper in the wiki.
> >
> > Thanks
> >
> > Nick
> >
> > On March 17, 2019 at 3:34 AM Vinoth Chandar <[email protected]> wrote:
> >
> > Hi Taher,
> >
> > Thanks for kicking off this thread. We can use this thread itself to
> > discuss Flink. Hudi uses Spark today on the writing side, and the
> > micro-batch model actually fits very well. Given that cloud stores don't
> > support appends anyway, we would end up micro-batching nonetheless even
> > with Flink. Abstracting out Spark would be a large effort (which gets me
> > thinking whether we should then just rewrite on top of Beam ;)) and I
> > have not thought of any unique advantages we get for Hudi by adding
> > Flink. Do you have something in mind?
> >
> > If you can expand on where the gaps are with only having Spark/Hudi,
> > that'd be really educational.
> >
> > Thanks
> > Vinoth
> >
> > On Sat, Mar 16, 2019 at 11:17 PM Taher Koitawala <[email protected]>
> > wrote:
> >
> > Hi Prasanna,
> > Thank you for your reply. Should we start a discussion or open a JIRA
> > in this regard then?
> >
> > On Sun, 17 Mar, 2019, 11:36 AM Prasanna, <[email protected]> wrote:
> >
> > Hello,
> >
> > I don't know of any effort to write Hudi with Flink.
> >
> > - Prasanna
> >
> > On Sat, Mar 16, 2019 at 10:44 PM Taher Koitawala <[email protected]>
> > wrote:
> >
> > Hey Guys, any inputs on this?
> >
> > On Sat, 16 Mar, 2019, 12:35 PM Taher Koitawala, <[email protected]>
> > wrote:
> >
> > > > Hi Guys, I have recently been exploring Hudi. It manages to solve
> > > > a lot of our current use cases; however, my question is: can I use
> > > > Flink with Hudi? So far I have only seen Spark integration with
> > > > Hudi.
> > > >
> > > > Flink is more of a real-time processing engine than a near-real-time
> > > > one, with rich features like checkpointing for fault tolerance,
> > > > state for in-stream computations, better windowing capabilities,
> > > > very high stream throughput, and exactly-once semantics from source
> > > > to sink.
> > > >
> > > > Flink is capable of being a part of Hudi to solve our in-stream use
> > > > cases.
> >
> > Regards,
> > Taher Koitawala
> > GS Lab Pune
> > +91 8407979163
