Sure, I can. I have just posted a small example in the thread above. Take a look and let me know what you think. If it seems good to you, we can use the same one in the HIP.

On Tue, 19 Mar, 2019, 11:55 PM Vinoth Chandar, <[email protected]> wrote:

> That's great. Mind starting a HIP around this?
>
> https://cwiki.apache.org/confluence/display/HUDI/Hudi+Improvement+Plan+Details+and+Process
>
> On Tue, Mar 19, 2019 at 11:18 AM Taher Koitawala <[email protected]> wrote:
>
> > Hi @Semantic Beeng, I am keen on getting Flink into Hudi, as I
> > personally see a lot of good things we can do with it. I am currently
> > preparing a small Google doc with a sample use case which will help us
> > understand why we think Flink should be included.
> >
> > As for the effort, I'm willing to work on this and we can go as per
> > your say, as I have fairly good knowledge of Flink.
> >
> > Regards,
> > Taher Koitawala
> >
> > On Tue, 19 Mar, 2019, 11:27 PM Vinoth Chandar, <[email protected]> wrote:
> >
> > > Yeah. This has come up in our company as well ...
> > >
> > > On Mon, Mar 18, 2019 at 7:28 PM Pingle Wang <[email protected]> wrote:
> > >
> > > > I am very happy to see everyone discussing this topic. There are still
> > > > many companies using Flink; if Hudi can support Flink, this will attract
> > > > this part of the user base and further develop our Hudi project.
> > > >
> > > > Thanks.
> > > >
> > > > ------------------ Original Message ------------------
> > > > From: "Vinoth Chandar"<[email protected]>;
> > > > Sent: Tuesday, March 19, 2019, 3:45 AM
> > > > To: "Semantic Beeng"<[email protected]>;
> > > > Cc: "dev"<[email protected]>;
> > > > Subject: Re: Flink With Hudi
> > > >
> > > > Sorry for the late reply. Busy Sunday :)
> > > >
> > > > First off, this is a very interesting topic.
> > > >
> > > > @Semantic Beeng <[email protected]>, +1. It would be good to lay out
> > > > the current use-cases not met by Spark execution, specifically related to
> > > > Hudi's write path.
> > > >
> > > > @taher, Definitely Flink has its advantages like you mention.
> > > > Two cases where I thought direct Flink support for writing datasets would
> > > > be good are:
> > > >
> > > > 1) Capture the result of Flink jobs and write it out as a Hudi dataset. But
> > > > one can always write a Flink job to compute the results and then store it
> > > > in Hudi, using it as a Sink?
> > > > Something like: Kafka => Flink => Kafka => DeltaStreamer => Hudi on dfs
> > > > 2) If someone is not using Spark at all, then Hudi brings it in and
> > > > potentially increases ops costs?
> > > >
> > > > On Beam, while it's admittedly new, if we were to abstract away Spark and
> > > > Flink from Hudi code (once again, a non-trivial amount of work ;)), then
> > > > Beam is very attractive.
> > > > We will end up inventing a Beam-- otherwise anyway :)
> > > >
> > > > Thanks
> > > > Vinoth
> > > >
> > > > On Sun, Mar 17, 2019 at 10:22 PM Semantic Beeng <[email protected]> wrote:
> > > >
> > > > > Hello Taher,
> > > > >
> > > > > Vinoth, was looking for such an assessment from you - thanks. :-)
> > > > >
> > > > > Taher - at the high level it sounds interesting to explore some Flink
> > > > > specific or common use cases, I think.
> > > > >
> > > > > We are having discussions about integrating Hudi with Beam, so if you can
> > > > > relate your use cases to Flink + Beam it would be interesting.
> > > > >
> > > > > I can imagine useful scenarios where Spark based analytics would be
> > > > > combined with Flink based analytics going through parquet.
> > > > >
> > > > > But do not know Flink enough to see where Hudi like functionality would
> > > > > fit.
> > > > >
> > > > > Could you provide such use cases (all kinds above and others)? Ideally
> > > > > with code references.
> > > > >
> > > > > Depending on how serious your interest is we can go deeper in wiki.
> > > > >
> > > > > Thanks
> > > > >
> > > > > Nick
> > > > >
> > > > > On March 17, 2019 at 3:34 AM Vinoth Chandar <[email protected]> wrote:
> > > > >
> > > > > Hi Taher,
> > > > >
> > > > > Thanks for kicking off this thread. We can use this itself to discuss
> > > > > Flink. Hudi uses Spark today on the writing side and the micro-batch model
> > > > > actually fits very well. Given cloud stores don't support appends anyway,
> > > > > we would end up micro-batching nonetheless even with Flink. Abstracting out
> > > > > Spark would be a large effort (gets me to think, if we should then just
> > > > > rewrite on top of Beam ;)) and I have not thought of any unique advantages
> > > > > we get for Hudi by adding Flink. Do you have something in mind?
> > > > >
> > > > > If you can expand on where the gaps are with only having Spark/Hudi,
> > > > > that'd be really educative.
> > > > >
> > > > > Thanks
> > > > > Vinoth
> > > > >
> > > > > On Sat, Mar 16, 2019 at 11:17 PM Taher Koitawala <[email protected]> wrote:
> > > > >
> > > > > Hi Prasanna,
> > > > > Thank you for your reply. Should we start a discussion or open a jira
> > > > > in this regard then?
> > > > >
> > > > > On Sun, 17 Mar, 2019, 11:36 AM Prasanna, <[email protected]> wrote:
> > > > >
> > > > > Hello,
> > > > >
> > > > > I don't know of any effort to write hudi with flink.
> > > > >
> > > > > - Prasanna
> > > > >
> > > > > On Sat, Mar 16, 2019 at 10:44 PM Taher Koitawala <[email protected]> wrote:
> > > > >
> > > > > Hey Guys, Any inputs on this?
> > > > >
> > > > > On Sat, 16 Mar, 2019, 12:35 PM Taher Koitawala, <[email protected]> wrote:
> > > > >
> > > > > Hi Guys, I have recently been exploring Hudi.
> > > > > It manages to solve a lot of our current use cases; however, my question
> > > > > is: can I use Flink with Hudi? So far I have only seen Spark integration
> > > > > with Hudi.
> > > > >
> > > > > Flink is more of a real-time processing engine than a near-real-time
> > > > > one, with rich functions like checkpointing for fault tolerance,
> > > > > state for in-stream computations, better windowing capabilities,
> > > > > very high stream throughput, and exactly-once semantics from source to
> > > > > sink. Flink is capable of being a part of Hudi to solve our in-stream use
> > > > > cases.
> > > > >
> > > > > Regards,
> > > > > Taher Koitawala
> > > > > GS Lab Pune
> > > > > +91 8407979163
