There is no need to create batches explicitly in Flink: the new StreamingFileSink rolls files on checkpointing, so every time Flink checkpoints we get a new data file written, which can be treated as our batch. Since Flink provides exactly-once semantics from source to sink for each record, Flink would be a good addition to Hudi.
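
As a rough illustration of that checkpoint-driven rolling (a minimal sketch, not Hudi-specific; the output path, checkpoint interval, and socket source below are placeholders):

    import org.apache.flink.api.common.serialization.SimpleStringEncoder;
    import org.apache.flink.core.fs.Path;
    import org.apache.flink.streaming.api.datastream.DataStream;
    import org.apache.flink.streaming.api.environment.StreamExecutionEnvironment;
    import org.apache.flink.streaming.api.functions.sink.filesystem.StreamingFileSink;
    import org.apache.flink.streaming.api.functions.sink.filesystem.rollingpolicies.OnCheckpointRollingPolicy;

    public class CheckpointBatchingExample {

        public static void main(String[] args) throws Exception {
            StreamExecutionEnvironment env = StreamExecutionEnvironment.getExecutionEnvironment();

            // Each completed checkpoint finalizes the in-progress part files,
            // so one checkpoint effectively produces one committed "batch".
            env.enableCheckpointing(60_000L); // hypothetical 1-minute checkpoint interval

            DataStream<String> records = env.socketTextStream("localhost", 9999); // placeholder source

            StreamingFileSink<String> sink = StreamingFileSink
                    .forRowFormat(new Path("file:///tmp/batches"), new SimpleStringEncoder<String>("UTF-8"))
                    .withRollingPolicy(OnCheckpointRollingPolicy.build()) // roll a new file at every checkpoint
                    .build();

            records.addSink(sink);
            env.execute("checkpoint-driven batching");
        }
    }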
On Sun, 17 Mar, 2019, 1:04 PM Vinoth Chandar, <[email protected]> wrote:

> Hi Taher,
>
> Thanks for kicking off this thread. We can use this itself to discuss
> Flink. Hudi uses Spark today on the writing side and the micro-batch model
> actually fits very well. Given cloud stores don't support appends anyway,
> we would end up micro-batching nonetheless even with Flink. Abstracting out
> Spark would be a large effort (gets me to think, if we should then just
> rewrite on top of Beam ;)) and I have not thought of any unique advantages
> we get for Hudi by adding Flink. Do you have something in mind?
>
> If you can expand on where the gaps are with only having Spark/Hudi, that'd
> be really educative..
>
> Thanks
> Vinoth
>
> On Sat, Mar 16, 2019 at 11:17 PM Taher Koitawala <[email protected]> wrote:
>
> > Hi Prasanna,
> > Thank you for your reply. Should we start a discussion or open a jira
> > on this regard then?
> >
> > On Sun, 17 Mar, 2019, 11:36 AM Prasanna, <[email protected]> wrote:
> >
> > > Hello,
> > >
> > > I dont know of any effort to write hudi with flink.
> > >
> > > - Prasanna
> > >
> > > On Sat, Mar 16, 2019 at 10:44 PM Taher Koitawala <[email protected]> wrote:
> > >
> > > > Hey Guys, Any inputs on this?
> > > >
> > > > On Sat, 16 Mar, 2019, 12:35 PM Taher Koitawala, <[email protected]> wrote:
> > > >
> > > > > Hi Guys, I have recently been exploring about Hudi. It manages to
> > > > > solve a lot of our current use cases however my question is Can I
> > > > > use flink with Hudi? So far I have only seen spark integration
> > > > > with Hudi.
> > > > >
> > > > > Flink being more of a real-time processing engine rather than near
> > > > > real time and with its rich functions like Checkpointing for fault
> > > > > tolerance, States for instream computations, better windowing
> > > > > capabilities and very high stream throughput, and the exactly once
> > > > > semantics from source to sink. Flink is capable of being a part of
> > > > > Hudi to solve our instream use cases.
> > > > >
> > > > > Regards,
> > > > > Taher Koitawala
> > > > > GS Lab Pune
> > > > > +91 8407979163
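
For context on the Spark write path Vinoth mentions above, a Hudi upsert of one micro-batch through the Spark DataSource API typically looks roughly like the sketch below. This is a hedged illustration: the table name, key/precombine/partition fields, paths, and the exact format identifier ("org.apache.hudi" vs. the older "com.uber.hoodie") are placeholders that depend on the Hudi release and your schema.

    import org.apache.spark.sql.Dataset;
    import org.apache.spark.sql.Row;
    import org.apache.spark.sql.SaveMode;
    import org.apache.spark.sql.SparkSession;

    public class HudiUpsertSketch {

        public static void main(String[] args) {
            SparkSession spark = SparkSession.builder()
                    .appName("hudi-upsert-sketch")
                    .master("local[*]") // placeholder; use a real cluster in practice
                    .getOrCreate();

            // Hypothetical micro-batch with uuid/ts/partition columns.
            Dataset<Row> batch = spark.read().json("/tmp/incoming-batch.json");

            batch.write()
                    .format("org.apache.hudi") // "com.uber.hoodie" in pre-incubation releases
                    .option("hoodie.table.name", "example_table")
                    .option("hoodie.datasource.write.operation", "upsert")
                    .option("hoodie.datasource.write.recordkey.field", "uuid")
                    .option("hoodie.datasource.write.precombine.field", "ts")
                    .option("hoodie.datasource.write.partitionpath.field", "partition")
                    .mode(SaveMode.Append)
                    .save("/tmp/hudi/example_table");

            spark.stop();
        }
    }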
