There is no need to create batches explicitly in Flink: the new StreamingFileSink rolls files on checkpointing, so every time Flink checkpoints we get a new data file written, which can be treated as our batch. Since Flink provides exactly-once semantics from source to sink for each record, Flink would be a good addition to Hudi.
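
As a rough illustration of that checkpoint-driven rolling (a minimal sketch, not Hudi-specific; the output path, checkpoint interval, and socket source below are placeholders):

    import org.apache.flink.api.common.serialization.SimpleStringEncoder;
    import org.apache.flink.core.fs.Path;
    import org.apache.flink.streaming.api.datastream.DataStream;
    import org.apache.flink.streaming.api.environment.StreamExecutionEnvironment;
    import org.apache.flink.streaming.api.functions.sink.filesystem.StreamingFileSink;
    import org.apache.flink.streaming.api.functions.sink.filesystem.rollingpolicies.OnCheckpointRollingPolicy;

    public class CheckpointBatchingExample {

        public static void main(String[] args) throws Exception {
            StreamExecutionEnvironment env = StreamExecutionEnvironment.getExecutionEnvironment();

            // Each completed checkpoint finalizes the in-progress part files,
            // so one checkpoint effectively produces one committed "batch".
            env.enableCheckpointing(60_000L); // hypothetical 1-minute checkpoint interval

            DataStream<String> records = env.socketTextStream("localhost", 9999); // placeholder source

            StreamingFileSink<String> sink = StreamingFileSink
                    .forRowFormat(new Path("file:///tmp/batches"), new SimpleStringEncoder<String>("UTF-8"))
                    .withRollingPolicy(OnCheckpointRollingPolicy.build()) // roll a new file at every checkpoint
                    .build();

            records.addSink(sink);
            env.execute("checkpoint-driven batching");
        }
    }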
On Sun, 17 Mar, 2019, 1:04 PM Vinoth Chandar, <[email protected]> wrote:

> Hi Taher,
>
> Thanks for kicking off this thread. We can use this itself to discuss
> Flink. Hudi uses Spark today on the writing side and the micro-batch model
> actually fits very well. Given cloud stores don't support appends anyway,
> we would end up micro-batching nonetheless even with Flink. Abstracting out
> Spark would be a large effort (gets me to think, if we should then just
> rewrite on top of Beam ;)) and I have not thought of any unique advantages
> we get for Hudi by adding Flink. Do you have something in mind?
>
> If you can expand on where the gaps are with only having Spark/Hudi, that'd
> be really educative..
>
> Thanks
> Vinoth
>
> On Sat, Mar 16, 2019 at 11:17 PM Taher Koitawala <[email protected]> wrote:
>
> > Hi Prasanna,
> > Thank you for your reply. Should we start a discussion or open a jira
> > on this regard then?
> >
> > On Sun, 17 Mar, 2019, 11:36 AM Prasanna, <[email protected]> wrote:
> >
> > > Hello,
> > >
> > > I dont know of any effort to write hudi with flink.
> > >
> > > - Prasanna
> > >
> > > On Sat, Mar 16, 2019 at 10:44 PM Taher Koitawala <[email protected]> wrote:
> > >
> > > > Hey Guys, Any inputs on this?
> > > >
> > > > On Sat, 16 Mar, 2019, 12:35 PM Taher Koitawala, <[email protected]> wrote:
> > > >
> > > > > Hi Guys, I have recently been exploring about Hudi. It manages to
> > > > > solve a lot of our current use cases however my question is Can I
> > > > > use flink with Hudi? So far I have only seen spark integration
> > > > > with Hudi.
> > > > >
> > > > > Flink being more of a real-time processing engine rather than near
> > > > > real time and with its rich functions like Checkpointing for fault
> > > > > tolerance, States for instream computations, better windowing
> > > > > capabilities and very high stream throughput, and the exactly once
> > > > > semantics from source to sink. Flink is capable of being a part of
> > > > > Hudi to solve our instream use cases.
> > > > >
> > > > > Regards,
> > > > > Taher Koitawala
> > > > > GS Lab Pune
> > > > > +91 8407979163
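
For context on the Spark write path Vinoth mentions above, a Hudi upsert of one micro-batch through the Spark DataSource API typically looks roughly like the sketch below. This is a hedged illustration: the table name, key/precombine/partition fields, paths, and the exact format identifier ("org.apache.hudi" vs. the older "com.uber.hoodie") are placeholders that depend on the Hudi release and your schema.

    import org.apache.spark.sql.Dataset;
    import org.apache.spark.sql.Row;
    import org.apache.spark.sql.SaveMode;
    import org.apache.spark.sql.SparkSession;

    public class HudiUpsertSketch {

        public static void main(String[] args) {
            SparkSession spark = SparkSession.builder()
                    .appName("hudi-upsert-sketch")
                    .master("local[*]") // placeholder; use a real cluster in practice
                    .getOrCreate();

            // Hypothetical micro-batch with uuid/ts/partition columns.
            Dataset<Row> batch = spark.read().json("/tmp/incoming-batch.json");

            batch.write()
                    .format("org.apache.hudi") // "com.uber.hoodie" in pre-incubation releases
                    .option("hoodie.table.name", "example_table")
                    .option("hoodie.datasource.write.operation", "upsert")
                    .option("hoodie.datasource.write.recordkey.field", "uuid")
                    .option("hoodie.datasource.write.precombine.field", "ts")
                    .option("hoodie.datasource.write.partitionpath.field", "partition")
                    .mode(SaveMode.Append)
                    .save("/tmp/hudi/example_table");

            spark.stop();
        }
    }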
