Re: [DISCUSS] Decouple Hudi and Spark

vino yang Sat, 03 Aug 2019 19:56:05 -0700

+1 for both Beam and Flink

> The first step here is probably to draw out the current hierarchy and figure
> out what the abstraction points are.
> In my opinion, the runtime (Spark, Flink) should be handled at the
> hoodie-client level and just used by hoodie-utilities seamlessly.


+1 for Vinoth's opinion; it should be the first step.

No matter which computing framework we want Hudi to integrate with,
we need to decouple the Hudi client from Spark.

We may need a pure client module, named for example
hoodie-client-core (or hoodie-client-common).

Then we could have hoodie-client-spark, hoodie-client-flink and
hoodie-client-beam.
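A minimal Java sketch of that split, assuming a hypothetical `EngineContext` interface living in the core module. `EngineContext`, `WriteClientCore`, `LocalEngineContext` and `tagRecords` are illustrative names only, not existing Hudi classes. The core logic sees nothing but the interface, and each engine module would supply its own implementation:

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.function.Function;

// Hypothetical engine-neutral abstraction for a module such as hoodie-client-core.
// None of these names exist in Hudi; they only illustrate where the split could go.
interface EngineContext {
  // Apply fn to every element; each engine decides how to parallelize the work.
  <I, O> List<O> map(List<I> data, Function<I, O> fn);
}

// Client logic written only against EngineContext, with no Spark or Flink imports.
class WriteClientCore {
  private final EngineContext context;
  private final int numPartitions;

  WriteClientCore(EngineContext context, int numPartitions) {
    this.context = context;
    this.numPartitions = numPartitions;
  }

  // Hypothetical step: prefix each record key with the partition it would go to.
  List<String> tagRecords(List<String> recordKeys) {
    return context.map(recordKeys,
        key -> "partition-" + Math.floorMod(key.hashCode(), numPartitions) + "/" + key);
  }
}

// Stand-in for hoodie-client-spark / hoodie-client-flink / hoodie-client-beam.
// A real module would delegate to its engine's API; this one just loops locally.
class LocalEngineContext implements EngineContext {
  @Override
  public <I, O> List<O> map(List<I> data, Function<I, O> fn) {
    List<O> out = new ArrayList<>(data.size());
    for (I item : data) {
      out.add(fn.apply(item));
    }
    return out;
  }
}

class EngineSplitDemo {
  public static void main(String[] args) {
    WriteClientCore client = new WriteClientCore(new LocalEngineContext(), 4);
    // prints [partition-1/a, partition-2/b, partition-3/c]
    System.out.println(client.tagRecords(Arrays.asList("a", "b", "c")));
  }
}
```

Adding Spark, Flink or Beam support would then mean writing another `EngineContext` implementation without touching `WriteClientCore`, which is roughly how hoodie-utilities could pick a runtime while the core module stays engine-free.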

On Sunday, August 4, 2019 at 10:45 AM, Suneel Marthi <[email protected]> wrote:

> +1 for Beam -- agree with Semantic Beeng's analysis.
>
> On Sat, Aug 3, 2019 at 10:30 PM taher koitawala <[email protected]>
> wrote:
>
> > So the way to go about this is to file a HIP, chalk out all the classes,
> > and start moving towards a pure client.
> >
> > Secondly, do we want to try Beam?
> >
> > I think there is too much going on here and I'm not able to follow. If we
> > want to go with Beam in the end, I don't think it makes sense to do
> > anything on Flink.
> >
> > On Sun, Aug 4, 2019, 2:30 AM Semantic Beeng <[email protected]>
> > wrote:
> >
> >> +1 My money is on this approach.
> >>
> >> The existing abstractions from Beam seem enough for the use cases as I
> >> imagine them.
> >>
> >> Flink also has "dynamic table", "table source" and "table sink" which
> >> seem very useful abstractions where Hudi might fit nicely.
> >>
> >>
> >>
> >> https://ci.apache.org/projects/flink/flink-docs-stable/dev/table/streaming/dynamic_tables.html
> >>
> >>
> >> Attached a screen shot.
> >>
> >> This seems to fit with the original premise of Hudi as well.
> >>
> >> I am exploring this avenue with a use case that involves "temporal joins
> >> on streams", which I need for feature extraction.
> >>
> >> If anyone is interested in this or has concrete enough needs and use
> >> cases, please let me know.
> >>
> >> It would be best to start from an agreed-upon set of 2-3 use cases.
> >>
> >> Cheers
> >>
> >> Nick
> >>
> >>
> >> > Also, we do have some Beam experts on the mailing list. Can you please
> >> > weigh in on the viability of using Beam as the intermediate abstraction
> >> > here between Spark/Flink?
> >> > Hudi uses RDD APIs like groupBy, mapToPair, sortAndRepartition,
> >> > reduceByKey, countByKey and also does custom partitioning a lot.
> >>
> >> >
> >>
> >
>
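For reference, most of the RDD operations named in the quoted question (mapToPair, reduceByKey, countByKey, a grouping step shown here as groupByKey, and custom partitioning) can be written without Spark. Below is a rough Java sketch over plain `java.util.stream`, assuming a hypothetical `KeyedOps` helper; it is not a Hudi, Spark, Beam or Flink API, just the kind of signature set an engine module would reimplement on its own runtime. sortAndRepartition is left out.

```java
import java.util.AbstractMap.SimpleEntry;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.Map;
import java.util.function.BinaryOperator;
import java.util.function.Function;
import java.util.function.ToIntFunction;
import java.util.stream.Collectors;

// Hypothetical engine-neutral versions of the RDD operations discussed above.
// An engine module would implement the same signatures on its own runtime;
// this reference version runs on java.util.stream in a single JVM.
final class KeyedOps {
  private KeyedOps() {}

  // Like mapToPair: turn each element into a (key, value) pair.
  static <T, K, V> List<Map.Entry<K, V>> mapToPair(
      List<T> data, Function<T, K> keyFn, Function<T, V> valueFn) {
    return data.stream()
        .<Map.Entry<K, V>>map(t -> new SimpleEntry<K, V>(keyFn.apply(t), valueFn.apply(t)))
        .collect(Collectors.toList());
  }

  // Like reduceByKey: merge all values sharing a key with the given reducer.
  static <K, V> Map<K, V> reduceByKey(List<Map.Entry<K, V>> pairs, BinaryOperator<V> reducer) {
    return pairs.stream()
        .collect(Collectors.toMap(Map.Entry::getKey, Map.Entry::getValue, reducer));
  }

  // Like countByKey: number of pairs per key.
  static <K, V> Map<K, Long> countByKey(List<Map.Entry<K, V>> pairs) {
    return pairs.stream()
        .collect(Collectors.groupingBy(Map.Entry::getKey, Collectors.counting()));
  }

  // Like groupByKey: collect all values for each key.
  static <K, V> Map<K, List<V>> groupByKey(List<Map.Entry<K, V>> pairs) {
    return pairs.stream()
        .collect(Collectors.groupingBy(Map.Entry::getKey,
            Collectors.mapping(Map.Entry::getValue, Collectors.toList())));
  }

  // Like custom partitioning: route each pair to the bucket picked by a caller-supplied function.
  static <K, V> List<List<Map.Entry<K, V>>> partitionBy(
      List<Map.Entry<K, V>> pairs, int numPartitions, ToIntFunction<K> partitioner) {
    List<List<Map.Entry<K, V>>> buckets = new ArrayList<>();
    for (int i = 0; i < numPartitions; i++) {
      buckets.add(new ArrayList<>());
    }
    for (Map.Entry<K, V> pair : pairs) {
      // floorMod keeps the index in [0, numPartitions) even for negative hash values.
      buckets.get(Math.floorMod(partitioner.applyAsInt(pair.getKey()), numPartitions)).add(pair);
    }
    return buckets;
  }
}

class KeyedOpsDemo {
  public static void main(String[] args) {
    List<String> partitionPaths = Arrays.asList("2019/08/01", "2019/08/02", "2019/08/01");
    List<Map.Entry<String, Integer>> pairs = KeyedOps.mapToPair(partitionPaths, p -> p, p -> 1);
    System.out.println(KeyedOps.reduceByKey(pairs, Integer::sum)); // records per partition path
    System.out.println(KeyedOps.countByKey(pairs));
    System.out.println(KeyedOps.partitionBy(pairs, 2, String::hashCode).size()); // prints 2
  }
}
```

This only shows that the operation set is small enough to sit behind one interface; it says nothing about how efficiently Spark, Flink or Beam would execute each operation, which is the open question in the thread.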
