Agreed. Scope should be 1 and 2. We will take on 1 while you take on 2.

On Thu, Aug 1, 2019, 9:54 PM Vinay Patil <[email protected]> wrote:
> Hi Vinoth,
>
> Thank you for proposing this plan. Let's keep the scope to 1 & 2; as part
> of v1, let's start with point 1, and you guys can tackle point 2 in
> parallel.
>
> Excited to be a part of this development.
>
> Regards,
> Vinay Patil
>
> On Thu, 1 Aug 2019, 21:49 Vinoth Chandar, <[email protected]> wrote:
>
> > Here are my thoughts..
> >
> > Last time, when Flink was brought up, we dug into the use-case and
> > realized that having Flink/Beam support for windowing on physical/arrival
> > time (_hoodie_commit_time) would be valuable, and that's why Flink was
> > being proposed.
> >
> > I would like to separate two aspects that I feel are intermingled here.
> >
> > 1) Writing datasets using Flink: today, hoodie-spark-datasource and the
> > deltastreamer tool both use Spark to write Hudi datasets. It would be
> > nice if we could do this as part of a Flink job as well.
> > 2) Querying Hudi datasets using Flink: we can build powerful
> > streaming-style pipelines on top of Hudi, since it provides
> > _hoodie_commit_time as an arrival-time watermark. Nick & I are trying to
> > flesh this out more with motivating use-cases and make the case for
> > doing this.
> >
> > Now, a question for the folks driving HUDI-184: is the scope 1, 2, or
> > 1 & 2? My suggestion would be to tackle 1 in HUDI-184, and Nick and I can
> > tackle 2 in parallel.
> >
> > This is exciting work :). Hope we can get past the current release and
> > jar fixes and get to this.. ha ha.
> >
> > /thanks/vinoth
> >
> > On Wed, Jul 31, 2019 at 6:01 AM Semantic Beeng <[email protected]>
> > wrote:
> >
> > > All,
> > >
> > > @vc and I have been mulling on this for a while and are working on
> > > some material to start this.
> > >
> > > But:
> > >
> > > 1. We want to start with requirements, right?
> > >
> > > Last time we discussed this, we asked for use cases, needs, etc.
> > > Have some here:
> > > https://cwiki.apache.org/confluence/display/HUDI/Hudi+for+Continuous+Deep+Analytics
> > >
> > > Taher - any news on that example application about trade
> > > reconciliation, please?
> > >
> > > 2. I will push for us to also drive this with proper architecture
> > > decisions, to map the choices in a principled way.
> > >
> > > This will also help users make sense of how it fits their
> > > architectures. See https://adr.github.io
> > >
> > > As an architect, consider that technology-to-technology integrations
> > > are a bad idea. This is reminiscent of M-to-N (point-to-point)
> > > integration in enterprise systems.
> > >
> > > Examples:
> > >
> > > 1. https://github.com/alibaba/flink-ai-extended/tree/master/flink-ml-tensorflow
> > > 2. https://github.com/yahoo/TensorFlowOnSpark
> > >
> > > And now imagine Hudi hard-linked to Flink. Someone trying to use both
> > > Spark and TF for ML and Flink for data sliding would be in a tough
> > > spot to reconcile. And there would surely be quite a few library
> > > version conflicts too.
> > >
> > > Instead, we need to seek some abstractions in between them to
> > > decouple.
> > >
> > > Hence, the more use cases and design examples you provide, the
> > > better. :-)
> > >
> > > @vc - thoughts?
> > >
> > > Kind regards,
> > > Nick
> > >
> > > On July 31, 2019 at 8:06 AM Vinoth Chandar <[email protected]> wrote:
> > >
> > > >> First of all, we should agree on the plan.
> > > +100. This will be a very involved process.. if we can get a plan
> > > agreed upon, then we can start scoping the subtasks..
> > >
> > > On Wed, Jul 31, 2019 at 2:11 AM Vinay Patil <[email protected]>
> > > wrote:
> > >
> > > Hi Guys,
> > >
> > > Add me in this as well, missed out on this last time.
> > >
> > > Regards,
> > > Vinay Patil
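The arrival-time windowing idea in point 2 above can be illustrated with a toy sketch. This is plain Python, not Flink or Hudi API; the record shape and the integer commit timestamps are hypothetical stand-ins (real Hudi commit times are `yyyyMMddHHmmss` strings). The point is only that windows are keyed on the commit (arrival) time `_hoodie_commit_time`, not on an event-time field.

```python
from collections import defaultdict

def window_by_commit_time(records, window_size):
    """Group records into fixed-size tumbling windows keyed on arrival time.

    Each record is assumed to carry a _hoodie_commit_time field (here a
    simple integer-like string for illustration). The window start is the
    commit time rounded down to a multiple of window_size.
    """
    windows = defaultdict(list)
    for rec in records:
        commit_ts = int(rec["_hoodie_commit_time"])
        window_start = commit_ts - (commit_ts % window_size)
        windows[window_start].append(rec)
    return dict(windows)

# Toy data: three records committed at arrival times 100, 105, 112.
records = [
    {"_hoodie_commit_time": "100", "key": "a"},
    {"_hoodie_commit_time": "105", "key": "b"},
    {"_hoodie_commit_time": "112", "key": "c"},
]

# With window_size=10, records at 100 and 105 fall in window 100,
# and the record at 112 falls in window 110.
print(window_by_commit_time(records, 10))
```

A real Flink integration would express the same grouping as a window operator with watermarks driven by commit instants, but the mapping from commit time to window is the same idea.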
