+1 on Vinoth's suggestion to wait until the lower level (write-client) is
refactored and reorganized first. We can then look at Data-Source and
DeltaStreamer to work out how best to organize them.
Balaji.V

On Sunday, March 8, 2020, 11:06:13 PM PDT, Vinoth Chandar
<[email protected]> wrote:
 
 >> make delta streamer an engine-agnostic part so that Spark and Flink can
share some common logic.

If we make the change at the Write Client level to make it engine-agnostic,
it should help with most of the cases. I believe there will be Spark-specific
pieces in the Source abstraction, since those use Spark datasources
underneath in some cases. My opinion is that we can first focus our efforts
on making hudi-client agnostic and pluggable with different engines. We can
tackle deltastreamer down the line once we have that.

On Wed, Mar 4, 2020 at 6:51 PM vino yang <[email protected]> wrote:

> Hi guys,
>
> My original thought is to make delta streamer an engine-agnostic part so
> that Spark and Flink can share some common logic.
>
> >> I am not sure the ROI is there for renaming to hudi-deltastreamer and
> pulling this out. Every time we change a module name
>
> Actually, my suggestion here is to move the delta streamer to a new
> module and keep the current hudi-utilities module. Although, in a way,
> moving classes is similar to renaming the module.
>
> >> I propose we leave this module Spark-specific, i.e., depending on
> hudi-spark alone
>
> OK, I will think about building a delta streaming mode via Flink and
> ignore the current implementation of delta streamer.
>
> Best,
> Vino
>
> Vinoth Chandar <[email protected]> wrote on Thursday, March 5, 2020 at 12:47 AM:
>
> > I am not sure the ROI is there for renaming to hudi-deltastreamer and
> > pulling this out. Every time we change a module name, it's a breaking
> > change, and I would prefer that we reserve those for really pressing
> > issues, or let the natural course of development get us there.
> >
> > Regarding how multi-framework support would affect this module, I propose
> > we leave this module Spark-specific, i.e., depending on hudi-spark alone,
> > until we can make Flink work end-to-end.
> > This feels kind of premature to me.
> >
> > On Wed, Mar 4, 2020 at 8:37 AM Gary Li <[email protected]> wrote:
> >
> > > +1. hudi-delta gives me the feeling that it has something to do with
> > > other frameworks... I’d vote for another name: hudi-deltastreamer,
> > > hudi-streamer, or hudi-stream.
> > >
> > > On Wed, Mar 4, 2020 at 2:29 AM vino yang <[email protected]> wrote:
> > >
> > > > Hi folks,
> > > >
> > > > Currently, the content of hudi-utilities looks a bit mixed.
> > > > To summarize, it covers the two areas listed below:
> > > >
> > > >
> > > >    - delta streamer and its relevant packages, e.g. deltastreamer,
> > > >    sources, schema, transform; these packages serve the delta streamer.
> > > >    - Some utility tools such as HDFSParquetImporter,
> > > >    HiveIncrementalPuller, HoodieCleaner, and so on
> > > >
> > > >
> > > > We are trying to refactor the business logic that is tied to the
> > > > compute engine. Delta Streamer will be affected (especially the
> > > > sources package, which is the starting point of a Spark/Flink job).
> > > > Doing this restructuring can make the work clearer and more focused.
> > > >
> > > > I would like to start a proposal to restructure the hudi-utilities
> > > > module. Considering that delta streamer is a great feature of Hudi
> > > > and its logic largely lives in hudi-utilities, can we raise its
> > > > importance by making the delta streamer a separate module? It could
> > > > be named e.g. hudi-delta or something else. Then hudi-utilities can
> > > > be a real utilities module that hosts the HDFSParquetImporter,
> > > > HiveIncrementalPuller, and HoodieCleaner tools.
> > > >
> > > > In short, we can do the following restructuring:
> > > >
> > > >
> > > >    - create a new module named “hudi-delta” (or another name?) and
> > > >    move the deltastreamer, sources, schema, transform … packages into
> > > >    this module
> > > >    - leave HDFSParquetImporter, HiveIncrementalPuller, HoodieCleaner …
> > > >    in the current place (utilities module)
> > > >
> > > > What do you think?
> > > >
> > > > Any comments and suggestions are welcome and appreciated.
> > > >
> > > > Best,
> > > > Vino
> > > >
> > >
> >
>  
