Hi guys,

My original thought was to make the delta streamer an engine-agnostic
component so that Spark and Flink can share some common logic.

>> I am not sure the ROI is there for renaming to hudi-deltastreamer and
pulling this out. Every time we change a module name

Actually, my suggestion is to move the delta streamer into a new module
and keep the current hudi-utilities module, although, in a way, moving
classes is similar to renaming the module.

>> I propose we leave this module Spark-specific, i.e. depending on
hudi-spark alone

OK, I will look into building a delta streaming mode on Flink and set aside
the current implementation of the delta streamer.

Best,
Vino

Vinoth Chandar <[email protected]> wrote on Thu, Mar 5, 2020 at 12:47 AM:

> I am not sure the ROI is there for renaming to hudi-deltastreamer and
> pulling this out. Every time we change a module name, it's a breaking
> change, and I would prefer that we reserve those for really pressing
> issues, or take the natural course of development and get there.
>
> Regarding how multi-framework support would affect this module, I propose
> we leave this module Spark-specific, i.e. depending on hudi-spark
> alone, until we can make Flink work end-to-end.
> This feels kind of premature to me.
>
> On Wed, Mar 4, 2020 at 8:37 AM Gary Li <[email protected]> wrote:
>
> > +1. hudi-delta gives me the feeling that it has something to do with
> > other frameworks... I’d vote for another name: hudi-deltastreamer,
> > hudi-streamer, or hudi-stream.
> >
> > On Wed, Mar 4, 2020 at 2:29 AM vino yang <[email protected]> wrote:
> >
> > > Hi folks,
> > >
> > > Currently, the content of hudi-utilities looks a bit mixed.
> > > In summary, it falls into the two groups listed below:
> > >
> > >
> > >    - The delta streamer and its related packages, e.g. deltastreamer,
> > >    sources, schema, transform; these packages serve the delta streamer.
> > >    - Some utility tools such as HDFSParquetImporter,
> > >    HiveIncrementalPuller, HoodieCleaner, and so on
> > >
> > >
> > > We are trying to refactor the business logic that is tied to the
> > > compute engine. The delta streamer will be affected (especially the
> > > sources package, which is the starting point of a Spark/Flink job).
> > > Doing this restructuring can make the work clearer and more focused.
> > >
> > > I would like to propose restructuring the hudi-utilities module.
> > > The delta streamer is a great feature of Hudi, yet its logic lives
> > > inside hudi-utilities. Can we raise its prominence by making the delta
> > > streamer a module of its own? It could be named e.g. hudi-delta or
> > > something else. hudi-utilities would then be a real utilities module
> > > hosting the HDFSParquetImporter, HiveIncrementalPuller, and
> > > HoodieCleaner tools.
> > >
> > > In short, we can do the following restructuring work:
> > >
> > >
> > >    - Create a new module named “hudi-delta” (or another name?) and move
> > >    the deltastreamer, sources, schema, transform … packages into this
> > >    module
> > >    - Leave HDFSParquetImporter, HiveIncrementalPuller, HoodieCleaner …
> > >    in the current place (utilities module)
> > >
> > > What do you think?
> > >
> > > Any comments and suggestions are welcome and appreciated.
> > >
> > > Best,
> > > Vino
> > >
> >
>
