Hi guys,

My original thought is to make the delta streamer an engine-agnostic component so that Spark and Flink can share some common logic.

>> I am not sure the ROI is there for renaming to hudi-deltastreamer and
>> pulling this out. Every time we change a module name

Actually, my suggestion here is to move the delta streamer to a new module
and keep the current hudi-utilities module. Although, in a way, moving the
classes is similar to renaming the module.

>> I propose we leave this module to be Spark specific, i.e. depending on
>> hudi-spark alone

OK, I will think about building a delta streaming mode via Flink and ignore
the current implementation of the delta streamer.

Best,
Vino

On Thu, Mar 5, 2020 at 12:47 AM Vinoth Chandar wrote:

> I am not sure the ROI is there for renaming to hudi-deltastreamer and
> pulling this out. Every time we change a module name, it's a breaking
> change, and I would prefer we reserve those for really pressing issues,
> or take the natural course of development and get there.
>
> Regarding how multi-framework support would affect this module, I propose
> we leave this module to be Spark specific, i.e. depending on hudi-spark
> alone, until we can make Flink work end-to-end.
> This feels kind of premature to me.
>
> On Wed, Mar 4, 2020 at 8:37 AM Gary Li wrote:
>
> > +1. hudi-delta gives me the feeling that it has something to do with
> > other frameworks... I'd vote for another name: hudi-deltastreamer,
> > hudi-streamer, or hudi-stream.
> >
> > On Wed, Mar 4, 2020 at 2:29 AM vino yang wrote:
> >
> > > Hi folks,
> > >
> > > Currently, the content of hudi-utilities looks a bit mixed. In
> > > summary, it covers the two aspects listed below:
> > >
> > > - the delta streamer and its related packages, e.g. deltastreamer,
> > >   sources, schema, and transform; these packages serve the delta
> > >   streamer.
> > > - some utility tools such as HDFSParquetImporter,
> > >   HiveIncrementalPuller, HoodieCleaner, and so on.
> > >
> > > We are trying to refactor the compute-engine-related business logic.
> > > The delta streamer (especially the sources package, which is the
> > > starting point of a Spark/Flink job) will be affected. Doing this
> > > restructuring can make the work clearer and more focused.
> > >
> > > I would like to start a proposal to restructure the hudi-utilities
> > > module. The delta streamer is a great feature of Hudi, yet its logic
> > > lives mostly inside hudi-utilities. Can we raise its importance by
> > > making the delta streamer a separate module? It could be named, e.g.,
> > > hudi-delta or something else. Then hudi-utilities can become a real
> > > utilities module hosting the HDFSParquetImporter,
> > > HiveIncrementalPuller, and HoodieCleaner tools.
> > >
> > > In short, we can do the following restructuring:
> > >
> > > - create a new module named “hudi-delta” (or another name?) and move
> > >   the deltastreamer, sources, schema, transform … packages into this
> > >   module
> > > - leave HDFSParquetImporter, HiveIncrementalPuller, HoodieCleaner …
> > >   in their current place (the utilities module)
> > >
> > > What do you think?
> > >
> > > Any comments and suggestions are welcome and appreciated.
> > >
> > > Best,
> > > Vino
