+1. The name hudi-delta gives me the feeling that it has something to do with other frameworks... I’d vote for another name, such as hudi-deltastreamer, hudi-streamer, or hudi-stream.

On Wed, Mar 4, 2020 at 2:29 AM vino yang <[email protected]> wrote:

> Hi folks,
>
> Currently, it seems the content of hudi-utilities looks a bit mixed.
> To summarize, there are two aspects, listed below:
>
>
> - delta streamer and its relevant packages, e.g. deltastreamer, sources,
> schema, transform; these packages serve the delta streamer.
> - Some utility tools such as
> HDFSParquetImporter、HiveIncrementalPuller、HoodieCleaner and so on
>
>
> We are trying to refactor the computing engine relevant business logic.
> Delta Streamer (especially the sources package, which is a starting point
> of a Spark/Flink job) will be affected. Doing this restructuring can make
> the work clearer and more focused.
>
> I would like to start a proposal to restructure the hudi-utilities module.
> Considering delta streamer is a great feature for hudi, and most of its
> logic lives in hudi-utilities, can we raise its importance by making the
> delta streamer a single module? It could be named e.g. hudi-delta or
> something else. Then let hudi-utilities be a real utilities module to host
> the HDFSParquetImporter、HiveIncrementalPuller、HoodieCleaner tools.
>
> In short, we can do this restructuring work:
>
>
> - create a new module, named “hudi-delta” (or other name?) and move the
> deltastreamer, sources, schema, transform … packages into this module
> - leave HDFSParquetImporter、HiveIncrementalPuller、HoodieCleaner … in the
> current place (utilities module)
>
> What do you think?
>
> Any comments and suggestions are welcome and appreciated.
>
> Best,
> Vino
>
