Hi folks,

Currently, the content of hudi-utilities looks a bit mixed.
To summarize, it contains two kinds of things:


   - Delta streamer and its related packages, e.g. deltastreamer, sources,
   schema, transform. These packages exist to serve the delta streamer.
   - Some utility tools such as HDFSParquetImporter, HiveIncrementalPuller,
   HoodieCleaner and so on.


We are trying to refactor the business logic that depends on the computing
engine. Delta streamer will be affected (especially the sources package,
which is the starting point of a Spark/Flink job). Doing this restructuring
can make that work clearer and more focused.

I would like to start a proposal to restructure the hudi-utilities module.
Delta streamer is a great feature of hudi, and much of its logic lives in
hudi-utilities. Can we raise its importance by making the delta streamer a
module of its own? It could be named e.g. hudi-delta or something else.
Then hudi-utilities becomes a real utilities module that hosts the
HDFSParquetImporter, HiveIncrementalPuller and HoodieCleaner tools.

In short, the restructuring would consist of:


   - create a new module, named “hudi-delta” (or another name?), and move the
   deltastreamer, sources, schema, transform … packages into this module
   - leave HDFSParquetImporter, HiveIncrementalPuller, HoodieCleaner … in their
   current place (the utilities module)

What do you think?

Any comments and suggestions are welcome and appreciated.

Best,
Vino
