Hi Ethan,

Big +1 for the proposal.
Actually, we have discussed this topic before. [1]

Will review your refactor PR later.

Best,
Vino

[1]: https://lists.apache.org/thread.html/r71d96d285c735b1611920fb3e7224c9ce6fd53d09bf0e8f144f4fcbd%40%3Cdev.hudi.apache.org%3E

On Wed, Sep 15, 2021 at 3:34 PM, Ethan Guo <ethan.guoyi...@gmail.com> wrote:

> Hi all,
>
> hudi-client module has core Hudi abstractions and client logic for
> different engines like Spark, Flink, and Java. While previous effort
> (HUDI-538 [1]) has decoupled the integration with Spark, there is quite
> some code duplication across different engines for almost the same logic
> due to the current interface design. Some part also has divergence among
> engines, making debugging and support difficult.
>
> I propose to further refactor the hudi-client module with the goal of
> improving the code reuse across multiple engines and reducing the
> divergence of the logic among them, so that the core Hudi action execution
> logic should be shared across engines, except for engine specific
> transformations. Such a pattern also allows easy support of core Hudi
> functionality for all engines in the future. Specifically,
>
> (1) Abstracts the transformation boilerplates inside the
> HoodieEngineContext and implements the engine-specific data transformation
> logic in the subclasses. Type cast can be done inside the engine context.
> (2) Creates new HoodieData abstraction for passing input and output along
> the flow of execution, and uses it in different Hudi abstractions, e.g.,
> HoodieTable, HoodieIOHandle, BaseActionExecutor, instead of enforcing type
> parameters encountering RDD<HoodieRecord> and List<HoodieRecord> which are
> one source of duplication.
> (3) Extracts common execution logic to hudi-client-common module from
> multiple engines.
>
> As a first step and exploration for item (1) and (3) above, I've tried to
> refactor the rollback actions and the PR is here [HUDI-2433][2]. In this
> PR, I completely remove all engine-specific rollback packages and only keep
> one rollback package in hudi-client-common, adding ~350 LoC while deleting
> 1.3K LoC. My next step is to refactor the commit action which encompasses
> item (2) above.
>
> What do you folks think and any other suggestions?
>
> [1] [HUDI-538] [UMBRELLA] Restructuring hudi client module for multi engine
> support
> https://issues.apache.org/jira/browse/HUDI-538
> [2] PR: [HUDI-2433] Refactor rollback actions in hudi-client module
> https://github.com/apache/hudi/pull/3664/files
>
> Best,
> - Ethan
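
As a rough illustration of items (1) and (2) in the quoted proposal, the idea of passing an engine-agnostic data handle through shared action logic might look something like the sketch below. All names here (EngineData, ListData, RollbackPlanner, HoodieDataSketch) are hypothetical and chosen for illustration only; they are not the actual Hudi interfaces and are not taken from the referenced PR.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.function.Function;

// Hypothetical engine-agnostic data handle (an assumption for illustration,
// not the real HoodieData interface).
interface EngineData<T> {
    <R> EngineData<R> map(Function<T, R> fn);
    List<T> collectAsList();
}

// In-memory implementation, standing in for the List<HoodieRecord> style used
// by a non-distributed engine. A Spark variant would wrap an RDD instead and
// keep the type cast inside the engine-specific class.
final class ListData<T> implements EngineData<T> {
    private final List<T> items;

    ListData(List<T> items) {
        this.items = new ArrayList<>(items);
    }

    @Override
    public <R> EngineData<R> map(Function<T, R> fn) {
        List<R> out = new ArrayList<>(items.size());
        for (T item : items) {
            out.add(fn.apply(item));
        }
        return new ListData<>(out);
    }

    @Override
    public List<T> collectAsList() {
        return new ArrayList<>(items);
    }
}

// Shared action logic written once against the abstraction, so it would not
// need to be duplicated per engine. The "DELETE" step is a made-up placeholder.
final class RollbackPlanner {
    static EngineData<String> planDeletes(EngineData<String> filePaths) {
        return filePaths.map(path -> "DELETE " + path);
    }
}

final class HoodieDataSketch {
    public static void main(String[] args) {
        EngineData<String> files =
            new ListData<>(Arrays.asList("f1.parquet", "f2.parquet"));
        System.out.println(RollbackPlanner.planDeletes(files).collectAsList());
    }
}
```

The point of the sketch is only that RollbackPlanner never mentions RDD or List directly, so a second engine would add one new EngineData implementation rather than a second copy of the planner.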