Hi Ethan,

Big +1 for the proposal.

Actually, we have discussed this topic before.[1]

Will review your refactor PR later.

Best,
Vino

[1]:
https://lists.apache.org/thread.html/r71d96d285c735b1611920fb3e7224c9ce6fd53d09bf0e8f144f4fcbd%40%3Cdev.hudi.apache.org%3E


On Wed, Sep 15, 2021 at 3:34 PM, Y Ethan Guo <ethan.guoyi...@gmail.com> wrote:

> Hi all,
>
> The hudi-client module has core Hudi abstractions and client logic for
> different engines like Spark, Flink, and Java.  While the previous effort
> (HUDI-538 [1]) has decoupled the integration with Spark, there is still
> quite a bit of code duplication across engines for almost the same logic
> due to the current interface design.  Some parts have also diverged among
> engines, making debugging and support difficult.
>
> I propose to further refactor the hudi-client module with the goal of
> improving code reuse across multiple engines and reducing the divergence
> of the logic among them, so that the core Hudi action execution logic is
> shared across engines, except for engine-specific transformations.  Such a
> pattern also makes it easy to support core Hudi functionality for all
> engines in the future.  Specifically,
>
> (1) Abstract the transformation boilerplate inside the
> HoodieEngineContext and implement the engine-specific data transformation
> logic in the subclasses.  Type casts can be done inside the engine context.
> (2) Create a new HoodieData abstraction for passing input and output along
> the flow of execution, and use it in different Hudi abstractions, e.g.,
> HoodieTable, HoodieIOHandle, BaseActionExecutor, instead of enforcing type
> parameters involving RDD<HoodieRecord> and List<HoodieRecord>, which are
> one source of duplication.
> (3) Extract common execution logic from multiple engines into the
> hudi-client-common module.
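
To make items (1)-(3) concrete, here is a minimal sketch in Java of how the
pieces could fit together.  It is only an illustration under assumptions:
SketchData, ListSketchData, SketchEngineContext, JavaSketchEngineContext and
SketchRollbackPlanner are hypothetical names invented for this example, not
the actual Hudi classes and not the code in the PR.  The idea shown is that
the shared logic only sees an engine-neutral data handle, while each engine
supplies its own implementation through the engine context.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.function.Function;
import java.util.stream.Collectors;

// Hypothetical, simplified stand-in for the proposed HoodieData abstraction:
// an engine-neutral handle on a collection of elements (item 2).
interface SketchData<T> {
    <O> SketchData<O> map(Function<T, O> fn);
    List<T> collectAsList();
}

// In-memory implementation, as a plain Java engine might use.
// A Spark variant would wrap an RDD behind the same interface (assumption).
final class ListSketchData<T> implements SketchData<T> {
    private final List<T> items;

    ListSketchData(List<T> items) {
        this.items = new ArrayList<>(items);
    }

    @Override
    public <O> SketchData<O> map(Function<T, O> fn) {
        return new ListSketchData<>(items.stream().map(fn).collect(Collectors.toList()));
    }

    @Override
    public List<T> collectAsList() {
        return new ArrayList<>(items);
    }
}

// Hypothetical stand-in for the engine context: the only place that knows
// which concrete data representation an engine uses (item 1).
abstract class SketchEngineContext {
    abstract <T> SketchData<T> parallelize(List<T> input);
}

final class JavaSketchEngineContext extends SketchEngineContext {
    @Override
    <T> SketchData<T> parallelize(List<T> input) {
        return new ListSketchData<>(input);
    }
}

// Engine-agnostic logic that could live in a common module (item 3).
// It never mentions RDD or List-specific type parameters.
final class SketchRollbackPlanner {
    static List<String> planDeletes(SketchEngineContext context, List<String> filePaths) {
        return context.parallelize(filePaths)
                .map(path -> "delete:" + path)
                .collectAsList();
    }
}
```

Calling SketchRollbackPlanner.planDeletes(new JavaSketchEngineContext(),
paths) runs the shared logic on the in-memory engine; supporting another
engine in this sketch would only require a new SketchEngineContext subclass
and a matching SketchData implementation.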
>
> As a first step and exploration for items (1) and (3) above, I've tried to
> refactor the rollback actions, and the PR is here [HUDI-2433][2].  In this
> PR, I completely remove all engine-specific rollback packages and only keep
> one rollback package in hudi-client-common, adding ~350 LoC while deleting
> 1.3K LoC.  My next step is to refactor the commit action, which encompasses
> item (2) above.
>
> What do you folks think?  Any other suggestions?
>
> [1] [HUDI-538] [UMBRELLA] Restructuring hudi client module for multi engine
> support
> https://issues.apache.org/jira/browse/HUDI-538
> [2] PR: [HUDI-2433] Refactor rollback actions in hudi-client module
> https://github.com/apache/hudi/pull/3664/files
>
> Best,
> - Ethan
>
