+1 that's a great improvement.

On Wed, Sep 15, 2021 at 10:40 AM Sivabalan <n.siv...@gmail.com> wrote:
> ++1. This definitely helps Hudi scale and makes it more maintainable.
> Thanks for driving this effort. Devs mostly show interest in major
> features and don't like to spend time on such foundational work. But as
> the project scales, this foundational work will have higher returns in
> the long run.
>
> On Wed, Sep 15, 2021 at 8:29 AM Vinoth Chandar <vin...@apache.org> wrote:
>
> > Another +1, the HoodieData abstraction will go a long way in reducing
> > LoC.
> >
> > Happy to work with you to see this through! I really encourage top
> > contributors to the Flink and Java clients as well to actively review
> > all PRs, given there are subtle differences everywhere.
> >
> > This will help us smoothly provide all the core features across
> > engines. It will also help us easily write a DataSet/Row-based client
> > for Spark.
> >
> > Onwards and upwards
> > Vinoth
> >
> > On Wed, Sep 15, 2021 at 4:57 AM vino yang <yanghua1...@gmail.com> wrote:
> >
> > > Hi Ethan,
> > >
> > > Big +1 for the proposal.
> > >
> > > Actually, we have discussed this topic before [1].
> > >
> > > Will review your refactor PR later.
> > >
> > > Best,
> > > Vino
> > >
> > > [1]:
> > > https://lists.apache.org/thread.html/r71d96d285c735b1611920fb3e7224c9ce6fd53d09bf0e8f144f4fcbd%40%3Cdev.hudi.apache.org%3E
> > >
> > > On Wed, Sep 15, 2021 at 3:34 PM Y Ethan Guo <ethan.guoyi...@gmail.com> wrote:
> > >
> > > > Hi all,
> > > >
> > > > The hudi-client module has core Hudi abstractions and client logic
> > > > for different engines like Spark, Flink, and Java. While a previous
> > > > effort (HUDI-538 [1]) decoupled the integration with Spark, there is
> > > > still quite some code duplication across engines for almost the same
> > > > logic due to the current interface design. Some parts have also
> > > > diverged among engines, making debugging and support difficult.
> > > >
> > > > I propose to further refactor the hudi-client module with the goal
> > > > of improving code reuse across multiple engines and reducing the
> > > > divergence of the logic among them, so that the core Hudi action
> > > > execution logic is shared across engines, except for engine-specific
> > > > transformations. Such a pattern also allows easy support of core
> > > > Hudi functionality for all engines in the future. Specifically:
> > > >
> > > > (1) Abstract the transformation boilerplate inside the
> > > > HoodieEngineContext and implement the engine-specific data
> > > > transformation logic in the subclasses. Type casts can be done
> > > > inside the engine context.
> > > > (2) Create a new HoodieData abstraction for passing input and output
> > > > along the flow of execution, and use it in different Hudi
> > > > abstractions, e.g., HoodieTable, HoodieIOHandle, BaseActionExecutor,
> > > > instead of enforcing type parameters tied to RDD<HoodieRecord> and
> > > > List<HoodieRecord>, which are one source of duplication.
> > > > (3) Extract common execution logic from multiple engines into the
> > > > hudi-client-common module.
> > > >
> > > > As a first step and exploration for items (1) and (3) above, I've
> > > > tried to refactor the rollback actions, and the PR is here
> > > > [HUDI-2433][2]. In this PR, I completely remove all engine-specific
> > > > rollback packages and keep only one rollback package in
> > > > hudi-client-common, adding ~350 LoC while deleting 1.3K LoC. My next
> > > > step is to refactor the commit action, which encompasses item (2)
> > > > above.
> > > >
> > > > What do you folks think, and are there any other suggestions?
> > > >
> > > > [1] [HUDI-538] [UMBRELLA] Restructuring hudi client module for multi
> > > > engine support
> > > > https://issues.apache.org/jira/browse/HUDI-538
> > > > [2] PR: [HUDI-2433] Refactor rollback actions in hudi-client module
> > > > https://github.com/apache/hudi/pull/3664/files
> > > >
> > > > Best,
> > > > - Ethan
> > > >
> > >
> >
>
> --
> Regards,
> -Sivabalan
>
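
For readers following the thread, items (1) to (3) of the proposal could look
roughly like the Java sketch below. The type names HoodieData and
HoodieEngineContext come from the proposal itself; every method signature
(map, collectAsList, parallelize), the ListData and JavaEngineContext classes,
and the planDeletes helper are assumptions made for illustration and are not
taken from the Hudi codebase or from the PR.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.function.Function;

// Hypothetical sketch only: type names echo the proposal, but every method
// signature here is an assumption, not Hudi's actual API.
class HoodieDataSketch {

    /** Item (2): engine-agnostic handle on a collection of records. */
    interface HoodieData<T> {
        <O> HoodieData<O> map(Function<T, O> fn);

        List<T> collectAsList();
    }

    /** List-backed variant, as a plain Java client might use (assumed name). */
    static final class ListData<T> implements HoodieData<T> {
        private final List<T> items;

        ListData(List<T> items) {
            this.items = new ArrayList<>(items);
        }

        @Override
        public <O> HoodieData<O> map(Function<T, O> fn) {
            List<O> out = new ArrayList<>(items.size());
            for (T item : items) {
                out.add(fn.apply(item));
            }
            return new ListData<>(out);
        }

        @Override
        public List<T> collectAsList() {
            return new ArrayList<>(items);
        }
    }

    /** Item (1): the engine context decides how data is materialized. */
    abstract static class HoodieEngineContext {
        abstract <T> HoodieData<T> parallelize(List<T> data);
    }

    /** Engine-specific subclass; a Spark one would wrap an RDD instead. */
    static final class JavaEngineContext extends HoodieEngineContext {
        @Override
        <T> HoodieData<T> parallelize(List<T> data) {
            return new ListData<>(data);
        }
    }

    /** Item (3): action logic written once against the abstractions. */
    static HoodieData<String> planDeletes(HoodieEngineContext ctx, List<String> files) {
        return ctx.parallelize(files).map(f -> "DELETE " + f);
    }

    public static void main(String[] args) {
        HoodieEngineContext ctx = new JavaEngineContext();
        List<String> plan =
            planDeletes(ctx, Arrays.asList("a.parquet", "b.parquet")).collectAsList();
        System.out.println(plan);
    }
}
```

The point of the sketch is that planDeletes never mentions List or RDD in its
signature, so under these assumptions the same method could serve any engine
whose context returns its own HoodieData implementation.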