+1, that's a great improvement.

On Wed, Sep 15, 2021 at 10:40 AM Sivabalan <n.siv...@gmail.com> wrote:

> ++1. This definitely helps Hudi scale and makes it more maintainable. Thanks
> for driving this effort. Devs mostly show interest in major features and
> don't like to spend time on such foundational work. But as the project
> scales, this foundational work will have higher returns in the long run.
>
> On Wed, Sep 15, 2021 at 8:29 AM Vinoth Chandar <vin...@apache.org> wrote:
>
> > Another +1. The HoodieData abstraction will go a long way in reducing LoC.
> >
> > Happy to work with you to see this through! I really encourage top
> > contributors to the Flink and Java clients to actively review all PRs
> > as well, given there are subtle differences everywhere.
> >
> > This will help us smoothly provide all the core features across engines.
> > It will also help us easily write a DataSet/Row based client for Spark.
> >
> > Onwards and upwards
> > Vinoth
> >
> > On Wed, Sep 15, 2021 at 4:57 AM vino yang <yanghua1...@gmail.com> wrote:
> >
> > > Hi Ethan,
> > >
> > > Big +1 for the proposal.
> > >
> > > Actually, we have discussed this topic before [1].
> > >
> > > I will review your refactoring PR later.
> > >
> > > Best,
> > > Vino
> > >
> > > [1]:
> > > https://lists.apache.org/thread.html/r71d96d285c735b1611920fb3e7224c9ce6fd53d09bf0e8f144f4fcbd%40%3Cdev.hudi.apache.org%3E
> > >
> > >
> > > Y Ethan Guo <ethan.guoyi...@gmail.com> wrote on Wed, Sep 15, 2021 at 3:34 PM:
> > >
> > > > Hi all,
> > > >
> > > > > The hudi-client module has core Hudi abstractions and client logic for
> > > > > different engines like Spark, Flink, and Java.  While a previous effort
> > > > > (HUDI-538 [1]) decoupled the integration with Spark, there is still
> > > > > quite a bit of code duplication across engines for almost the same
> > > > > logic, due to the current interface design.  Some parts have also
> > > > > diverged among engines, making debugging and support difficult.
> > > >
> > > > > I propose to further refactor the hudi-client module with the goal of
> > > > > improving code reuse across multiple engines and reducing the
> > > > > divergence of logic among them, so that the core Hudi action execution
> > > > > logic is shared across engines, except for engine-specific
> > > > > transformations.  Such a pattern also allows easy support of core Hudi
> > > > > functionality for all engines in the future.  Specifically,
> > > >
> > > > > (1) Abstract the transformation boilerplate inside
> > > > > HoodieEngineContext and implement the engine-specific data
> > > > > transformation logic in its subclasses.  Type casts can be done inside
> > > > > the engine context.
> > > > > (2) Create a new HoodieData abstraction for passing input and output
> > > > > along the flow of execution, and use it in different Hudi abstractions,
> > > > > e.g., HoodieTable, HoodieIOHandle, BaseActionExecutor, instead of
> > > > > enforcing type parameters such as RDD<HoodieRecord> and
> > > > > List<HoodieRecord>, which are one source of duplication.
> > > > > (3) Extract common execution logic from multiple engines into the
> > > > > hudi-client-common module.
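
A minimal sketch of what item (2) could look like, in Java and under stated
assumptions. Only the name HoodieData comes from the proposal above. The
methods map and collectAsList, the HoodieListData class, and the
SharedActionLogic helper are made up for illustration and are not taken from
the Hudi codebase or the PR. Records are plain Strings to keep the example
dependency-free.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.function.Function;

// Hypothetical engine-neutral handle for a collection of records.
// Only the name HoodieData is from the proposal; the methods are assumed.
interface HoodieData<T> {
    <O> HoodieData<O> map(Function<T, O> fn);

    List<T> collectAsList();
}

// Assumed list-backed variant, as a Java-engine client might use. A Spark
// variant would wrap an RDD behind the same interface (omitted here so the
// sketch compiles without Spark on the classpath).
final class HoodieListData<T> implements HoodieData<T> {
    private final List<T> records;

    HoodieListData(List<T> records) {
        this.records = new ArrayList<>(records);
    }

    @Override
    public <O> HoodieData<O> map(Function<T, O> fn) {
        List<O> out = new ArrayList<>(records.size());
        for (T record : records) {
            out.add(fn.apply(record));
        }
        return new HoodieListData<>(out);
    }

    @Override
    public List<T> collectAsList() {
        return new ArrayList<>(records);
    }
}

// Shared logic written once against HoodieData instead of once per engine.
final class SharedActionLogic {
    private SharedActionLogic() {}

    // Hypothetical stand-in for an engine-agnostic step: tag every record
    // key with the instant time of the action being executed.
    static HoodieData<String> tagWithInstant(HoodieData<String> keys, String instantTime) {
        return keys.map(key -> key + "@" + instantTime);
    }
}

final class HoodieDataSketch {
    public static void main(String[] args) {
        HoodieData<String> keys = new HoodieListData<>(Arrays.asList("key1", "key2"));
        System.out.println(SharedActionLogic.tagWithInstant(keys, "001").collectAsList());
    }
}
```

Because tagWithInstant only sees HoodieData, the same method body could serve
a Spark or Flink client once an implementation backed by that engine's
collection type exists, which is the kind of reuse item (3) is after.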
> > > >
> > > > > As a first step and exploration of items (1) and (3) above, I've tried
> > > > > to refactor the rollback actions, and the PR is here: [HUDI-2433][2].
> > > > > In this PR, I completely remove all engine-specific rollback packages
> > > > > and keep only one rollback package in hudi-client-common, adding ~350
> > > > > LoC while deleting 1.3K LoC.  My next step is to refactor the commit
> > > > > action, which encompasses item (2) above.
> > > >
> > > > > What do you folks think, and do you have any other suggestions?
> > > >
> > > > > [1] [HUDI-538] [UMBRELLA] Restructuring hudi client module for multi
> > > > > engine support
> > > > https://issues.apache.org/jira/browse/HUDI-538
> > > > [2] PR: [HUDI-2433] Refactor rollback actions in hudi-client module
> > > > https://github.com/apache/hudi/pull/3664/files
> > > >
> > > > Best,
> > > > - Ethan
> > > >
> > >
> >
>
>
> --
> Regards,
> -Sivabalan
>
