Hi guys,
Feeling the pain of supporting Flink engine for Hudi, it is necessary to discuss the design of high cohesion, low coupling, and plug-in for the calculation engine module here. Now Hudi's design, in order to highlight its core components, is a patchwork of the Spark RDD API mixed with business logic scattered in multiple modules and various types of methods. As a result, developers with a background in computing engines have difficulty understanding the main process of Spark job, and the calculation engine plug-in is also more difficult, because the general interface carries the context of RDD and Spark, unless large-scale restructuring is started. In my opinion, it is necessary to refactor the Hudi integration module through plug-inization to facilitate the subsequent integration of Spark and FLink. Best, Nicholas