Hi guys,

Having felt the pain of adding Flink engine support to Hudi, I think we need 
to discuss how to make the compute engine module highly cohesive, loosely 
coupled, and pluggable.


Hudi's current design, in trying to highlight its core components, ends up as 
a patchwork: Spark RDD API calls are mixed with business logic and scattered 
across multiple modules and many kinds of methods. As a result, even 
developers with a background in compute engines have difficulty following the 
main flow of the Spark job. Plugging in another compute engine is also 
difficult, because the general interfaces carry RDD and Spark context, so it 
cannot be done without a large-scale restructuring.


In my opinion, we need to refactor the Hudi integration module into a plug-in 
architecture to facilitate the subsequent integration of both Spark and Flink.
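
To make the idea more concrete, here is a rough sketch of what an 
engine-neutral interface could look like. None of these names exist in Hudi 
today: EngineContext, LocalEngineContext, and WriteClient are hypothetical and 
only illustrate the shape of the decoupling. The local implementation is a 
stand-in; a real binding would delegate to Spark or Flink instead.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.function.Function;

public class EngineSketch {

    /** Hypothetical engine-neutral abstraction; not an existing Hudi interface. */
    interface EngineContext {
        <I, O> List<O> map(List<I> input, Function<I, O> fn);
    }

    /** Stand-in binding that runs in-process; a Spark or Flink binding would replace it. */
    static class LocalEngineContext implements EngineContext {
        @Override
        public <I, O> List<O> map(List<I> input, Function<I, O> fn) {
            List<O> out = new ArrayList<>();
            for (I item : input) {
                out.add(fn.apply(item));
            }
            return out;
        }
    }

    /** Core logic depends only on EngineContext, never on RDD or Spark context types. */
    static class WriteClient {
        private final EngineContext engine;

        WriteClient(EngineContext engine) {
            this.engine = engine;
        }

        // Placeholder for real business logic such as index lookup or tagging.
        List<String> tagRecords(List<String> keys) {
            return engine.map(keys, k -> k + ":tagged");
        }
    }

    public static void main(String[] args) {
        WriteClient client = new WriteClient(new LocalEngineContext());
        System.out.println(client.tagRecords(Arrays.asList("a", "b")));
    }
}
```

With this kind of split, the business logic would live in one place and each 
engine would only need to supply its own EngineContext implementation.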


Best,
Nicholas
