By "ecosystem modules", I mean:

1. gluten-celeborn
2. gluten-uniffle
3. gluten-delta
4. gluten-hudi
5. gluten-iceberg

Currently, in our Maven dependency graph, these modules all depend on `backends-velox` and `backends-clickhouse`. Since we agreed in the previous discussions [1][2] to move backend-specific code into the backend modules, we can start refactoring these modules.

I suggest flipping the dependency direction: make VL BE (Velox backend) and CH BE (ClickHouse backend) depend on the ecosystem modules they require, and put the backend-specific code in the relevant folders under `backends-clickhouse` and `backends-velox`. For example, `gluten-celeborn/common` would move up to the top-level `gluten-celeborn`, and `gluten-celeborn/velox` would move to `backends-velox/src/main/celeborn`.

This brings the following advantages:

1. Better encapsulation of backend code: each backend's ecosystem support code is organized in its own module.
2. More flexible ecosystem enabling in backends: a backend can selectively add the relevant Maven dependency to its pom and, if needed, the ecosystem-specific folders in its own module, to enable and test the ecosystem without altering any common module.

For example, after this refactor, once CH BE wants to add support for Iceberg, we can modify `backends-clickhouse/pom.xml` to add `gluten-iceberg` as a dependency. Shared UTs can be enabled easily by extending the shared Scala test classes.

Any thoughts will be appreciated.

Thanks,
Hongze

[1] https://lists.apache.org/thread/c0zsw3jyhd4zhyw4v51kn9chpftswrkp
[2] https://github.com/apache/incubator-gluten/discussions/7735
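
To make the Iceberg example concrete, here is a rough sketch of what the addition to `backends-clickhouse/pom.xml` might look like after the flip. The `groupId`, the use of `${project.version}`, and the placement directly under `<dependencies>` (rather than inside a Maven profile) are assumptions for illustration only, not the actual build layout.

```xml
<!-- Sketch only: fragment of backends-clickhouse/pom.xml after the refactor. -->
<dependencies>
  <!-- The backend now depends on the ecosystem module, not the other way around. -->
  <dependency>
    <groupId>org.apache.gluten</groupId>      <!-- assumed groupId -->
    <artifactId>gluten-iceberg</artifactId>
    <version>${project.version}</version>     <!-- assumed: same version as the parent build -->
  </dependency>
</dependencies>
```

With this direction, a backend that does not support an ecosystem simply omits the dependency, and no common module needs to change.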
