By "ecosystem modules", I mean:

1. gluten-celeborn
2. gluten-uniffle
3. gluten-delta
4. gluten-hudi
5. gluten-iceberg

Currently, in our Maven dependency graph, these modules all depend on
`backends-velox` and `backends-clickhouse`. Since we agreed in the
previous discussions[1][2] to move backend-specific code into the
backend modules, we can start refactoring these modules.

I suggest flipping the dependency direction: make VL BE (Velox
backend) and CH BE (ClickHouse backend) depend on the ecosystem
modules they require, and put the backend-specific code in the
relevant folders in `backends-clickhouse` and `backends-velox`. For
example, `gluten-celeborn/common` would be moved up into the top-level
`gluten-celeborn`, and `gluten-celeborn/velox` would be moved to
`backends-velox/src/main/celeborn`.

This approach has the following advantages:

1. Better encapsulation of backend code: each backend's ecosystem
support code will live in that backend's own module.
2. More flexible ecosystem enabling in backends: a backend can
selectively add the relevant Maven dependency and, if needed, the
ecosystem-specific folders in its own module to enable and test the
ecosystem, without altering any common module.

For example, after this refactor, if CH BE wants to add support for
Iceberg, we can modify `backends-clickhouse/pom.xml` to add
`gluten-iceberg` as a dependency. Shared UTs can then be enabled
easily by extending the shared Scala test classes.
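
As a rough sketch of that pom change, the dependency entry in
`backends-clickhouse/pom.xml` could look like the fragment below. The
`groupId`, the `${project.version}` version, and the absence of a scope
or profile are assumptions for illustration, not taken from the actual
build.

```xml
<!-- Hypothetical fragment for backends-clickhouse/pom.xml.
     The coordinates below are assumed; adjust them to the real build. -->
<dependencies>
  <dependency>
    <!-- Assumed groupId for the project's modules -->
    <groupId>org.apache.gluten</groupId>
    <!-- Ecosystem module the backend now depends on -->
    <artifactId>gluten-iceberg</artifactId>
    <!-- Assumes all modules share the parent project's version -->
    <version>${project.version}</version>
  </dependency>
</dependencies>
```

Because the dependency points from the backend to the ecosystem
module, `gluten-iceberg` itself would not need to change.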

Any thoughts will be appreciated.

Thanks,
Hongze

[1] https://lists.apache.org/thread/c0zsw3jyhd4zhyw4v51kn9chpftswrkp
[2] https://github.com/apache/incubator-gluten/discussions/7735
