Hi folks,

In our code base, some content still lives in the common module even though it actually belongs to one specific backend (CH or VL). Do you think we can do something to get rid of it?

The contents I found:

1. Backend-specific configurations in GlutenConfig.scala
2. Backend-specific code in the UT module, e.g., the calls to BackendTestUtils.is<name>BackendLoaded

A similar issue also applies to our source directory tree, for example:

1. The `gluten-ut` module has spark3x/src/test/backends-clickhouse and spark3x/src/test/backends-velox at the same time
2. `cpp` (the Velox cpp code dir) and `cpp-ch` are both in the root directory
3. Ecosystem support, e.g., `gluten-uniffle` also has a `velox` subfolder
4. Documentation

For a long time my opinion has been that we should eventually move most of this content into the backends' own modules. For example, UT customizations can be placed in `backend-clickhouse` and `backend-velox`. Configurations can be placed in `VeloxConfig` / `CHConfig` or something similar, as I mentioned in another issue [1], etc.

Having said that, I think some content does not need to be moved and can remain in the common modules, for example, the backend GHA CI scripts and the backend documentation. The rest looks reasonable to move, but considerable refactoring effort will be needed.

Going forward, I think I can continue taking on the majority of this work, but help will be needed when it comes to the CH backend or the ecosystem code (rss, data lake).

Any thoughts will be appreciated.

Thanks,
Hongze

[1] https://github.com/apache/incubator-gluten/issues/6970
