Hi folks,

Our code base still has some content in the common module that
actually belongs to one specific backend (CH or VL). Do you think we
can do something to move it out?

The contents I found:

1. Backend-specific configurations in GlutenConfig.scala
2. Backend-specific code in the UT module, e.g., the calls to
BackendTestUtils.is<name>BackendLoaded

A similar issue also applies to our source directory tree, for example:

1. The `gluten-ut` module has both spark3x/src/test/backends-clickhouse
and spark3x/src/test/backends-velox
2. `cpp` (the Velox C++ code dir) and `cpp-ch` are in the root directory
3. Ecosystem support, e.g., `gluten-uniffle` also has a `velox` subfolder
4. Documentation

My long-standing opinion is that we should eventually move most of
this content into the backends' own modules. For example, UT
customizations can be placed in `backend-clickhouse` and
`backend-velox`. Configurations can be placed in `VeloxConfig` /
`CHConfig` or something similar, as I mentioned in another issue[1].
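
To make the configuration split a bit more concrete, here is a rough
Scala sketch of the idea. Everything in it is hypothetical: the
`ConfEntry` type, the `*Sketch` object names, and the `example.*` keys
are made up for illustration and are not existing Gluten APIs or
options. The point is only that the common module would keep
backend-agnostic entries, while each backend module would own its
entries.

```scala
// Hypothetical sketch only: none of these names or keys exist in Gluten.

// Common module: a minimal config-entry type plus backend-agnostic keys.
final case class ConfEntry[T](key: String, default: T, parse: String => T) {
  // Look the key up in a plain key/value map, falling back to the default.
  def get(conf: Map[String, String]): T =
    conf.get(key).map(parse).getOrElse(default)
}

object CommonConfigSketch {
  val Enabled: ConfEntry[Boolean] =
    ConfEntry[Boolean]("example.enabled", true, _.toBoolean)
}

// Would live in the Velox backend module: Velox-only keys.
object VeloxConfigSketch {
  val SpillEnabled: ConfEntry[Boolean] =
    ConfEntry[Boolean]("example.velox.spillEnabled", false, _.toBoolean)
}

// Would live in the ClickHouse backend module: CH-only keys.
object CHConfigSketch {
  val MaxThreads: ConfEntry[Int] =
    ConfEntry[Int]("example.ch.maxThreads", 4, _.toInt)
}
```

With a layout like this, the common module would never need to
reference a backend-specific key, and each backend could add or remove
options without touching shared code.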

That said, I think some content does not need to move and can remain
in the common modules, for example, the backend GHA CI scripts and the
backend documentation. The rest looks reasonable to move, but doing so
will take considerable refactoring effort.

Going forward, I think I can continue taking on the majority of this
work, but I will need help when it comes to the CH backend or the
ecosystem code (RSS, data lake).

Any thoughts will be appreciated.

Thanks,
Hongze

[1] https://github.com/apache/incubator-gluten/issues/6970
