Hi all,

I also prefer the approach that avoids duplicated code.
Looking at the `spark/src` and `integration/src` directories, I see 23 byte-identical files, and 4 more files that have slight Spark-version specific differences that can be "deduplicated" with base classes plus version specific adapters. This appears to match Polaris's use of Spark, which is different from projects that deeply integrate with Spark or Flink planning and execution internals.

Robert

On Mon, Jun 1, 2026 at 7:48 AM Jean-Baptiste Onofré <[email protected]> wrote:

> Hi Dmitri,
>
> While I don't have a major concern with duplicating code in principle,
> the main issue is the quantity of duplication. If the amount of
> redundant code is large, it becomes significantly harder to maintain.
>
> For this reason, I prefer the second option of factoring out common code.
>
> Regards,
> JB
>
> On Thu, May 28, 2026 at 11:21 PM Dmitri Bourlatchkov <[email protected]> wrote:
> >
> > Hi All,
> >
> > This is another discussion stemming from today's Community Sync call and PR
> > [4535].
> >
> > Adding support for Spark 4 apparently produced a substantial amount of
> > "copied" code in [4535].
> >
> > Points in favour of copy:
> >
> > * Adjusting to differences between Spark versions is easier
> >
> > * Dropping support for old Spark versions is easy (when they expire).
> >
> > Points in favour of extracting common modules:
> >
> > * Nice code organization. Common code is unit-tested once.
> >
> > * Bug fixes in shared logic only need to be done in one place.
> >
> > * Polaris does not appear to depend on deep Spark API (no query planning,
> > etc.) so differences between Spark versions can probably be handled by
> > allowing a small number of customization points in the common code.
> >
> > I tend to prefer the second approach, that is factoring out common code and
> > sharing it between Spark 3.x and 4.x modules with the expectation that the
> > size of the common code is much larger than the size of the
> > version-specific code.
> >
> > Thoughts?
> >
> > [4535] https://github.com/apache/polaris/pull/4535
> >
> > Thanks,
> > Dmitri.
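
To make the "base classes plus version specific adapters" idea concrete, here is a rough Java sketch. Every name in it (`SparkVersionAdapter`, `BaseCatalogOptions`, the `Spark3*`/`Spark4*` classes) is hypothetical and not taken from Polaris or [4535], and the difference in option-key naming between the two Spark lines is invented purely for illustration. The point is only the shape: shared logic in one base class, and a small customization point per Spark version.

```java
import java.util.LinkedHashMap;
import java.util.Locale;
import java.util.Map;

// Hypothetical customization point: the only code each Spark line has to provide.
interface SparkVersionAdapter {
    String sparkLine();

    // Invented difference for illustration: each line names option keys differently.
    String optionKey(String name);
}

// Shared logic lives here once and can be unit-tested once.
abstract class BaseCatalogOptions {
    private final SparkVersionAdapter adapter;

    protected BaseCatalogOptions(SparkVersionAdapter adapter) {
        this.adapter = adapter;
    }

    // Common code path; version differences are delegated to the adapter.
    Map<String, String> build(Map<String, String> raw) {
        Map<String, String> out = new LinkedHashMap<>();
        for (Map.Entry<String, String> e : raw.entrySet()) {
            out.put(adapter.optionKey(e.getKey()), e.getValue());
        }
        out.put("spark-line", adapter.sparkLine());
        return out;
    }
}

// Thin per-version modules: an adapter plus a one-line subclass.
final class Spark3Adapter implements SparkVersionAdapter {
    public String sparkLine() { return "3.x"; }
    public String optionKey(String name) { return name.toLowerCase(Locale.ROOT); }
}

final class Spark4Adapter implements SparkVersionAdapter {
    public String sparkLine() { return "4.x"; }
    public String optionKey(String name) { return "v4." + name.toLowerCase(Locale.ROOT); }
}

final class Spark3CatalogOptions extends BaseCatalogOptions {
    Spark3CatalogOptions() { super(new Spark3Adapter()); }
}

final class Spark4CatalogOptions extends BaseCatalogOptions {
    Spark4CatalogOptions() { super(new Spark4Adapter()); }
}
```

With this layout, dropping an old Spark line would mean deleting its adapter and subclass, while a bug fix in `build` would apply to both lines at once.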
