GitHub user baibaichen created a discussion: [Discuss] Keep the commit history when adding a new spark version
Hi everyone,

Referring to [this mail list archive](https://lists.apache.org/[email protected]:2022-5:spark%203.3), I suggest we follow a similar approach when Gluten adds support for a new Spark version. The log below is from https://github.com/apache/iceberg:

```
ed26fd7ac - DOAP: add release 1.10.1 (#14917)
317968426 - Spark: Initial support for 4.1.0 => step 3 build
ddea5658e - Spark: Copy back 4.1 as 4.0 => step 2 copy back
314989243 - Spark: Move 4.0 as 4.1 => step 1 move
752a2820c - Site: correct release time for 1.10.1 (#14918)
```

The proposed changes for Gluten are:

1. GitHub offers three methods for merging PRs. While Iceberg uses **"Rebase and merge"** for this type of PR, I suggest we perform a manual rebase first and then use **"Create a merge commit"**.
2. Iceberg's **"Initial support for 4.1.0"** is a single large commit. I suggest that developers instead create a merge commit locally, so that the history of all the smaller individual commits is retained.

Here is a PR that demonstrates this: https://github.com/apache/incubator-gluten/pull/11347. After merging, the history would look like this:

```
* 5d2350e0f test
|\
| * d7d78a453 Spark: Initial support for 4.1.0
| |\
| | * d2ee7884a [Fix] Add GeographyVal and GeometryVal support in ColumnarArrayShim
| | * 2ef147cb2 [4.1.0] Only Run ArrowEvalPythonExecSuite tests up to Spark 4.0, we need update ci python to 3.10
| | * 031b12e03 [4.1.0] Exclude split test in VeloxStringFunctionsSuite
| | * 0e7f6d456 [Fix] Refactor Spark version checks in VeloxHashJoinSuite to improve readability and maintainability
| | * f1ecdf312 [Fix] Fix MiscOperatorSuite to support OneRowRelationExec plan Spark 4.1
| | * 3953a17c8 [Fix] Refactor Spark version checks in VeloxHashJoinSuite to improve readability and maintainability
| | * 6228d63b3 [Fix] Update Scala version to 2.13.17 in pom.xml to fix `java.lang.NoSuchMethodError: 'java.lang.S..
| | * 82b8cc059 [Fix] Using new interface of ParquetFooterReader
| | * df3741ddf [Fix] Adapt to DataSourceV2Relation interface change
| | * 1855fe465 [Fix] Adapt to QueryExecution.createSparkPlan interface change
| | * 61845a8a4 [Fix] Remove TimeAdd from ExpressionConverter and ExpressionMappings for test
| | * 12b401798 [Fix] Add missing StoragePartitionJoinParams import in BatchScanExecShim and AbstractBatchScanExec
| | * 9ba81906b [Fix] Remove unused MDC import in FileSourceScanExecShim.scala
| | * ea32ac233 [Fix] Add printOutputColumns parameter to generateTreeString methods
| | * 937b8c397 [Fix] Use class name instead of class object for streaming call detection to ensure Spark 4.1 comp..
| | * 21293c4bc [Feat] Introduce Spark41Shims and update build configuration to support Spark 4.1.
| |/
| * 9b3d3038e Spark: Copy back 4.1 as 4.0
| * 68ece8e6b Spark: Move 4.0 as 4.1
|/
* be247e112 [GLUTEN-6887][VL] Daily Update Velox Version (dft-2026_01_02) (#11349) => current head
```

Below is how this would appear in different IDEs:

| IDE | Screenshot |
| :--- | :--- |
| VS Code | (image not preserved) |
| IntelliJ | (image not preserved) |

For instance, PR https://github.com/apache/incubator-gluten/pull/11331 introduced ColumnarArrayShim. Checking the file's history shows that it continues back through the move commit to the original PR:

```
d2ee7884a [Fix] Add GeographyVal and GeometryVal support in ColumnarArrayShim
68ece8e6b Spark: Move 4.0 as 4.1
41073d5b1 [GLUTEN-11330][VL] Make PartialProject support array and map with null values (#11331)
```

GitHub link: https://github.com/apache/incubator-gluten/discussions/11352
