Re: [PR] [VL] Respect orc.force.positional.evolution [gluten]

via GitHub Fri, 05 Jun 2026 01:15:49 -0700


Copilot commented on code in PR #12234:
URL: https://github.com/apache/gluten/pull/12234#discussion_r3361335573



##########
gluten-substrait/src/main/scala/org/apache/gluten/config/GlutenConfig.scala:
##########
@@ -576,6 +578,16 @@ object GlutenConfig extends ConfigRegistry {
       }
       .foreach { case (k, v) => nativeConfMap.put(k, v) }
 
+    // When `orc.force.positional.evolution=true`, vanilla Spark maps ORC 
columns by
+    // position rather than by name (see OrcUtils.requestedColumnIds). The 
Velox ORC reader
+    // must do the same, otherwise name-based matching against a mismatched 
file schema
+    // reads columns back as null/empty. Override the (Velox) 
orcUseColumnNames session conf
+    // so native reads ORC by position too. Harmless for backends that ignore 
this key.
+    // String literal is used because gluten-substrait cannot depend on 
backends-velox.
+    if (conf.getOrElse(SPARK_ORC_FORCE_POSITIONAL_EVOLUTION, 
"false").toBoolean) {
+      
nativeConfMap.put("spark.gluten.sql.columnar.backend.velox.orcUseColumnNames", 
"false")
+    }

Review Comment:
   This injects a Velox-specific session key into the native config map for 
*all* backends. That adds backend-specific behavior in the common config layer 
and may leak unexpected keys to non-Velox runtimes. Since 
`getNativeSessionConf` already receives `backendName`, gate this override to 
the Velox backend (and keep non-Velox configs untouched).



##########
backends-velox/src/test/scala/org/apache/gluten/execution/FallbackSuite.scala:
##########
@@ -344,6 +344,38 @@ class FallbackSuite extends 
VeloxWholeStageTransformerSuite with AdaptiveSparkPl
     }
   }
 
+  test("orcUseColumnNames should respect orc.force.positional.evolution") {
+    // When `orc.force.positional.evolution=true`, ORC must be read by 
position even though
+    // orcUseColumnNames defaults to true. The effective orcUseColumnNames 
therefore becomes
+    // false, which (as in the test above) requires passing the data schema to 
Velox and
+    // falls back when the schema contains an unsupported type (timestamp_ntz 
here).
+    val query = "SELECT c2 FROM test"

Review Comment:
   This test verifies the *fallback side-effect* of positional evolution (via 
unsupported `timestamp_ntz`) but it doesn’t exercise the reported failure mode 
where the ORC file schema column names don’t match the metastore schema 
(leading to empty/wrong results under name-based mapping). Consider extending 
the test to create a table whose schema names differ from the ORC file’s 
physical column names and assert that enabling 
`spark.hadoop.orc.force.positional.evolution=true` returns the correct 
non-empty results (matching vanilla Spark).



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]


---------------------------------------------------------------------
To unsubscribe, e-mail: [email protected]
For additional commands, e-mail: [email protected]

Re: [PR] [VL] Respect orc.force.positional.evolution [gluten]

Reply via email to