yihua commented on code in PR #18455:
URL: https://github.com/apache/hudi/pull/18455#discussion_r3082898156


##########
hudi-client/hudi-spark-client/src/test/java/org/apache/hudi/testutils/GlutenTestUtils.java:
##########
@@ -0,0 +1,68 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements.  See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership.  The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License.  You may obtain a copy of the License at
+ *
+ *      http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+
+package org.apache.hudi.testutils;
+
+import org.apache.spark.SparkConf;
+import org.slf4j.Logger;
+import org.slf4j.LoggerFactory;
+
+/**
+ * Utility for injecting Gluten/Velox native execution into a test {@link SparkConf}.
+ *
+ * <p>Activated by passing {@code -Dgluten.bundle.jar=<path>} at test time.
+ * If {@code ai.onehouse.quanton.QuantonPlugin} is on the classpath it is preferred;
+ * otherwise {@code org.apache.gluten.GlutenPlugin} is used.
+ */
+public class GlutenTestUtils {
+
+  private static final Logger LOG = LoggerFactory.getLogger(GlutenTestUtils.class);
+
+  private GlutenTestUtils() {}
+
+  /**
+   * Applies Gluten/Velox native-execution settings to {@code sparkConf} when the
+   * {@code gluten.bundle.jar} system property is set.  No-op otherwise.
+   */
+  public static void applyGlutenConf(SparkConf sparkConf) {
+    String glutenBundleJar = System.getProperty("gluten.bundle.jar");
+    if (glutenBundleJar == null || glutenBundleJar.isEmpty()) {
+      return;
+    }
+
+    String pluginClass;
+    try {
+      Class.forName("ai.onehouse.quanton.QuantonPlugin");
+      pluginClass = "ai.onehouse.quanton.QuantonPlugin";
+    } catch (ClassNotFoundException e) {
+      pluginClass = "org.apache.gluten.GlutenPlugin";
+    }

Review Comment:
   _⚠️ Potential issue_ | _🟠 Major_
   
   <details>
   <summary>🧩 Analysis chain</summary>
   
   🏁 Script executed:
   
   ```shell
   fd -t f "GlutenTestUtils.java"
   ```
   
   Repository: yihua/hudi
   
   Length of output: 144
   
   ---
   
   🏁 Script executed:
   
   ```shell
   cat -n hudi-client/hudi-spark-client/src/test/java/org/apache/hudi/testutils/GlutenTestUtils.java
   ```
   
   Repository: yihua/hudi
   
   Length of output: 3358
   
   ---
   
   🏁 Script executed:
   
   ```shell
   # Search for similar validation patterns in the codebase for system properties
   rg "System.getProperty" --type java -A 5 -B 2 | head -100
   ```
   
   Repository: yihua/hudi
   
   Length of output: 13994
   
   ---
   
   🏁 Script executed:
   
   ```shell
   # Check if there are any tests or usages of applyGlutenConf
   rg "applyGlutenConf" --type java -B 2 -A 5
   ```
   
   Repository: yihua/hudi
   
   Length of output: 3052
   
   ---
   
   
   
   </details>
   
   **Add fail-fast validation for the `gluten.bundle.jar` property and the fallback plugin class.**
   
   The method currently treats `gluten.bundle.jar` as a boolean flag only. A typo in the path or missing classpath wiring still configures SparkConf and defers the failure to Spark startup, making test diagnostics harder. Validate the file path and ensure the fallback `GlutenPlugin` class exists before enabling the plugin mode.
   
   <details>
   <summary>Suggested fail-fast check</summary>
   
   ```diff
   +import java.io.File;
   +
      public static void applyGlutenConf(SparkConf sparkConf) {
        String glutenBundleJar = System.getProperty("gluten.bundle.jar");
        if (glutenBundleJar == null || glutenBundleJar.isEmpty()) {
          return;
        }
   +    if (!new File(glutenBundleJar).isFile()) {
   +      throw new IllegalArgumentException("gluten.bundle.jar does not point to an existing file: " + glutenBundleJar);
   +    }
    
        String pluginClass;
        try {
          Class.forName("ai.onehouse.quanton.QuantonPlugin");
          pluginClass = "ai.onehouse.quanton.QuantonPlugin";
        } catch (ClassNotFoundException e) {
   -      pluginClass = "org.apache.gluten.GlutenPlugin";
   +      try {
   +        Class.forName("org.apache.gluten.GlutenPlugin");
   +        pluginClass = "org.apache.gluten.GlutenPlugin";
   +      } catch (ClassNotFoundException e2) {
   +        throw new IllegalStateException(
   +            "gluten.bundle.jar is set but neither Quanton nor Gluten plugin is on the test classpath",
   +            e2);
   +      }
        }
   ```
   
   </details>
   
   <details>
   <summary>🤖 Prompt for AI Agents</summary>
   
   ```
   Verify each finding against the current code and only fix it if needed.
   
   In
   
`@hudi-client/hudi-spark-client/src/test/java/org/apache/hudi/testutils/GlutenTestUtils.java`
   around lines 42 - 54, The applyGlutenConf method currently treats the
   gluten.bundle.jar string as a flag; update it to fail-fast by validating the
   path and plugin class before mutating SparkConf: 1) check that the
   glutenBundleJar string is non-empty and points to an existing readable file
   (e.g., new File(glutenBundleJar).isFile()) and if not, throw an
   IllegalArgumentException or return early so SparkConf is not modified; 2)
   resolve the plugin class by first trying
   Class.forName("ai.onehouse.quanton.QuantonPlugin") and if that fails set
   pluginClass = "org.apache.gluten.GlutenPlugin", then also verify the fallback
   exists via Class.forName(pluginClass) and return/throw if it does not; 3) 
only
   after both the jar file and the chosen pluginClass are validated, set the
   SparkConf entries (the code that writes to sparkConf) so misconfigured paths 
or
   missing classes are detected immediately in applyGlutenConf.
   ```
   
   </details>
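
   The probe-and-fall-back resolution the comment wants hardened can be sketched generically. This is an illustrative sketch, not Hudi code; `resolveFirstAvailable` and the candidate class names are invented for the example:

   ```java
   public class PluginResolver {
     // Return the first candidate class that is actually loadable, or fail
     // fast instead of deferring the error to Spark startup.
     static String resolveFirstAvailable(String... candidates) {
       for (String name : candidates) {
         try {
           Class.forName(name);
           return name;
         } catch (ClassNotFoundException e) {
           // fall through to the next candidate
         }
       }
       throw new IllegalStateException("none of the candidate plugin classes is on the classpath");
     }

     public static void main(String[] args) {
       // java.lang.String is always present, so it wins over the bogus name.
       System.out.println(resolveFirstAvailable("com.example.MissingPlugin", "java.lang.String"));
     }
   }
   ```

   The same shape extends naturally to the jar-path check: validate everything up front, and only then mutate `SparkConf`.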
   
   
   — *CodeRabbit* ([original](https://github.com/yihua/hudi/pull/39#discussion_r3082895725)) (source:comment#3082895725)



##########
pom.xml:
##########
@@ -170,7 +170,7 @@
     <rocksdbjni.version>7.5.3</rocksdbjni.version>

Review Comment:
   <a href="#"><img alt="P1" src="https://greptile-static-assets.s3.amazonaws.com/badges/p1.svg?v=7" align="top"></a> **Internal Onehouse Spark build version in OSS project**
   
   The Spark 3.5 version has been changed from the public `3.5.5` to `3.5.2-INTERNAL-0.16`, an Onehouse-internal build that is only resolvable via the private CodeArtifact repository added below (line ~1966). This will break the build for every developer and CI environment that does not have access to that repository.
   
   This must be reverted to an official Apache Spark release before merging to the OSS project.
   
   — *Greptile* ([original](https://github.com/yihua/hudi/pull/39#discussion_r3082852612)) (source:comment#3082852612)



##########
hudi-client/hudi-spark-client/src/main/scala/org/apache/hudi/HoodieSparkUtils.scala:
##########
@@ -103,6 +103,7 @@ object HoodieSparkUtils extends SparkAdapterSupport with SparkVersionsSupport wi
     //       Additionally, we have to explicitly wrap around resulting [[RDD]] into the one
     //       injecting [[SQLConf]], which by default isn't propagated by Spark to the executor(s).
     //       [[SQLConf]] is required by [[AvroSerializer]]
+    logWarning(s"createRdd executedPlan:\n${df.queryExecution.executedPlan.treeString}")

Review Comment:
   <a href="#"><img alt="P1" src="https://greptile-static-assets.s3.amazonaws.com/badges/p1.svg?v=7" align="top"></a> **Debug warning log left in production code**
   
   This `logWarning` call prints the entire executed query plan (as a tree string) at WARN level every time `createRdd` is called. This is clearly a debug statement added during the Gluten/Velox investigation and must not reach production:
   - It will spam the logs for every MOR read in every Hudi job.
   - `treeString` is non-trivial to compute and allocates for every partition scan.
   - WARN level means it will appear in most production logging configurations.
   
   This line should be removed entirely before merging.
   
   — *Greptile* ([original](https://github.com/yihua/hudi/pull/39#discussion_r3082852557)) (source:comment#3082852557)



##########
hudi-client/hudi-spark-client/src/main/scala/org/apache/hudi/HoodieSparkUtils.scala:
##########
@@ -103,6 +103,7 @@ object HoodieSparkUtils extends SparkAdapterSupport with SparkVersionsSupport wi
     //       Additionally, we have to explicitly wrap around resulting [[RDD]] into the one
     //       injecting [[SQLConf]], which by default isn't propagated by Spark to the executor(s).
     //       [[SQLConf]] is required by [[AvroSerializer]]
+    logWarning(s"createRdd executedPlan:\n${df.queryExecution.executedPlan.treeString}")

Review Comment:
   _⚠️ Potential issue_ | _🟠 Major_
   
   **Avoid warning-level plan dumps on the `createRdd` hot path.**
   
   `createRdd` is on the write path, and `executedPlan.treeString` eagerly materializes the full plan string. Logging that at `WARN` for every call will add a lot of noise and overhead in production. Please keep this behind `logDebug` or a dedicated diagnostic flag instead.
   
   
   
   <details>
   <summary>Suggested change</summary>
   
   ```diff
   -    logWarning(s"createRdd executedPlan:\n${df.queryExecution.executedPlan.treeString}")
   +    logDebug(s"createRdd executedPlan:\n${df.queryExecution.executedPlan.treeString}")
   ```
   </details>
   
   <details>
   <summary>📝 Committable suggestion</summary>
   
   > ‼️ **IMPORTANT**
   > Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.
   
   ```suggestion
       logDebug(s"createRdd executedPlan:\n${df.queryExecution.executedPlan.treeString}")
   ```
   
   </details>
   
   <details>
   <summary>🤖 Prompt for AI Agents</summary>
   
   ```
   Verify each finding against the current code and only fix it if needed.
   
   In
   
`@hudi-client/hudi-spark-client/src/main/scala/org/apache/hudi/HoodieSparkUtils.scala`
   at line 106, The current logWarning in HoodieSparkUtils.createRdd eagerly
   materializes df.queryExecution.executedPlan.treeString on a hot write path;
   change it to avoid WARN-level plan dumps by either moving the message to
   logDebug or gating it behind a diagnostic flag. Locate the logWarning call
   referencing df.queryExecution.executedPlan.treeString in createRdd and 
replace
   it so the plan string is only generated when logDebug is enabled or when a 
new
   boolean diagnostic flag (e.g., enablePlanDump) is true; ensure you check
   logger.isDebugEnabled (or the flag) before calling treeString to prevent
   unnecessary work.
   ```
   
   </details>
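
   The level-guard idea is independent of Hudi; a minimal sketch with no logging framework (the `debug` helper, `renderPlanTree` stand-in, and counters are hypothetical) shows why the expensive string should only be built when the level is enabled:

   ```java
   import java.util.function.Supplier;

   public class LazyLog {
     static boolean debugEnabled = false;
     static int renders = 0;

     // Stand-in for the expensive df.queryExecution.executedPlan.treeString call.
     static String renderPlanTree() {
       renders++;
       return "Project\n+- Scan parquet";
     }

     // Only evaluate the message supplier when debug logging is on.
     static void debug(Supplier<String> msg) {
       if (debugEnabled) {
         System.out.println(msg.get());
       }
     }

     public static void main(String[] args) {
       debug(() -> "createRdd executedPlan:\n" + renderPlanTree());
       // renders is still 0 here: the supplier was never invoked.
       debugEnabled = true;
       debug(() -> "createRdd executedPlan:\n" + renderPlanTree());
       // renders is now 1.
     }
   }
   ```

   SLF4J's `logger.isDebugEnabled()` check, or Spark's `logDebug(msg: => String)` by-name parameter, achieves the same deferral.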
   
   
   — *CodeRabbit* ([original](https://github.com/yihua/hudi/pull/39#discussion_r3082895711)) (source:comment#3082895711)



##########
RCA_Gluten_Velox_MOR_Failures.md:
##########
@@ -0,0 +1,373 @@
+# RCA: Gluten/Velox MOR Test Failures
+
+Three sub-tests fail when Gluten/Velox native execution is enabled:
+`testMORWithMap`, `testMORWithDecimal`, `testMORWithArray`.
+

Review Comment:
   <a href="#"><img alt="P2" src="https://greptile-static-assets.s3.amazonaws.com/badges/p2.svg?v=7" align="top"></a> **Internal investigation document should not be committed to the repository**
   
   `RCA_Gluten_Velox_MOR_Failures.md` is a root-cause analysis document for a specific Gluten/Velox testing investigation. It references internal stack traces, log evidence, and Onehouse-specific infrastructure. It is not part of the project's documentation and will confuse future readers of the repository.
   
   This file should not be committed to the OSS repository. If it needs to be preserved for reference, it belongs in an internal wiki or issue tracker rather than the source tree.
   
   — *Greptile* ([original](https://github.com/yihua/hudi/pull/39#discussion_r3082853179)) (source:comment#3082853179)



##########
hudi-common/src/test/java/org/apache/hudi/common/table/read/buffer/TestKeyBasedFileGroupRecordBuffer.java:
##########
@@ -176,6 +178,7 @@ void readWithEventTimeOrderingWithRecords() throws IOException {
     assertEquals(3, readStats.getNumUpdates());
   }
 
+  @Disabled("Custom delete payload not supported")
   @Test
   void readWithCommitTimeOrdering() throws IOException {

Review Comment:
   _⚠️ Potential issue_ | _🟡 Minor_
   
   **Disabled reason may be misleading for test name.**
   
   The test `readWithCommitTimeOrdering` is disabled with "Custom delete payload not supported", which is technically correct (the test uses custom delete markers), but the test name suggests it's primarily about commit time ordering behavior. Consider clarifying the disable reason or renaming the test if custom delete payload support is the actual blocker.
   
   <details>
   <summary>🤖 Prompt for AI Agents</summary>
   
   ```
   Verify each finding against the current code and only fix it if needed.
   
   In
   
`@hudi-common/src/test/java/org/apache/hudi/common/table/read/buffer/TestKeyBasedFileGroupRecordBuffer.java`
   around lines 181 - 183, The `@Disabled` reason on the test method
   readWithCommitTimeOrdering is misleading; update the disable annotation or 
the
   test name so the intent is clear: either change the `@Disabled` message to
   explicitly state "Uses custom delete payload (not supported)" or rename the
   method (e.g., to readWithCommitTimeOrderingAndCustomDeletePayload) to reflect
   that the failure is due to custom delete payloads, not commit-time ordering.
   Locate the test method readWithCommitTimeOrdering and apply one of these 
changes
   so the disable reason and test identifier align.
   ```
   
   </details>
   
   
   — *CodeRabbit* ([original](https://github.com/yihua/hudi/pull/39#discussion_r3082895734)) (source:comment#3082895734)



##########
pom.xml:
##########
@@ -1963,6 +1963,11 @@
       <id>confluent</id>
       <url>https://packages.confluent.io/maven/</url>
     </repository>
+    <repository>
+      <id>codeartifact</id>
+      <name>codeartifact</name>

Review Comment:
   <a href="#"><img alt="P1" src="https://greptile-static-assets.s3.amazonaws.com/badges/p1.svg?v=7" align="top"></a> **Private internal Maven repository in OSS project**
   
   The repository at `https://onehouse-194159489498.d.codeartifact.us-west-2.amazonaws.com/maven/onehouse-internal/` is a private AWS CodeArtifact instance owned by Onehouse. Adding it to the root `pom.xml` means every Maven build of the project will attempt to resolve artifacts from a private, unauthenticated endpoint — this will fail for all external contributors and CI systems outside of Onehouse infrastructure.
   
   Both this repository entry and the internal Spark version (`3.5.2-INTERNAL-0.16`) that depends on it must be removed.
   
   — *Greptile* ([original](https://github.com/yihua/hudi/pull/39#discussion_r3082852660)) (source:comment#3082852660)



##########
hudi-common/src/test/java/org/apache/hudi/common/table/read/TestHoodieFileGroupReaderBase.java:
##########
@@ -449,8 +449,7 @@ public void testReadFileGroupWithMultipleOrderingFields() throws Exception {
   private static Stream<Arguments> logFileOnlyCases() {
     return Stream.of(
         arguments(RecordMergeMode.COMMIT_TIME_ORDERING, "avro"),
-        arguments(RecordMergeMode.EVENT_TIME_ORDERING, "parquet"),
-        arguments(RecordMergeMode.CUSTOM, "avro"));
+        arguments(RecordMergeMode.EVENT_TIME_ORDERING, "avro"));

Review Comment:
   _⚠️ Potential issue_ | _🟡 Minor_
   
   **Log file only test cases reduced to avro format only.**
   
   `logFileOnlyCases()` now only tests the `avro` log data block format, removing `parquet` format coverage. If parquet log blocks are supported in production, this reduces test coverage.
   
   <details>
   <summary>🤖 Prompt for AI Agents</summary>
   
   ```
   Verify each finding against the current code and only fix it if needed.
   
   In
   
`@hudi-common/src/test/java/org/apache/hudi/common/table/read/TestHoodieFileGroupReaderBase.java`
   around lines 451 - 452, The test parameterization was narrowed to only the
   "avro" log data block format, removing "parquet" coverage; update the 
parameter
   list used by the test (in TestHoodieFileGroupReaderBase, the method supplying
   arguments for logFileOnlyCases) to include both "avro" and "parquet" again
   (e.g., add corresponding arguments(RecordMergeMode.COMMIT_TIME_ORDERING,
   "parquet") and arguments(RecordMergeMode.EVENT_TIME_ORDERING, "parquet") so 
the
   test runs for both formats).
   ```
   
   </details>
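
   One way to keep both formats covered is to derive the parameter list as a cross product rather than hand-picking pairs. The sketch below is plain Java for illustration, not the actual JUnit `@MethodSource`; in the real test each pair would feed `Arguments.arguments(...)`:

   ```java
   import java.util.Arrays;
   import java.util.List;
   import java.util.stream.Collectors;
   import java.util.stream.Stream;

   public class LogFormatMatrix {
     // Hypothetical stand-in for Hudi's RecordMergeMode enum.
     enum RecordMergeMode { COMMIT_TIME_ORDERING, EVENT_TIME_ORDERING }

     // Enumerate every (merge mode, log block format) pair so neither
     // format silently loses coverage when the list is edited.
     static List<String> logFileOnlyCases() {
       return Arrays.stream(RecordMergeMode.values())
           .flatMap(mode -> Stream.of("avro", "parquet").map(fmt -> mode + "/" + fmt))
           .collect(Collectors.toList());
     }

     public static void main(String[] args) {
       logFileOnlyCases().forEach(System.out::println);
     }
   }
   ```

   Dropping a format then requires deleting it from one place, which is much harder to do by accident.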
   
   
   — *CodeRabbit* ([original](https://github.com/yihua/hudi/pull/39#discussion_r3082895741)) (source:comment#3082895741)



##########
hudi-spark-datasource/hudi-spark/src/test/scala/org/apache/hudi/functional/TestMORDataSource.scala:
##########
@@ -1491,8 +1487,8 @@ class TestMORDataSource extends HoodieSparkClientTestBase with SparkDatasetMixin
 
     var (_, readOpts) = getWriterReaderOpts(readType)
     var (writeOpts, _) = getWriterReaderOpts(writeType)
-    readOpts = readOpts ++ Map(HoodieStorageConfig.LOGFILE_DATA_BLOCK_FORMAT.key -> "parquet")
-    writeOpts = writeOpts ++ Map(HoodieStorageConfig.LOGFILE_DATA_BLOCK_FORMAT.key -> "parquet")
+    readOpts = readOpts ++ Map(HoodieStorageConfig.LOGFILE_DATA_BLOCK_FORMAT.key -> "avro")
+    writeOpts = writeOpts ++ Map(HoodieStorageConfig.LOGFILE_DATA_BLOCK_FORMAT.key -> "avro")

Review Comment:
   _⚠️ Potential issue_ | _🟠 Major_
   
   **This test no longer exercises parquet log blocks.**
   
   `testRecordTypeCompatibilityWithParquetLog` now forces `LOGFILE_DATA_BLOCK_FORMAT=avro`, so it will stay green even if the parquet-log reader regresses. Keep this test on `parquet` or rename/split it.
   
   
   <details>
   <summary>Minimal fix</summary>
   
   ```diff
   -    readOpts = readOpts ++ Map(HoodieStorageConfig.LOGFILE_DATA_BLOCK_FORMAT.key -> "avro")
   -    writeOpts = writeOpts ++ Map(HoodieStorageConfig.LOGFILE_DATA_BLOCK_FORMAT.key -> "avro")
   +    readOpts = readOpts ++ Map(HoodieStorageConfig.LOGFILE_DATA_BLOCK_FORMAT.key -> "parquet")
   +    writeOpts = writeOpts ++ Map(HoodieStorageConfig.LOGFILE_DATA_BLOCK_FORMAT.key -> "parquet")
   ```
   </details>
   
   <details>
   <summary>📝 Committable suggestion</summary>
   
   > ‼️ **IMPORTANT**
   > Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.
   
   ```suggestion
       readOpts = readOpts ++ Map(HoodieStorageConfig.LOGFILE_DATA_BLOCK_FORMAT.key -> "parquet")
       writeOpts = writeOpts ++ Map(HoodieStorageConfig.LOGFILE_DATA_BLOCK_FORMAT.key -> "parquet")
   ```
   
   </details>
   
   <details>
   <summary>🤖 Prompt for AI Agents</summary>
   
   ```
   Verify each finding against the current code and only fix it if needed.
   
   In
   
`@hudi-spark-datasource/hudi-spark/src/test/scala/org/apache/hudi/functional/TestMORDataSource.scala`
   around lines 1490 - 1491, The test testRecordTypeCompatibilityWithParquetLog 
was
   changed to force HoodieStorageConfig.LOGFILE_DATA_BLOCK_FORMAT to "avro" (via
   readOpts and writeOpts), so it no longer validates parquet log-block 
behavior;
   revert or adjust these assignments so the test actually uses "parquet" (e.g.,
   set readOpts and writeOpts to
   Map(HoodieStorageConfig.LOGFILE_DATA_BLOCK_FORMAT.key -> "parquet") or remove
   the override), or alternatively split/rename the test into separate cases 
(one
   explicitly setting "avro" and another keeping "parquet") so the parquet-log
   reader remains covered.
   ```
   
   </details>
   
   
   — *CodeRabbit* ([original](https://github.com/yihua/hudi/pull/39#discussion_r3082895765)) (source:comment#3082895765)



##########
hudi-spark-datasource/hudi-spark/src/test/scala/org/apache/spark/sql/hudi/dml/others/TestMergeModeCommitTimeOrdering.scala:
##########
@@ -100,8 +102,11 @@ class TestMergeModeCommitTimeOrdering extends HoodieSparkSqlTestBase {
        // TODO(HUDI-8820): enable MDT after supporting MDT with table version 6
         ("hoodie.metadata.enable" -> "false", tableVersion.toInt == 6)
       ) {
-        withRecordType()(withTempDir { tmp =>
+        withRecordType()({
           val tableName = generateTableName
+          val currTime = System.currentTimeMillis()

Review Comment:
   <a href="#"><img alt="P1" src="https://greptile-static-assets.s3.amazonaws.com/badges/p1.svg?v=7" align="top"></a> **Hardcoded developer-local path and global mutable state**
   
   Two related problems here:
   
   1. **Hardcoded path**: `s"/home/ubuntu/ws3/hudi-rs/tables/table${TestMergeModeCommitTimeOrdering.cnt}_$currTime"` is a hardcoded path on a specific developer's machine. Tests will fail on any other CI or developer environment.
   
   2. **Global mutable `cnt` in companion object**: `TestMergeModeCommitTimeOrdering.cnt` in the companion object is a JVM-singleton `var` incremented across test invocations. This shared mutable state makes test execution order-dependent and could cause path collisions or unexpected behavior if tests are run in parallel.
   
   Both issues stem from the same root cause — the code was modified to write tables to a fixed developer path for manual inspection with hudi-rs. This logic should be reverted to use `withTempDir { tmp => ... }` as in the original code.
   
   The same pattern appears in the second test block in this file (~line 255).
   
   — *Greptile* ([original](https://github.com/yihua/hudi/pull/39#discussion_r3082852816)) (source:comment#3082852816)



##########
hudi-spark-datasource/hudi-spark/src/test/scala/org/apache/spark/sql/hudi/dml/others/TestMORFileSliceLayouts.scala:
##########
@@ -0,0 +1,823 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements.  See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership.  The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License.  You may obtain a copy of the License at
+ *
+ *   http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing,
+ * software distributed under the License is distributed on an
+ * "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
+ * KIND, either express or implied.  See the License for the
+ * specific language governing permissions and limitations
+ * under the License.
+ */
+
+package org.apache.spark.sql.hudi.dml.others
+
+import org.apache.hudi.HoodieCLIUtils
+import org.apache.hudi.common.model.{FileSlice, HoodieLogFile}
+import org.apache.hudi.common.model.HoodieRecord.HoodieRecordType
+import org.apache.hudi.common.table.{HoodieTableMetaClient, TableSchemaResolver}
+import org.apache.hudi.common.table.log.HoodieLogFileReader
+import org.apache.hudi.common.table.log.block.HoodieLogBlock.{HeaderMetadataType, HoodieLogBlockType}
+import org.apache.hudi.common.testutils.HoodieTestUtils
+import org.apache.hudi.common.util.{Option => HOption}
+import org.apache.hudi.testutils.DataSourceTestUtils
+
+import org.apache.hadoop.fs.Path
+import org.apache.spark.sql.SaveMode
+import org.apache.spark.sql.hudi.common.HoodieSparkSqlTestBase
+import org.apache.spark.sql.hudi.common.HoodieSparkSqlTestBase.getMetaClientAndFileSystemView
+
+import java.util.Collections
+
+import scala.collection.JavaConverters._
+
+/**
+ * Creates 4 MOR tables with specific file slice layouts, validates each layout
+ * via the same HoodieTableFileSystemView + getLatestMergedFileSlicesBeforeOrOn
+ * code path that SELECT uses, then saves gold data alongside each table.
+ */
+class TestMORFileSliceLayouts extends HoodieSparkSqlTestBase {
+
+  val BASE_OUTPUT_PATH = "/home/ubuntu/ws3/hudi-rs/tables/fdss"
+
+  private def cleanPath(path: String): Unit = {
+    val hadoopPath = new Path(path)
+    val fs = hadoopPath.getFileSystem(spark.sessionState.newHadoopConf())
+    if (fs.exists(hadoopPath)) {
+      fs.delete(hadoopPath, true)
+    }

Review Comment:
   _⚠️ Potential issue_ | _🟠 Major_
   
   **Don't hard-code a workstation path into an automated test suite.**
   
   These tests assume `/home/ubuntu/...` exists and `cleanPath` recursively deletes under it. That will fail on CI/non-Ubuntu hosts and is unsafe on developer machines. Use a harness-provided temp directory instead.
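
   A minimal sketch of the temp-directory approach using only JDK APIs (the real fix would use the harness's `withTempDir`; the class and file names here are invented for the example):

   ```java
   import java.io.IOException;
   import java.nio.file.Files;
   import java.nio.file.Path;
   import java.util.Comparator;
   import java.util.stream.Stream;

   public class TempDirExample {
     // Create a unique, test-scoped directory instead of a fixed
     // workstation path like /home/ubuntu/....
     static Path createBasePath() throws IOException {
       return Files.createTempDirectory("hudi-mor-layouts-");
     }

     // Recursively delete only the directory this test created
     // (children before parents, hence the reverse ordering).
     static void cleanup(Path dir) throws IOException {
       try (Stream<Path> walk = Files.walk(dir)) {
         walk.sorted(Comparator.reverseOrder()).forEach(p -> p.toFile().delete());
       }
     }

     public static void main(String[] args) throws IOException {
       Path base = createBasePath();
       Files.createFile(base.resolve("gold-data.txt"));
       cleanup(base);
     }
   }
   ```

   Because the path is unique per run, no `cleanPath` over a shared location is needed, and parallel test runs cannot collide.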
   
   <details>
   <summary>🤖 Prompt for AI Agents</summary>
   
   ```
   Verify each finding against the current code and only fix it if needed.
   
   In
   
`@hudi-spark-datasource/hudi-spark/src/test/scala/org/apache/spark/sql/hudi/dml/others/TestMORFileSliceLayouts.scala`
   around lines 48 - 55, Replace the hard-coded BASE_OUTPUT_PATH and the unsafe
   cleanPath usage with a test-scoped temporary directory: change 
BASE_OUTPUT_PATH
   to be created at runtime (e.g., use java.nio.file.Files.createTempDirectory 
or
   the test harness/tempDir utility) and ensure cleanPath operates on that temp 
dir
   only (or remove cleanPath and rely on the temp directory lifecycle). Update
   references to BASE_OUTPUT_PATH and the cleanPath helper so all tests use this
   generated path, and ensure the temp directory is deleted/cleaned in an
   afterEach/afterAll teardown to avoid leaving artifacts.
   ```
   
   </details>
   
   
   — *CodeRabbit* ([original](https://github.com/yihua/hudi/pull/39#discussion_r3082895785)) (source:comment#3082895785)



##########
hudi-spark-datasource/hudi-spark/src/test/java/org/apache/hudi/io/TestMergeHandle.java:
##########
@@ -277,7 +278,7 @@ void testWriteFailures() throws Exception {
   }
 
   @ParameterizedTest
-  @ValueSource(strings = {"EVENT_TIME_ORDERING", "COMMIT_TIME_ORDERING", "CUSTOM", "CUSTOM_MERGER"})
+  @ValueSource(strings = {"EVENT_TIME_ORDERING", "COMMIT_TIME_ORDERING"})
  public void testFGReaderBasedMergeHandleInsertUpsertDelete(String mergeMode) throws IOException {
    testFGReaderBasedMergeHandleInsertUpsertDeleteInternal(mergeMode, new Properties(), false);

Review Comment:
   _⚠️ Potential issue_ | _🟠 Major_
   
   **Keep explicit coverage for `CUSTOM` merge modes.**
   
   `testFGReaderBasedMergeHandleInsertUpsertDeleteInternal(...)` still has dedicated `CUSTOM` / `CUSTOM_MERGER` branches, so removing those values from `@ValueSource` just makes those paths untested. If they are unsupported only in the Gluten run, gate them there or replace them with an explicit unsupported assertion instead of dropping the coverage globally.
   
   <details>
   <summary>🤖 Prompt for AI Agents</summary>
   
   ```
   Verify each finding against the current code and only fix it if needed.
   
   In
   
`@hudi-spark-datasource/hudi-spark/src/test/java/org/apache/hudi/io/TestMergeHandle.java`
   around lines 280 - 283, The test removed CUSTOM/CUSTOM_MERGER branch 
coverage;
   restore explicit coverage by including "CUSTOM" and "CUSTOM_MERGER" in the
   `@ValueSource` for testFGReaderBasedMergeHandleInsertUpsertDelete or, if 
those
   modes are genuinely unsupported under the Gluten environment, add a guard 
inside
   testFGReaderBasedMergeHandleInsertUpsertDelete (calling
   testFGReaderBasedMergeHandleInsertUpsertDeleteInternal) that detects the 
Gluten
   run (e.g., an existing isGluten()/environment check) and asserts the 
unsupported
   behavior for CUSTOM/CUSTOM_MERGER instead of silently skipping them; update
   references to testFGReaderBasedMergeHandleInsertUpsertDeleteInternal, the
   ValueSource annotation, and any Gluten-run detection logic accordingly.
   ```
   
   </details>
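
   The environment-gating alternative can be sketched like this; `mergeModes` and the `glutenRun` flag are hypothetical, standing in for whatever Gluten-run detection the suite actually has:

   ```java
   import java.util.ArrayList;
   import java.util.Arrays;
   import java.util.List;

   public class MergeModeParams {
     // Keep the CUSTOM modes in the default matrix, and drop them only
     // when the run is flagged as a Gluten run.
     static List<String> mergeModes(boolean glutenRun) {
       List<String> modes = new ArrayList<>(Arrays.asList(
           "EVENT_TIME_ORDERING", "COMMIT_TIME_ORDERING", "CUSTOM", "CUSTOM_MERGER"));
       if (glutenRun) {
         modes.removeIf(m -> m.startsWith("CUSTOM"));
       }
       return modes;
     }

     public static void main(String[] args) {
       System.out.println(mergeModes(false));
       System.out.println(mergeModes(true));
     }
   }
   ```

   Wired into a `@MethodSource`, this keeps the default build's coverage intact while letting the Gluten configuration skip (or assert-unsupported) only the modes it cannot run.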
   
   
   — *CodeRabbit* ([original](https://github.com/yihua/hudi/pull/39#discussion_r3082895745)) (source:comment#3082895745)



##########
pom.xml:
##########
@@ -170,7 +170,7 @@
     <rocksdbjni.version>7.5.3</rocksdbjni.version>
     <spark33.version>3.3.4</spark33.version>
     <spark34.version>3.4.3</spark34.version>
-    <spark35.version>3.5.5</spark35.version>
+    <spark35.version>3.5.2-INTERNAL-0.16</spark35.version>

Review Comment:
   _⚠️ Potential issue_ | _🔴 Critical_
   
   <details>
   <summary>🧩 Analysis chain</summary>
   
   🏁 Script executed:
   
   ```shell
   # First, let me examine the pom.xml file structure and the specific lines mentioned
   wc -l pom.xml
   ```
   
   Repository: yihua/hudi
   
   Length of output: 66
   
   ---
   
   🏁 Script executed:
   
   ```shell
   # Check the area around line 173 for the spark35.version
   sed -n '170,180p' pom.xml
   ```
   
   Repository: yihua/hudi
   
   Length of output: 631
   
   ---
   
   🏁 Script executed:
   
   ```shell
   # Check the area around lines 1966-1970 for the Onehouse CodeArtifact repository
   sed -n '1960,1975p' pom.xml
   ```
   
   Repository: yihua/hudi
   
   Length of output: 479
   
   ---
   
   🏁 Script executed:
   
   ```shell
   # Search for profile definitions to verify which is the default profile
   rg -n '<profile>' pom.xml | head -20
   ```
   
   Repository: yihua/hudi
   
   Length of output: 433
   
   ---
   
   🏁 Script executed:
   
   ```shell
   # Look for activeByDefault or default profile settings
   rg -n -A 3 'activeByDefault\|<id>spark3' pom.xml
   ```
   
   Repository: yihua/hudi
   
   Length of output: 36
   
   ---
   
   🏁 Script executed:
   
   ```shell
   # Search for spark3.5 profile definition and its activation settings
   rg -n -A 15 '<id>spark3\.5' pom.xml
   ```
   
   Repository: yihua/hudi
   
   Length of output: 1058
   
   ---
   
   🏁 Script executed:
   
   ```shell
   # Look for all profiles with their IDs and activation settings
   rg -n -B 2 -A 8 '<profile>' pom.xml | grep -E '(profile>|<id>|activeByDefault|activation)'
   ```
   
   Repository: yihua/hudi
   
   Length of output: 2375
   
   ---
   
   🏁 Script executed:
   
   ```shell
   # Check line 1974 onwards to see the first profile definition after repositories
   sed -n '1974,2050p' pom.xml
   ```
   
   Repository: yihua/hudi
   
   Length of output: 2346
   
   ---
   
   🏁 Script executed:
   
   ```shell
   # Get the complete spark3.5 profile definition to check for activeByDefault
   sed -n '2656,2715p' pom.xml
   ```
   
   Repository: yihua/hudi
   
   Length of output: 3074
   
   ---
   
   🏁 Script executed:
   
   ```shell
   # Check if there are any activeByDefault declarations in profiles
   rg -n 'activeByDefault' pom.xml
   ```
   
   Repository: yihua/hudi
   
   Length of output: 159
   
   ---
   
   🏁 Script executed:
   
   ```shell
   # Check properties section to understand default spark module
   sed -n '140,200p' pom.xml
   ```
   
   Repository: yihua/hudi
   
   Length of output: 3710
   
   ---
   
   
   
   </details>
   
   **This makes the default Spark 3.5 build depend on private infrastructure.**
   
   `spark3.5` is the active-by-default profile, and the default properties use 
`${spark35.version}` which now points to `3.5.2-INTERNAL-0.16`. With the 
Onehouse CodeArtifact repository configured in the global repositories section, 
every default build will attempt to resolve this internal coordinate from 
private AWS infrastructure. Public CI and external contributors cannot access 
this repository without Onehouse credentials. Please keep the public Spark 
coordinate as default and move the internal version and repository behind an 
explicit opt-in profile.
   
   <details>
   <summary>🤖 Prompt for AI Agents</summary>
   
   ```
   Verify each finding against the current code and only fix it if needed.
   
   In `@pom.xml` at line 173, The pom currently sets the Maven property
   spark35.version to the internal coordinate "3.5.2-INTERNAL-0.16" which makes 
the
   active-by-default profile spark3.5 depend on private Onehouse CodeArtifact
   repos; revert the default <spark35.version> to the public Spark artifact 
(e.g.,
   a public 3.5.x coordinate) and move the internal "3.5.2-INTERNAL-0.16" value 
and
   the Onehouse repository declaration into a new opt-in Maven profile (e.g.,
   profile id "onehouse-internal" or similar); update references to 
spark35.version
   to remain unchanged so external builds use the public coordinate unless the
   opt-in profile is activated.
   ```
   
   </details>
   
   <!-- 
fingerprinting:phantom:medusa:grasshopper:79e7aa6f-65c0-490f-bb56-4d8aab80feac 
-->
   
   <!-- This is an auto-generated comment by CodeRabbit -->
   
   — *CodeRabbit* 
([original](https://github.com/yihua/hudi/pull/39#discussion_r3082895791)) 
(source:comment#3082895791)



##########
hudi-spark-datasource/hudi-spark/src/test/scala/org/apache/spark/sql/hudi/dml/others/TestMORFileSliceLayouts.scala:
##########
@@ -0,0 +1,823 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements.  See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership.  The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License.  You may obtain a copy of the License at
+ *
+ *   http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing,
+ * software distributed under the License is distributed on an
+ * "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
+ * KIND, either express or implied.  See the License for the
+ * specific language governing permissions and limitations
+ * under the License.
+ */
+
+package org.apache.spark.sql.hudi.dml.others
+
+import org.apache.hudi.HoodieCLIUtils
+import org.apache.hudi.common.model.{FileSlice, HoodieLogFile}
+import org.apache.hudi.common.model.HoodieRecord.HoodieRecordType
+import org.apache.hudi.common.table.{HoodieTableMetaClient, TableSchemaResolver}
+import org.apache.hudi.common.table.log.HoodieLogFileReader
+import org.apache.hudi.common.table.log.block.HoodieLogBlock.{HeaderMetadataType, HoodieLogBlockType}
+import org.apache.hudi.common.testutils.HoodieTestUtils
+import org.apache.hudi.common.util.{Option => HOption}
+import org.apache.hudi.testutils.DataSourceTestUtils
+
+import org.apache.hadoop.fs.Path
+import org.apache.spark.sql.SaveMode
+import org.apache.spark.sql.hudi.common.HoodieSparkSqlTestBase
+import org.apache.spark.sql.hudi.common.HoodieSparkSqlTestBase.getMetaClientAndFileSystemView
+
+import java.util.Collections
+
+import scala.collection.JavaConverters._
+
+/**
+ * Creates 4 MOR tables with specific file slice layouts, validates each layout
+ * via the same HoodieTableFileSystemView + getLatestMergedFileSlicesBeforeOrOn
+ * code path that SELECT uses, then saves gold data alongside each table.
+ */
+class TestMORFileSliceLayouts extends HoodieSparkSqlTestBase {
+
+  val BASE_OUTPUT_PATH = "/home/ubuntu/ws3/hudi-rs/tables/fdss"
+
+  private def cleanPath(path: String): Unit = {
+    val hadoopPath = new Path(path)
+    val fs = hadoopPath.getFileSystem(spark.sessionState.newHadoopConf())
+    if (fs.exists(hadoopPath)) {
+      fs.delete(hadoopPath, true)
+    }
+  }
+
+  // ─── Core validation helper ───
+  // Returns the file slices exactly as the SELECT query path would see them:
+  //   HoodieTableFileSystemView → getLatestMergedFileSlicesBeforeOrOn(partition, queryInstant)
+  // This mirrors BaseHoodieTableFileIndex.filterFiles (line 342-355).
+  private def getQueryFileSlices(basePath: String): Seq[FileSlice] = {
+    val (metaClient, fsView) = getMetaClientAndFileSystemView(basePath)
+    val queryInstant = metaClient.getActiveTimeline
+      .filterCompletedInstants().lastInstant().get().requestedTime()
+    fsView.getLatestMergedFileSlicesBeforeOrOn("", queryInstant)
+      .iterator().asScala.toSeq
+  }

Review Comment:
   _⚠️ Potential issue_ | _🟠 Major_
   
   **Close the file-system view and log readers on every exit path.**
   
   `getMetaClientAndFileSystemView` creates a `HoodieTableFileSystemView` that 
is never closed, and each `HoodieLogFileReader` only closes on the happy path. 
A failing read/assertion here will leak handles across this suite.  
   
   
   
   Also applies to: 111-131
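For illustration, a minimal self-contained sketch of the close-on-every-exit-path pattern being requested. `TrackedResource` is a stand-in for `HoodieTableFileSystemView`/`HoodieLogFileReader`, not the actual Hudi API; any `AutoCloseable` behaves the same way under try-with-resources:

```java
import java.util.ArrayList;
import java.util.List;

public class CloseOnAllPaths {
    // Stand-in for HoodieTableFileSystemView / HoodieLogFileReader.
    static class TrackedResource implements AutoCloseable {
        static final List<String> closed = new ArrayList<>();
        final String name;
        TrackedResource(String name) { this.name = name; }
        @Override public void close() { closed.add(name); }
    }

    static String readSlices(boolean fail) {
        // try-with-resources closes both resources on every exit path,
        // including the throwing one (closed in reverse declaration order).
        try (TrackedResource fsView = new TrackedResource("fsView");
             TrackedResource logReader = new TrackedResource("logReader")) {
            if (fail) {
                throw new IllegalStateException("simulated failing read");
            }
            return "slices";
        }
    }

    public static void main(String[] args) {
        try {
            readSlices(true);
        } catch (IllegalStateException expected) {
            // the read failed, but both resources were still closed
        }
        System.out.println(TrackedResource.closed);  // [logReader, fsView]
    }
}
```

In Scala, the same guarantee comes from a try/finally or a loan-pattern helper around `fsView` and each log reader.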
   
   <details>
   <summary>🤖 Prompt for AI Agents</summary>
   
   ```
   Verify each finding against the current code and only fix it if needed.
   
   In
   
`@hudi-spark-datasource/hudi-spark/src/test/scala/org/apache/spark/sql/hudi/dml/others/TestMORFileSliceLayouts.scala`
   around lines 62 - 68, getQueryFileSlices currently obtains resources from
   getMetaClientAndFileSystemView (which returns a HoodieTableFileSystemView) 
and
   iterates log readers but does not guarantee closure on exception paths; 
update
   getQueryFileSlices to ensure the HoodieTableFileSystemView and any
   HoodieLogFileReader instances are closed in all exit paths (use try/finally 
or a
   loan/withResource pattern) so that fsView.close() is always called and each 
log
   reader is closed even if an assertion or read fails; locate
   getMetaClientAndFileSystemView and any code that constructs 
HoodieLogFileReader
   (also in the 111-131 region) and wrap their usage in try/finally blocks or a
   managed-resource helper to release resources reliably.
   ```
   
   </details>
   
   <!-- 
fingerprinting:phantom:medusa:grasshopper:163599a9-90f2-46bb-9a83-b92fc3dd8804 
-->
   
   
   — *CodeRabbit* 
([original](https://github.com/yihua/hudi/pull/39#discussion_r3082895786)) 
(source:comment#3082895786)



##########
hudi-spark-datasource/hudi-spark/src/test/java/org/apache/hudi/table/functional/TestHoodieSparkMergeOnReadTableRollback.java:
##########
@@ -1157,6 +1158,7 @@ private HoodieWriteConfig getWriteConfig(boolean 
autoCommit, boolean rollbackUsi
   /**
    * Scenario: data table is updated, no changes to MDT
    */
+  @Disabled("Table version 6 only")

Review Comment:
   _⚠️ Potential issue_ | _🟠 Major_
   
   <details>
   <summary>🧩 Analysis chain</summary>
   
   🏁 Script executed:
   
   ```shell
   #!/bin/bash
    # Verify these v6 rollback-failure tests are currently hard-disabled in both rollback suites.
    rg -n --type=java '@Disabled\("Table version 6 only"\)|testRollbackWithFailurePreMDT|testRollbackWithFailurePostMDT|testRollbackWithFailurePostMDTRollbackFailsPreMDT|testRollbackWithFailureInMDT' \
      hudi-spark-datasource/hudi-spark/src/test/java/org/apache/hudi/table/functional/TestHoodieSparkMergeOnReadTableRollback.java \
      hudi-spark-datasource/hudi-spark/src/test/java/org/apache/hudi/table/functional/TestHoodieSparkCopyOnWriteTableRollbackTableVersionSix.java
   ```
   
   Repository: yihua/hudi
   
   Length of output: 4310
   
   ---
   
   🏁 Script executed:
   
   ```shell
    # Check for existing `@Tag` usage in the test files to confirm pattern compatibility
    rg -n '@Tag\(' hudi-spark-datasource/hudi-spark/src/test/java/org/apache/hudi/table/functional/ | head -20
   ```
   
   Repository: yihua/hudi
   
   Length of output: 969
   
   ---
   
   🏁 Script executed:
   
   ```shell
    # Check Maven pom.xml for test profiles or tag-based test execution configuration
    fd pom.xml | head -5 | xargs -I {} sh -c 'echo "=== {} ===" && grep -A 5 -B 5 "tag\|profile\|maven-surefire" {} 2>/dev/null | head -30'
   ```
   
   Repository: yihua/hudi
   
   Length of output: 5745
   
   ---
   
   🏁 Script executed:
   
   ```shell
    # Look for any CI configuration files that might show test execution strategy
    fd -e '.yml' -e '.yaml' -e '.sh' | grep -i 'ci\|github\|jenkins\|test' | head -10
   ```
   
   Repository: yihua/hudi
   
   Length of output: 767
   
   ---
   
   🏁 Script executed:
   
   ```shell
    # Search for Maven Surefire plugin configuration in the main pom.xml
    find . -name "pom.xml" -path "*/hudi-spark-datasource/*" | head -1 | xargs grep -A 20 "maven-surefire-plugin" | head -40
   ```
   
   Repository: yihua/hudi
   
   Length of output: 36
   
   ---
   
   🏁 Script executed:
   
   ```shell
    # Look for test-related profiles in pom.xml files
    rg -A 10 '<profile>.*test|includedGroups|groupsToInclude' --type=xml | head -50
   ```
   
   Repository: yihua/hudi
   
   Length of output: 36
   
   ---
   
   🏁 Script executed:
   
   ```shell
   # Check for GitHub Actions workflow files
   fd -e '.yml' -e '.yaml' .github/workflows 2>/dev/null | head -10 | xargs cat
   ```
   
   Repository: yihua/hudi
   
   Length of output: 36
   
   ---
   
   🏁 Script executed:
   
   ```shell
    # Search for root pom.xml and check for test execution configuration
    cat pom.xml 2>/dev/null | grep -A 5 -B 5 "surefire\|maven-failsafe" || echo "No root pom.xml or no surefire config"
   ```
   
   Repository: yihua/hudi
   
   Length of output: 7972
   
   ---
   
   🏁 Script executed:
   
   ```shell
   # Check project structure to understand if Maven or Gradle
   ls -la | grep -E 'pom\.xml|build\.gradle|gradlew'
   ```
   
   Repository: yihua/hudi
   
   Length of output: 113
   
   ---
   
   🏁 Script executed:
   
   ```shell
    # Look for any CI configuration files more broadly
    find . -maxdepth 3 -type f \( -name '*.yml' -o -name '*.yaml' \) 2>/dev/null | head -20
   ```
   
   Repository: yihua/hudi
   
   Length of output: 941
   
   ---
   
   🏁 Script executed:
   
   ```shell
    # Check the GitHub Actions workflow for functional tests to see how they're executed
    cat .github/workflows/azure_ci_check.yml | head -100
   ```
   
   Repository: yihua/hudi
   
   Length of output: 3054
   
   ---
   
   🏁 Script executed:
   
   ```shell
    # Verify the complete Maven profile structure for tag-based filtering
    grep -A 25 '<profile>' pom.xml | grep -A 25 'functional-b\|functional-c\|excludedGroups\|groups>'
   ```
   
   Repository: yihua/hudi
   
   Length of output: 3347
   
   ---
   
   
   
   </details>
   
   **Replace `@Disabled` with `@Tag` to restore conditional test execution.**
   
   These four rollback failure-path tests (lines 1161, 1170, 1180, 1189) are 
completely skipped in all runs due to hard disabling. The codebase already uses 
`@Tag` + Maven profiles (e.g., `functional-tests`, `functional-tests-b`, 
`functional-tests-c`) to organize version-specific tests. Apply the same 
pattern here:
   
   <details>
   <summary>Required changes</summary>
   
   1. Replace `@Disabled("Table version 6 only")` with 
`@Tag("table-version-6")` on all four methods in both 
`TestHoodieSparkMergeOnReadTableRollback.java` and 
`TestHoodieSparkCopyOnWriteTableRollbackTableVersionSix.java`.
   
   2. Add a Maven profile in `pom.xml` (or extend an existing functional-tests 
profile) to include the `table-version-6` tag:
   ```xml
    <profile>
      <id>functional-tests-v6</id>
      <build>
        <plugins>
          <plugin>
            <artifactId>maven-surefire-plugin</artifactId>
            <configuration>
              <groups>table-version-6</groups>
            </configuration>
          </plugin>
        </plugins>
      </build>
    </profile>
   ```
   
   3. Wire this profile into CI (Azure CI) to ensure these tests run in the 
appropriate validation suite.
   
   </details>
   
   <details>
   <summary>🤖 Prompt for AI Agents</summary>
   
   ```
   Verify each finding against the current code and only fix it if needed.
   
   In
   
`@hudi-spark-datasource/hudi-spark/src/test/java/org/apache/hudi/table/functional/TestHoodieSparkMergeOnReadTableRollback.java`
   at line 1161, Replace the hard-disabled annotations so the rollback 
failure-path
   tests run conditionally: in TestHoodieSparkMergeOnReadTableRollback.java and
   TestHoodieSparkCopyOnWriteTableRollbackTableVersionSix.java, change each 
method
   annotated with `@Disabled`("Table version 6 only") to 
`@Tag`("table-version-6") (the
   four rollback test methods around the existing markers). Then add/extend a 
Maven
   profile (e.g., functional-tests-v6) in pom.xml to include the table-version-6
   tag via the surefire/failsafe groups configuration, and ensure the Azure CI
   pipeline triggers that profile in the appropriate validation job so the
   table-version-6 tests execute in CI.
   ```
   
   </details>
   
   <!-- 
fingerprinting:phantom:poseidon:hawk:91643543-15be-490b-8f5c-8500eba38510 -->
   
   
   — *CodeRabbit* 
([original](https://github.com/yihua/hudi/pull/39#discussion_r3082895752)) 
(source:comment#3082895752)



##########
hudi-spark-datasource/hudi-spark/src/test/scala/org/apache/spark/sql/hudi/dml/others/TestMergeModeCommitTimeOrdering.scala:
##########
@@ -100,8 +102,11 @@ class TestMergeModeCommitTimeOrdering extends 
HoodieSparkSqlTestBase {
         // TODO(HUDI-8820): enable MDT after supporting MDT with table version 
6
         ("hoodie.metadata.enable" -> "false", tableVersion.toInt == 6)
       ) {
-        withRecordType()(withTempDir { tmp =>
+        withRecordType()({
           val tableName = generateTableName
+          val currTime = System.currentTimeMillis()
+          val tablePath = 
s"/home/ubuntu/ws3/hudi-rs/tables/table${TestMergeModeCommitTimeOrdering.cnt}_$currTime"
+          println(s"fdss: ${tablePath}")

Review Comment:
   _⚠️ Potential issue_ | _🔴 Critical_
   
   **Critical: Remove hardcoded absolute path and debug statement.**
   
   This hardcoded path (`/home/ubuntu/ws3/hudi-rs/tables/...`) will fail on any 
machine other than the original developer's. The `println(s"fdss: 
${tablePath}")` appears to be a debugging artifact.
   
   Use the test framework's temp directory mechanism (e.g., `withTempDir`) 
instead of hardcoded paths.
   
   
   <details>
   <summary>Suggested fix pattern</summary>
   
   ```diff
   -          val currTime = System.currentTimeMillis()
    -          val tablePath = s"/home/ubuntu/ws3/hudi-rs/tables/table${TestMergeModeCommitTimeOrdering.cnt}_$currTime"
   -          println(s"fdss: ${tablePath}")
   +          withTempDir { tmp =>
   +            val tablePath = tmp.getCanonicalPath
   ```
   
   And restore the original `withTempDir` pattern that was likely removed.
   </details>
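As a rough JDK-only sketch of the portable alternative (`newTablePath` is a hypothetical helper for illustration, not an existing Hudi utility):

```java
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.file.Files;
import java.nio.file.Path;

public class TempTablePath {
    // Hypothetical helper: derive the table path from a fresh temp directory
    // instead of a hardcoded "/home/ubuntu/..." base, so the test runs anywhere.
    static Path newTablePath(String tableName) {
        try {
            Path base = Files.createTempDirectory("hudi-test-tables");
            return base.resolve(tableName + "_" + System.currentTimeMillis());
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        Path p = newTablePath("table0");
        // The generated path lives under the platform temp directory.
        System.out.println(p.getFileName().toString().startsWith("table0_"));  // true
    }
}
```

The test framework's `withTempDir` achieves the same isolation and also cleans up after the test.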
   
   <details>
   <summary>🤖 Prompt for AI Agents</summary>
   
   ```
   Verify each finding against the current code and only fix it if needed.
   
   In
   
`@hudi-spark-datasource/hudi-spark/src/test/scala/org/apache/spark/sql/hudi/dml/others/TestMergeModeCommitTimeOrdering.scala`
   around lines 107 - 109, The test currently uses a hardcoded absolute path via
   tablePath built from TestMergeModeCommitTimeOrdering.cnt and currTime and 
prints
   it with println (a debug artifact); replace this by using the test 
framework's
   temporary directory helper (e.g., withTempDir) to create tablePath inside the
   temp dir and remove the println line; locate the usage where
   currTime/tablePath/Println are set in TestMergeModeCommitTimeOrdering and
   restore the original withTempDir pattern so the path is portable and no debug
   output remains.
   ```
   
   </details>
   
   <!-- fingerprinting:phantom:medusa:ocelot -->
   
   
   — *CodeRabbit* 
([original](https://github.com/yihua/hudi/pull/39#discussion_r3082895779)) 
(source:comment#3082895779)



##########
hudi-spark-datasource/hudi-spark/src/test/scala/org/apache/hudi/functional/TestMORDataSource.scala:
##########
@@ -583,6 +576,9 @@ class TestMORDataSource extends HoodieSparkClientTestBase 
with SparkDatasetMixin
     // First Operation:
     // Producing parquet files to three default partitions.
     // SNAPSHOT view on MOR table with parquet files only.

Review Comment:
    <a href="#"><img alt="P2" src="https://greptile-static-assets.s3.amazonaws.com/badges/p2.svg?v=7" align="top"></a> **Leftover commented-out developer debug path**
   
   These two commented-out lines are clearly debugging artifacts from manual 
testing with hudi-rs:
   
   ```scala
    // val BASE_OUTPUT_PATH = "/home/ubuntu/ws3/hudi-rs/tables/testPrunedFiltered"
    // val tablePath = s"$BASE_OUTPUT_PATH/table_log_compaction_${System.currentTimeMillis()}"
   ```
   
   They should be removed entirely from the file before merging.
   
   — *Greptile* 
([original](https://github.com/yihua/hudi/pull/39#discussion_r3082852883)) 
(source:comment#3082852883)



##########
hudi-spark-datasource/hudi-spark/src/test/scala/org/apache/hudi/common/table/read/TestHoodieFileGroupReaderOnSpark.scala:
##########
@@ -226,6 +674,7 @@ class TestHoodieFileGroupReaderOnSpark extends 
TestHoodieFileGroupReaderBase[Int
     (10, "3", "rider-C", "driver-C", 33.9, "i"),
     (20, "1", "rider-Z", "driver-Z", 27.7, "i"))
 
+  @Disabled("Custom delete payload not supported")
   @ParameterizedTest

Review Comment:
   _⚠️ Potential issue_ | _🟠 Major_
   
   **Don't permanently disable custom-delete coverage.**
   
   Gluten is only enabled when `gluten.bundle.jar` is present, but this 
annotation removes the custom-delete path from normal Spark runs too. Skip it 
conditionally when native execution is active instead of disabling the test for 
every profile.  
   
   
   <details>
   <summary>Conditional-skip example</summary>
   
   ```diff
   -import org.junit.jupiter.api.{AfterEach, BeforeEach, Disabled, Test}
   +import org.junit.jupiter.api.{AfterEach, Assumptions, BeforeEach, Test}
    ...
    -  @Disabled("Custom delete payload not supported")
       @ParameterizedTest
       @MethodSource(Array("customDeleteTestParams"))
      def testCustomDelete(useFgReader: String,
                           tableType: String,
                           positionUsed: String,
                           mergeMode: String): Unit = {
   +    Assumptions.assumeTrue(
   +      System.getProperty("gluten.bundle.jar") == null,
   +      "Custom delete payload is not supported with Gluten/Velox")
   ```
   </details>
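A tiny sketch of the guard this suggests, assuming the same `gluten.bundle.jar` system property that `GlutenTestUtils` keys off (the helper name here is made up):

```java
public class GlutenGuard {
    // Mirrors the activation check: native execution is considered active
    // only when -Dgluten.bundle.jar=<path> was passed to the JVM.
    static boolean isGlutenNativeExecutionActive() {
        String jar = System.getProperty("gluten.bundle.jar");
        return jar != null && !jar.isEmpty();
    }

    public static void main(String[] args) {
        // A test would call Assumptions.assumeFalse(isGlutenNativeExecutionActive())
        // so it still runs under plain Spark but skips under Gluten/Velox.
        System.out.println(isGlutenNativeExecutionActive());
    }
}
```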
   
   <details>
   <summary>🤖 Prompt for AI Agents</summary>
   
   ```
   Verify each finding against the current code and only fix it if needed.
   
   In
   
`@hudi-spark-datasource/hudi-spark/src/test/scala/org/apache/hudi/common/table/read/TestHoodieFileGroupReaderOnSpark.scala`
   around lines 677 - 678, The test is permanently disabled with 
`@Disabled`("Custom
   delete payload not supported"), removing custom-delete coverage for all runs;
   instead remove the `@Disabled` and add a conditional skip that only disables 
the
   test when native/Gluten execution is active (e.g., use JUnit Assumptions 
inside
   the test: call Assumptions.assumeFalse(isGlutenNativeExecutionActive()) or
   Assumptions.assumeTrue(!isGlutenNativeExecutionActive()), where
   isGlutenNativeExecutionActive() checks the presence of the gluten bundle
   (gluten.bundle.jar) or existing project helper), so the test in
   TestHoodieFileGroupReaderOnSpark runs normally under standard Spark profiles 
but
   is skipped only when native execution is enabled.
   ```
   
   </details>
   
   <!-- 
fingerprinting:phantom:medusa:grasshopper:163599a9-90f2-46bb-9a83-b92fc3dd8804 
-->
   
   
   — *CodeRabbit* 
([original](https://github.com/yihua/hudi/pull/39#discussion_r3082895760)) 
(source:comment#3082895760)



##########
hudi-spark-datasource/hudi-spark/src/test/scala/org/apache/spark/sql/hudi/dml/others/TestMORFileSliceLayouts.scala:
##########
@@ -0,0 +1,823 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements.  See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership.  The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License.  You may obtain a copy of the License at
+ *
+ *   http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing,
+ * software distributed under the License is distributed on an
+ * "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
+ * KIND, either express or implied.  See the License for the
+ * specific language governing permissions and limitations
+ * under the License.
+ */
+
+package org.apache.spark.sql.hudi.dml.others
+
+import org.apache.hudi.HoodieCLIUtils
+import org.apache.hudi.common.model.{FileSlice, HoodieLogFile}
+import org.apache.hudi.common.model.HoodieRecord.HoodieRecordType
+import org.apache.hudi.common.table.{HoodieTableMetaClient, TableSchemaResolver}
+import org.apache.hudi.common.table.log.HoodieLogFileReader
+import org.apache.hudi.common.table.log.block.HoodieLogBlock.{HeaderMetadataType, HoodieLogBlockType}
+import org.apache.hudi.common.testutils.HoodieTestUtils
+import org.apache.hudi.common.util.{Option => HOption}
+import org.apache.hudi.testutils.DataSourceTestUtils
+
+import org.apache.hadoop.fs.Path
+import org.apache.spark.sql.SaveMode
+import org.apache.spark.sql.hudi.common.HoodieSparkSqlTestBase
+import org.apache.spark.sql.hudi.common.HoodieSparkSqlTestBase.getMetaClientAndFileSystemView
+
+import java.util.Collections
+
+import scala.collection.JavaConverters._
+
+/**
+ * Creates 4 MOR tables with specific file slice layouts, validates each layout
+ * via the same HoodieTableFileSystemView + getLatestMergedFileSlicesBeforeOrOn
+ * code path that SELECT uses, then saves gold data alongside each table.
+ */
+class TestMORFileSliceLayouts extends HoodieSparkSqlTestBase {
+
+  val BASE_OUTPUT_PATH = "/home/ubuntu/ws3/hudi-rs/tables/fdss"
+
+  private def cleanPath(path: String): Unit = {
+    val hadoopPath = new Path(path)
+    val fs = hadoopPath.getFileSystem(spark.sessionState.newHadoopConf())
+    if (fs.exists(hadoopPath)) {
+      fs.delete(hadoopPath, true)

Review Comment:
    <a href="#"><img alt="P1" src="https://greptile-static-assets.s3.amazonaws.com/badges/p1.svg?v=7" align="top"></a> **Hardcoded developer-local path used as base output directory**
   
   `val BASE_OUTPUT_PATH = "/home/ubuntu/ws3/hudi-rs/tables/fdss"` is a 
hardcoded absolute path on a specific developer's machine. All four tests in 
this file write their tables to sub-directories of this path. These tests will 
fail (or worse, silently write data to an unexpected location) on any other 
machine.
   
   This should be replaced with a temporary directory, consistent with how 
other test classes in the project handle isolation, e.g.:
   
   ```suggestion
     val BASE_OUTPUT_PATH = createTempDir().getAbsolutePath
   ```
   
   Or use the `withTempDir` pattern already used throughout 
`HoodieSparkSqlTestBase`.
   
   — *Greptile* 
([original](https://github.com/yihua/hudi/pull/39#discussion_r3082852745)) 
(source:comment#3082852745)



##########
RCA_Gluten_Velox_MOR_Failures.md:
##########
@@ -0,0 +1,373 @@
+# RCA: Gluten/Velox MOR Test Failures
+
+Three sub-tests fail when Gluten/Velox native execution is enabled:
+`testMORWithMap`, `testMORWithDecimal`, `testMORWithArray`.
+
+Each has a distinct root cause. This document traces execution with raw code 
and supporting log evidence.
+
+---
+
+## 1. `testMORWithMap` — `sizeInBytes (276) should be a multiple of 8`
+
+### Symptom
+
+```
+java.lang.AssertionError: sizeInBytes (276) should be a multiple of 8
+  at org.apache.spark.sql.catalyst.expressions.UnsafeRow.setTotalSize(UnsafeRow.java:172)
+  at org.apache.spark.sql.catalyst.expressions.codegen.UnsafeRowWriter.getRow(UnsafeRowWriter.java:78)
+  at org.apache.spark.sql.catalyst.expressions.GeneratedClass$GeneratedIteratorForCodegenStage1.processNext(Unknown Source)
+  at org.apache.spark.sql.catalyst.expressions.codegen.BufferedRowIterator.hasNext(BufferedRowIterator.java:43)
+  at org.apache.spark.sql.execution.WholeStageCodegenEvaluatorFactory$...$$anon$1.hasNext(WholeStageCodegenEvaluatorFactory.scala:43)
+  at HoodieSparkUtils$.$anonfun$createRdd$2(HoodieSparkUtils.scala:107)
+```

Review Comment:
   _⚠️ Potential issue_ | _🟡 Minor_
   
   **Add languages to the fenced code blocks.**
   
   `markdownlint` is already flagging these fences, so this doc will keep 
warning until the blocks are tagged (`text`, `scala`, `java`, `cpp`, etc.).  
   
   
   
   Also applies to: 32-36, 82-85, 184-203, 206-210, 245-251, 254-264, 320-327, 
347-351
   
   <details>
   <summary>🧰 Tools</summary>
   
   <details>
   <summary>🪛 markdownlint-cli2 (0.22.0)</summary>
   
   [warning] 14-14: Fenced code blocks should have a language specified
   
   (MD040, fenced-code-language)
   
   </details>
   
   </details>
   
   <details>
   <summary>🤖 Prompt for AI Agents</summary>
   
   ````
   Verify each finding against the current code and only fix it if needed.
   
   In `@RCA_Gluten_Velox_MOR_Failures.md` around lines 14 - 22, Several fenced 
code
   blocks in the document are missing language specifiers; update each
   triple-backtick fence to include an appropriate language token (e.g., ```java
   for the Java stacktrace starting with java.lang.AssertionError and entries 
like
   UnsafeRow.setTotalSize, ```scala for Scala snippets referencing
   HoodieSparkUtils.createRdd, or ```text/```cpp as appropriate) so markdownlint
   stops flagging them; apply this change to the reported ranges (including the
   stacktrace block and the other blocks at 32-36, 82-85, 184-203, 206-210,
   245-251, 254-264, 320-327, 347-351) ensuring each fence uses the correct
   language tag that matches the contained code/sample.
   ````
   
   </details>
   
   <!-- 
fingerprinting:phantom:medusa:grasshopper:163599a9-90f2-46bb-9a83-b92fc3dd8804 
-->
   
   
   — *CodeRabbit* 
([original](https://github.com/yihua/hudi/pull/39#discussion_r3082895798)) 
(source:comment#3082895798)



##########
hudi-spark-datasource/hudi-spark/src/test/scala/org/apache/spark/sql/hudi/dml/others/TestMergeModeCommitTimeOrdering.scala:
##########
@@ -249,8 +255,11 @@ class TestMergeModeCommitTimeOrdering extends 
HoodieSparkSqlTestBase {
         // TODO(HUDI-8820): enable MDT after supporting MDT with table version 
6
         ("hoodie.metadata.enable" -> "false", tableVersion.toInt == 6)
       ) {
-        withRecordType()(withTempDir { tmp =>
+        withRecordType()({
           val tableName = generateTableName
+          val currTime = System.currentTimeMillis()
+          val tablePath = 
s"/home/ubuntu/ws3/hudi-rs/tables/table${TestMergeModeCommitTimeOrdering.cnt}_$currTime"
+          println(s"fdss: ${tablePath}")

Review Comment:
   _⚠️ Potential issue_ | _🔴 Critical_
   
   **Critical: Same hardcoded path and debug statement in second test.**
   
   This is a duplicate of the issue in the first test - hardcoded absolute path 
and debug println must be removed.
   
   <details>
   <summary>🤖 Prompt for AI Agents</summary>
   
   ```
   Verify each finding against the current code and only fix it if needed.
   
   In
   
`@hudi-spark-datasource/hudi-spark/src/test/scala/org/apache/spark/sql/hudi/dml/others/TestMergeModeCommitTimeOrdering.scala`
   around lines 260 - 262, Remove the hardcoded absolute path and debug 
println: in
   the TestMergeModeCommitTimeOrdering test where tablePath is constructed using
   TestMergeModeCommitTimeOrdering.cnt and System.currentTimeMillis(), replace 
the
   absolute "/home/ubuntu/..." path with a temp or relative test directory (or 
use
   the test framework's temporary folder utility) and delete the println(s"fdss:
   ${tablePath}") call; ensure tablePath is built from a configurable base (or
   tempDir) so the test is not environment-specific.
   ```
   
   </details>
   
   <!-- fingerprinting:phantom:medusa:ocelot -->
   
   
   — *CodeRabbit* 
([original](https://github.com/yihua/hudi/pull/39#discussion_r3082895783)) 
(source:comment#3082895783)



##########
hudi-spark-datasource/hudi-spark/src/test/scala/org/apache/hudi/functional/TestMORDataSource.scala:
##########
@@ -2259,6 +2256,59 @@ class TestMORDataSource extends 
HoodieSparkClientTestBase with SparkDatasetMixin
       .mode(SaveMode.Append)
       .save(basePath)
   }
+
+  /**
+   * Smoke test for Gluten/Velox native execution.
+   * Writes a v9 MOR table (Spark record type, parquet log blocks), reads it back,
+   * runs collect(), then prints the executedPlan so the caller can confirm that
+   * Gluten operators appear (e.g. VeloxColumnarToRow, ^(N) FileScanTransformer HudiFileGroup).
+   *
+   * Run with: mvn test -Dtest=TestMORDataSource#testGlutenVeloxNativeExecution
+   *                    -Dgluten.bundle.jar=<path/to/gluten-velox-bundle-spark3.5*.jar>
+   */
+  @Test
+  def testGlutenVeloxNativeExecution(): Unit = {
+    val (writeOpts, readOpts) = getWriterReaderOpts(HoodieRecordType.SPARK)
+    val v9Opts = writeOpts ++ Map(
+      HoodieTableConfig.VERSION.key() -> HoodieTableVersion.NINE.versionCode().toString,
+      HoodieWriteConfig.WRITE_TABLE_VERSION.key() -> HoodieTableVersion.NINE.versionCode().toString
+    )
+
+    // Initial insert — 10 records across partitions
+    val records = recordsToStrings(dataGen.generateInserts("001", 10)).asScala.toSeq
+    val inputDF = spark.read.json(spark.sparkContext.parallelize(records, 2))
+    inputDF.write.format("hudi")
+      .options(v9Opts)
+      .option(DataSourceWriteOptions.TABLE_TYPE.key, DataSourceWriteOptions.MOR_TABLE_TYPE_OPT_VAL)
+      .option(DataSourceWriteOptions.OPERATION.key, DataSourceWriteOptions.INSERT_OPERATION_OPT_VAL)
+      .option("hoodie.compact.inline", "false")
+      .mode(SaveMode.Overwrite)
+      .save(basePath)
+
+    // Upsert 5 records to create delta log files (MOR characteristic)
+    val updates = recordsToStrings(dataGen.generateUniqueUpdates("002", 5)).asScala.toSeq
+    val updatesDF = spark.read.json(spark.sparkContext.parallelize(updates, 2))
+    updatesDF.write.format("hudi")
+      .options(v9Opts)
+      .option(DataSourceWriteOptions.OPERATION.key, DataSourceWriteOptions.UPSERT_OPERATION_OPT_VAL)
+      .mode(SaveMode.Append)
+      .save(basePath)
+
+    // Read back with snapshot query and project a few columns
+    val hudiDF = spark.read.format("hudi")
+      .options(readOpts)
+      .option(DataSourceReadOptions.QUERY_TYPE.key, DataSourceReadOptions.QUERY_TYPE_SNAPSHOT_OPT_VAL)
+      .load(basePath)
+    val df = hudiDF.select("driver", "rider", "fare")
+
+    val rows = df.collect()
+    println(s"[GLUTEN-TEST] Row count: ${rows.length}")
+    println("[GLUTEN-TEST] == Executed Physical Plan ==")
+    df.queryExecution.executedPlan.toString.split("\n")
+      .foreach(line => println(s"[GLUTEN-TEST] $line"))
+
+    assertTrue(rows.length > 0)

Review Comment:
   _⚠️ Potential issue_ | _🟠 Major_
   
   <details>
   <summary>🧩 Analysis chain</summary>
   
   🏁 Script executed:
   
   ```shell
   # First, let's locate the test file and verify its existence
   find . -name "TestMORDataSource.scala" -type f
   ```
   
   Repository: yihua/hudi
   
   Length of output: 154
   
   ---
   
   🏁 Script executed:
   
   ```shell
   # Get the line count to understand file size
   wc -l hudi-spark-datasource/hudi-spark/src/test/scala/org/apache/hudi/functional/TestMORDataSource.scala
   ```
   
   Repository: yihua/hudi
   
   Length of output: 157
   
   ---
   
   🏁 Script executed:
   
   ```shell
   # Extract the specific test method around lines 2269-2310
   sed -n '2260,2320p' hudi-spark-datasource/hudi-spark/src/test/scala/org/apache/hudi/functional/TestMORDataSource.scala
   ```
   
   Repository: yihua/hudi
   
   Length of output: 2841
   
   ---
   
   🏁 Script executed:
   
   ```shell
   # Search for other Gluten-related tests to understand pattern
   rg -i "gluten|velox" hudi-spark-datasource/hudi-spark/src/test/scala/org/apache/hudi/functional/TestMORDataSource.scala -A 5 -B 2
   ```
   
   Repository: yihua/hudi
   
   Length of output: 1211
   
   ---
   
   🏁 Script executed:
   
   ```shell
   # Check for assumeTrue imports and patterns in the test file
   rg "import.*Assumptions|assumeTrue|skipTest|skipIf" hudi-spark-datasource/hudi-spark/src/test/scala/org/apache/hudi/functional/TestMORDataSource.scala
   ```
   
   Repository: yihua/hudi
   
   Length of output: 36
   
   ---
   
   🏁 Script executed:
   
   ```shell
   # Look for gluten.bundle.jar usage in the codebase
   rg "gluten\.bundle\.jar|gluten\.bundle\.|GLUTEN" --type scala
   ```
   
   Repository: yihua/hudi
   
   Length of output: 707
   
   ---
   
   🏁 Script executed:
   
   ```shell
   # Check JUnit imports in the test file
   head -50 hudi-spark-datasource/hudi-spark/src/test/scala/org/apache/hudi/functional/TestMORDataSource.scala | grep -i "import.*junit"
   ```
   
   Repository: yihua/hudi
   
   Length of output: 122
   
   ---
   
   🏁 Script executed:
   
   ```shell
   # Get more imports from the test file
   head -100 hudi-spark-datasource/hudi-spark/src/test/scala/org/apache/hudi/functional/TestMORDataSource.scala | grep "^import"
   ```
   
   Repository: yihua/hudi
   
   Length of output: 2719
   
   ---
   
   🏁 Script executed:
   
   ```shell
   # Search for System.getProperty usage in the test file
   rg "System\.getProperty" hudi-spark-datasource/hudi-spark/src/test/scala/org/apache/hudi/functional/TestMORDataSource.scala
   ```
   
   Repository: yihua/hudi
   
   Length of output: 121
   
   ---
   
   🏁 Script executed:
   
   ```shell
   # Search for System.getProperty usage patterns across test files
   rg "System\.getProperty" hudi-spark-datasource/hudi-spark/src/test/scala/ --type scala | head -20
   ```
   
   Repository: yihua/hudi
   
   Length of output: 1199
   
   ---
   
   🏁 Script executed:
   
   ```shell
   # Check for skipTest or assumption patterns in the project
   rg "assumeTrue|skipTest|skipIf" hudi-spark-datasource/hudi-spark/src/test/scala/ --type scala | head -15
   ```
   
   Repository: yihua/hudi
   
   Length of output: 358
   
   ---
   
   
   
   </details>
   
   **Assert native execution by checking the plan contains Gluten/Velox operators.**
   
   The test currently passes with plain Spark execution since it only checks `rows.length > 0`. Add `assumeTrue` to skip when the Gluten bundle is absent, and assert the executed plan contains expected native operators.
   
   <details>
   <summary>Suggested assertion</summary>
   
   ```diff
   +import org.junit.jupiter.api.Assumptions.assumeTrue
    ...
      @Test
      def testGlutenVeloxNativeExecution(): Unit = {
   +    assumeTrue(
   +      System.getProperty("gluten.bundle.jar") != null,
   +      "Gluten/Velox smoke test requires -Dgluten.bundle.jar")
   +
        val (writeOpts, readOpts) = getWriterReaderOpts(HoodieRecordType.SPARK)
        ...
   -    val rows = df.collect()
   -    println(s"[GLUTEN-TEST] Row count: ${rows.length}")
   -    println("[GLUTEN-TEST] == Executed Physical Plan ==")
   -    df.queryExecution.executedPlan.toString.split("\n")
   -      .foreach(line => println(s"[GLUTEN-TEST] $line"))
   -
   -    assertTrue(rows.length > 0)
   +    val rows = df.collect()
   +    val plan = df.queryExecution.executedPlan.treeString
   +    println(s"[GLUTEN-TEST] Row count: ${rows.length}")
   +    println(s"[GLUTEN-TEST] == Executed Physical Plan ==\n$plan")
   +
   +    assertTrue(rows.nonEmpty)
   +    assertTrue(
   +      plan.contains("Velox") || plan.contains("Gluten"),
   +      s"Expected Gluten/Velox operators in the executed plan, got:\n$plan")
      }
   ```
   </details>
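   If the `contains("Velox") || contains("Gluten")` check is reused across several Gluten smoke tests, it can be factored into a small standalone helper. The sketch below is hypothetical (the object name and marker list are not from this PR, and real Gluten plans may use additional operator names), but it captures the suggested assertion logic and runs without Spark:

   ```scala
   object NativePlanCheck {
     // Substrings whose presence in an executed-plan string suggests that
     // Gluten/Velox, rather than vanilla Spark, produced the physical operators.
     // Illustrative markers only; match against your runtime's actual plan output.
     private val nativeMarkers = Seq("Velox", "Gluten", "Transformer")

     /** Returns true when any native-execution marker appears in the plan string. */
     def isNativePlan(planString: String): Boolean =
       nativeMarkers.exists(planString.contains)
   }
   ```

   In the test this would replace the raw `plan.contains(...)` calls, e.g. `assertTrue(NativePlanCheck.isNativePlan(plan), s"Expected native operators, got:\n$plan")`.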
   
   <details>
   <summary>🤖 Prompt for AI Agents</summary>
   
   ```
   Verify each finding against the current code and only fix it if needed.
   
   In
   hudi-spark-datasource/hudi-spark/src/test/scala/org/apache/hudi/functional/TestMORDataSource.scala
   around lines 2269 - 2310, the test testGlutenVeloxNativeExecution currently only
   checks rows.length > 0 so it passes without Gluten/Velox; update it to first
   assumeTrue that the Gluten native bundle is available (skip otherwise) and then
   assert that df.queryExecution.executedPlan.toString contains the expected native
   operator identifiers (e.g., "Velox" or "Gluten" or the specific operator names
   used by your runtime) after collecting/printing the plan; use the existing df
   and df.queryExecution.executedPlan references and keep the existing plan
   printing but replace or augment the final assertTrue(rows.length > 0) with an
   assumeTrue(...) then an assertion that the plan string contains the native
   operator marker.
   ```
   
   </details>
   
   
   <!-- This is an auto-generated comment by CodeRabbit -->
   
   — *CodeRabbit* ([original](https://github.com/yihua/hudi/pull/39#discussion_r3082895772)) (source:comment#3082895772)



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]
