VvanFalleaves opened a new issue, #11746:
URL: https://github.com/apache/gluten/issues/11746
### Backend
VL (Velox)
### Bug description
The following SQL statements are used for the Hive write test.
```sql
DROP TABLE IF EXISTS hive_support;
CREATE TABLE IF NOT EXISTS hive_support (
int_col INT,
array_col ARRAY<INT>,
map_col MAP<STRING, INT>,
struct_col STRUCT<field1: INT, field2: STRING, field3: BOOLEAN>
)
PARTITIONED BY (part_col INT)
STORED AS PARQUET
TBLPROPERTIES (
'parquet.compression' = 'ZSTD',
'parquet.compression.zstd.level' = '5',
'parquet.enable.dictionary' = 'false'
);
WITH number_seq AS (
SELECT CAST(id AS INT) AS id_int
FROM range(1, 50000000, 1, 100)
)
INSERT OVERWRITE TABLE hive_support
PARTITION (part_col)
SELECT
id_int AS int_col,
ARRAY(
id_int % 10,
id_int % 100,
id_int % 1000
) AS array_col,
MAP(
CONCAT('key_', CAST(id_int % 5 AS STRING)), id_int % 100,
CONCAT('key_', CAST((id_int + 1) % 5 AS STRING)), (id_int + 1) % 100
) AS map_col,
named_struct(
'field1', id_int,
'field2', CONCAT('str_', CAST(id_int AS STRING)),
'field3', id_int % 2 = 0
) AS struct_col,
id_int % 10 AS part_col
FROM number_seq
SORT BY id_int;
```
The following error is reported during the execution:
```
26/03/10 20:50:46 WARN GlutenFallbackReporter: Validation failed for plan:
WriteFiles[QueryId=4], due to: Unsupported native write: Found unsupported
type:ArrayType,MapType,StructType.
```
The code in
`backends-velox/src/main/scala/org/apache/gluten/backendsapi/velox/VeloxBackend.scala`
is as follows:
```scala
// Validate if all types are supported.
def validateDataTypes(): Option[String] = {
  val unsupportedTypes = format match {
    case _: ParquetFileFormat =>
      fields.flatMap {
        case StructField(_, _: YearMonthIntervalType, _, _) =>
          Some("YearMonthIntervalType")
        case StructField(_, _: StructType, _, _) =>
          Some("StructType")
        case _ => None
      }
    case _ =>
      fields.flatMap {
        field =>
          field.dataType match {
            case _: StructType => Some("StructType") // here 1
            case _: ArrayType => Some("ArrayType") // here 2
            case _: MapType => Some("MapType") // here 3
            case _: YearMonthIntervalType =>
              Some("YearMonthIntervalType")
            case _ => None
          }
      }
  }
  if (unsupportedTypes.nonEmpty) {
    Some(unsupportedTypes.mkString("Found unsupported type:", ",", ""))
  } else {
    None
  }
}
```
After the three marked lines are commented out, no fallback occurs and the
data is written successfully.
The Velox version downloaded by Gluten on `branch-1.3` is
`https://github.com/oap-project/velox/tree/gluten-1.3.0`. In that version,
`HiveDataSink::appendData` does not support `PARTITIONED BY` together with
complex types. However, the Velox version downloaded by Gluten on the `main`
branch is `https://github.com/IBM/velox/tree/dft-2026_03_10-iceberg`, where
the `HiveDataSink::appendData` method has been updated.
I would also like to confirm whether there are any other cases that have not
been considered.
### Gluten version
main branch
### Spark version
Spark-3.4.x
### Spark configurations
```
--master yarn
--driver-cores 4
--driver-memory 8g
--num-executors 12
--executor-cores 4
--executor-memory 5g
--conf spark.memory.offHeap.enabled=true
--conf spark.memory.offHeap.size=20g
--conf spark.executor.memoryOverhead=5g
--conf spark.task.cpus=1
--conf spark.executor.extraJavaOptions="-XX:+UseG1GC
-XX:ActiveProcessorCount=4 -Dio.netty.tryReflectionSetAccessible=true"
--conf
spark.driver.extraJavaOptions="-Dio.netty.tryReflectionSetAccessible=true"
--conf spark.locality.wait=0
--conf spark.driver.extraClassPath="${JAR_PATH}"
--conf spark.executor.extraClassPath="${JAR_PATH}"
--conf
spark.shuffle.manager=org.apache.spark.shuffle.sort.ColumnarShuffleManager
--conf spark.plugins=org.apache.gluten.GlutenPlugin
--conf spark.gluten.loadLibFromJar=false
--conf spark.gluten.sql.columnar.backend.lib=velox
--conf spark.gluten.sql.columnar.maxBatchSize=8192
--conf spark.gluten.sql.orc.charType.scan.fallback.enabled=false
--conf spark.gluten.sql.columnar.physicalJoinOptimizeEnable=true
--conf spark.gluten.sql.columnar.physicalJoinOptimizationLevel=19
--conf
spark.gluten.sql.columnar.backend.velox.resizeBatches.shuffleInput=false
--conf spark.sql.adaptive.coalescePartitions.initialPartitionNum=48
--conf spark.default.parallelism=144
--conf spark.sql.shuffle.partitions=144
--conf spark.serializer=org.apache.spark.serializer.KryoSerializer
--conf spark.sql.sources.parallelPartitionDiscovery.parallelism=60
--conf spark.network.timeout=600
--conf spark.sql.broadcastTimeout=600
--conf spark.sql.adaptive.enabled=false
--conf spark.sql.optimizer.runtime.bloomFilter.enabled=true
--conf spark.sql.hive.convertMetastoreParquet=false
--conf spark.sql.parquet.writeLegacyFormat=true
--conf spark.sql.hive.manageFilesourcePartitions=false
--conf spark.hadoop.hive.exec.dynamic.partition.mode=nonstrict
--conf spark.hadoop.hive.exec.max.dynamic.partitions=1000
--conf spark.hadoop.hive.exec.max.dynamic.partitions.pernode=1000
--conf spark.sql.parquet.enableVectorizedWriter=false
--conf spark.sql.parquet.enableVectorizedReader=false
--conf spark.sql.parquet.enable.dictionary=false
--conf spark.hadoop.parquet.enable.dictionary=false
--conf spark.io.compression.codec=zstd
--conf spark.sql.parquet.compression.codec=zstd
```
### System information
Gluten Version: 1.7.0-SNAPSHOT
Commit: 625a47611770dc7551c5d129e4138e9bff83d748
CMake Version: 3.28.3
System: Linux-5.10.0-182.0.0.95.oe2203sp3.aarch64
Arch: aarch64
CPU Name: BIOS Model name: Kunpeng 920
C++ Compiler: /usr/bin/c++
C++ Compiler Version: 12.5.0
C Compiler: /usr/bin/cc
C Compiler Version: 12.5.0
CMake Prefix Path:
/usr/local;/usr;/;/usr/local/lib64/python3.9/site-packages/cmake/data;/usr/local;/usr/X11R6;/usr/pkg;/opt
### Relevant logs
```bash
26/03/10 20:50:46 WARN GlutenFallbackReporter: Validation failed for plan:
WriteFiles[QueryId=4], due to: Unsupported native write: Found unsupported
type:ArrayType,MapType.
```
--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
To unsubscribe, e-mail: [email protected]
For queries about this service, please contact Infrastructure at:
[email protected]