VvanFalleaves opened a new issue, #11746:
URL: https://github.com/apache/gluten/issues/11746
### Backend
VL (Velox)
### Bug description
The following SQL statements are used for the Hive write test.
```sql
DROP TABLE IF EXISTS hive_support;
CREATE TABLE IF NOT EXISTS hive_support (
int_col INT,
array_col ARRAY<INT>,
map_col MAP<STRING, INT>,
struct_col STRUCT<field1: INT, field2: STRING, field3: BOOLEAN>
)
PARTITIONED BY (part_col INT)
STORED AS PARQUET
TBLPROPERTIES (
'parquet.compression' = 'ZSTD',
'parquet.compression.zstd.level' = '5',
'parquet.enable.dictionary' = 'false'
);
WITH number_seq AS (
SELECT CAST(id AS INT) AS id_int
FROM range(1, 50000000, 1, 100)
)
INSERT OVERWRITE TABLE hive_support
PARTITION (part_col)
SELECT
id_int AS int_col,
ARRAY(
id_int % 10,
id_int % 100,
id_int % 1000
) AS array_col,
MAP(
CONCAT('key_', CAST(id_int % 5 AS STRING)), id_int % 100,
CONCAT('key_', CAST((id_int + 1) % 5 AS STRING)), (id_int + 1) % 100
) AS map_col,
named_struct(
'field1', id_int,
'field2', CONCAT('str_', CAST(id_int AS STRING)),
'field3', id_int % 2 = 0
) AS struct_col,
id_int % 10 AS part_col
FROM number_seq
SORT BY id_int;
```
The following error is reported during the execution:
```
26/03/10 20:50:46 WARN GlutenFallbackReporter: Validation failed for plan:
WriteFiles[QueryId=4], due to: Unsupported native write: Found unsupported
type:ArrayType,MapType,StructType.
```
The code in
`backends-velox/src/main/scala/org/apache/gluten/backendsapi/velox/VeloxBackend.scala`
is as follows:
```scala
// Validate if all types are supported.
def validateDataTypes(): Option[String] = {
  val unsupportedTypes = format match {
    case _: ParquetFileFormat =>
      fields.flatMap {
        case StructField(_, _: YearMonthIntervalType, _, _) =>
          Some("YearMonthIntervalType")
        case StructField(_, _: StructType, _, _) =>
          Some("StructType")
        case _ => None
      }
    case _ =>
      fields.flatMap {
        field =>
          field.dataType match {
            case _: StructType => Some("StructType") // here 1
            case _: ArrayType => Some("ArrayType") // here 2
            case _: MapType => Some("MapType") // here 3
            case _: YearMonthIntervalType =>
              Some("YearMonthIntervalType")
            case _ => None
          }
      }
  }
  if (unsupportedTypes.nonEmpty) {
    Some(unsupportedTypes.mkString("Found unsupported type:", ",", ""))
  } else {
    None
  }
}
```
After the three marked lines are commented out, no fallback occurs and the
data is written successfully.
The Velox version downloaded by Gluten on `branch-1.3` is
`https://github.com/oap-project/velox/tree/gluten-1.3.0`. In that version,
`HiveDataSink::appendData` does not support `PARTITIONED BY` together with
complex types. However, the Velox version downloaded by Gluten on the `main`
branch is `https://github.com/IBM/velox/tree/dft-2026_03_10-iceberg`, where
the `HiveDataSink::appendData` method has been updated.
I would also like to confirm whether there are any other cases that have not
been considered.
### Gluten version
main branch
### Spark version
Spark-3.4.x
### Spark configurations
```
--master yarn
--driver-cores 4
--driver-memory 8g
--num-executors 12
--executor-cores 4
--executor-memory 5g
--conf spark.memory.offHeap.enabled=true
--conf spark.memory.offHeap.size=20g
--conf spark.executor.memoryOverhead=5g
--conf spark.task.cpus=1
--conf spark.executor.extraJavaOptions="-XX:+UseG1GC
-XX:ActiveProcessorCount=4 -Dio.netty.tryReflectionSetAccessible=true"
--conf
spark.driver.extraJavaOptions="-Dio.netty.tryReflectionSetAccessible=true"
--conf spark.locality.wait=0
--conf spark.driver.extraClassPath="${JAR_PATH}"
--conf spark.executor.extraClassPath="${JAR_PATH}"
--conf
spark.shuffle.manager=org.apache.spark.shuffle.sort.ColumnarShuffleManager
--conf spark.plugins=org.apache.gluten.GlutenPlugin
--conf spark.gluten.loadLibFromJar=false
--conf spark.gluten.sql.columnar.backend.lib=velox
--conf spark.gluten.sql.columnar.maxBatchSize=8192
--conf spark.gluten.sql.orc.charType.scan.fallback.enabled=false
--conf spark.gluten.sql.columnar.physicalJoinOptimizeEnable=true
--conf spark.gluten.sql.columnar.physicalJoinOptimizationLevel=19
--conf
spark.gluten.sql.columnar.backend.velox.resizeBatches.shuffleInput=false
--conf spark.sql.adaptive.coalescePartitions.initialPartitionNum=48
--conf spark.default.parallelism=144
--conf spark.sql.shuffle.partitions=144
--conf spark.serializer=org.apache.spark.serializer.KryoSerializer
--conf spark.sql.sources.parallelPartitionDiscovery.parallelism=60
--conf spark.network.timeout=600
--conf spark.sql.broadcastTimeout=600
--conf spark.sql.adaptive.enabled=false
--conf spark.sql.optimizer.runtime.bloomFilter.enabled=true
--conf spark.sql.hive.convertMetastoreParquet=false
--conf spark.sql.parquet.writeLegacyFormat=true
--conf spark.sql.hive.manageFilesourcePartitions=false
--conf spark.hadoop.hive.exec.dynamic.partition.mode=nonstrict
--conf spark.hadoop.hive.exec.max.dynamic.partitions=1000
--conf spark.hadoop.hive.exec.max.dynamic.partitions.pernode=1000
--conf spark.sql.parquet.enableVectorizedWriter=false
--conf spark.sql.parquet.enableVectorizedReader=false
--conf spark.sql.parquet.enable.dictionary=false
--conf spark.hadoop.parquet.enable.dictionary=false
--conf spark.io.compression.codec=zstd
--conf spark.sql.parquet.compression.codec=zstd
```
### System information
Gluten Version: 1.7.0-SNAPSHOT
Commit: 625a47611770dc7551c5d129e4138e9bff83d748
CMake Version: 3.28.3
System: Linux-5.10.0-182.0.0.95.oe2203sp3.aarch64
Arch: aarch64
CPU Name: BIOS Model name: Kunpeng 920
C++ Compiler: /usr/bin/c++
C++ Compiler Version: 12.5.0
C Compiler: /usr/bin/cc
C Compiler Version: 12.5.0
CMake Prefix Path:
/usr/local;/usr;/;/usr/local/lib64/python3.9/site-packages/cmake/data;/usr/local;/usr/X11R6;/usr/pkg;/opt
### Relevant logs
```bash
26/03/10 20:50:46 WARN GlutenFallbackReporter: Validation failed for plan:
WriteFiles[QueryId=4], due to: Unsupported native write: Found unsupported
type:ArrayType,MapType.
```
--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
To unsubscribe, e-mail: [email protected]
For queries about this service, please contact Infrastructure at:
[email protected]