vamshikrishnakyatham opened a new issue, #13942:
URL: https://github.com/apache/hudi/issues/13942
### Bug Description
**What happened:**
Reading a Hudi table with nested partition fields (e.g.,
`_meta.partition.year`: `LongType`) and hive-style partitioning fails with
`ClassCastException: UTF8String cannot be cast to Long`. The error occurs
because partition values parsed from hive-style paths are stored as strings,
while the vectorized reader expects values of the schema-declared types.
**What you expected:**
The query should execute successfully, automatically converting string
partition values from hive-style paths to their schema-defined types (e.g.,
`"2022"` → `2022L`).
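The expected coercion can be sketched in plain Python (illustrative only, not Hudi code; the function name and the type-name keys are our own): a hive-style partition string is converted according to the Spark type declared in the table schema.

```python
# Illustrative sketch of the coercion the reader is expected to perform:
# hive-style paths store partition values as strings, but the schema
# declares typed partition fields, so each raw string must be converted.

def coerce_partition_value(raw: str, spark_type: str):
    """Convert a hive-style partition string to its schema-declared type."""
    converters = {
        "LongType": int,       # "2022" -> 2022 (a JVM Long on the Spark side)
        "IntegerType": int,
        "DoubleType": float,
        "BooleanType": lambda s: s.lower() == "true",
        "StringType": str,
    }
    try:
        return converters[spark_type](raw)
    except KeyError:
        raise ValueError(f"unsupported partition type: {spark_type}")
```

Without this step, the raw `UTF8String` reaches `ColumnVectorUtils.populate()` and the unboxing to `Long` fails with the `ClassCastException` shown below.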
**Steps to reproduce:**
1. Create a Hudi table with nested partition fields of non-string types:
   - Schema with `_meta.partition.year` as `LongType`
   - Write with `"hoodie.datasource.write.hive_style_partitioning" -> "true"`
   - Partition fields: `"_meta.partition.year,_meta.partition.month,_meta.partition.day"`
2. Write data to create a hive-style directory structure:
   `table/_meta.partition.year=2022/_meta.partition.month=7/_meta.partition.day=5/file.parquet`
3. Read the table with vectorized reading enabled:
   `spark.read.format("hudi").load(tablePath).collect()` throws a
   `ClassCastException` at `ColumnVectorUtils.populate()`
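The steps above can be modeled end to end with a small Python sketch (hypothetical, not Hudi's implementation; the function and schema-map names are ours): the hive-style path from step 2 is split into `field=value` segments and each value is coerced per the partition schema from step 1.

```python
# Hypothetical illustration of parsing the hive-style path from step 2
# into typed partition values, which is what a successful read requires.

from urllib.parse import unquote

# Declared partition schema from step 1: every nested field is LongType,
# so each raw string value must become an int (a JVM Long in Spark terms).
PARTITION_SCHEMA = {
    "_meta.partition.year": int,
    "_meta.partition.month": int,
    "_meta.partition.day": int,
}

def parse_hive_partition_path(path: str) -> dict:
    """Extract typed partition values from a hive-style relative path."""
    values = {}
    for segment in path.split("/"):
        if "=" not in segment:
            continue  # skip the table root and file-name segments
        field, raw = segment.split("=", 1)
        if field in PARTITION_SCHEMA:
            values[field] = PARTITION_SCHEMA[field](unquote(raw))
    return values

path = ("table/_meta.partition.year=2022/_meta.partition.month=7/"
        "_meta.partition.day=5/file.parquet")
partition_values = parse_hive_partition_path(path)
# Each value is now numeric, matching the LongType fields the vectorized
# reader expects, instead of the raw UTF8String that triggers the crash.
```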
### Environment
**Hudi version:** 1.1.0
**Query engine:** Spark (3.5.x, per `Spark35ParquetReader` in the stack trace)
**Relevant configs:** `hoodie.datasource.write.hive_style_partitioning=true`, vectorized Parquet reading enabled
### Logs and Stack Trace
```
Caused by: java.lang.ClassCastException: class org.apache.spark.unsafe.types.UTF8String cannot be cast to class java.lang.Long (org.apache.spark.unsafe.types.UTF8String is in unnamed module of loader 'app'; java.lang.Long is in module java.base of loader 'bootstrap')
	at scala.runtime.BoxesRunTime.unboxToLong(BoxesRunTime.java:107)
	at org.apache.spark.sql.catalyst.expressions.BaseGenericInternalRow.getLong(rows.scala:41)
	at org.apache.spark.sql.catalyst.expressions.BaseGenericInternalRow.getLong$(rows.scala:41)
	at org.apache.spark.sql.catalyst.expressions.GenericInternalRow.getLong(rows.scala:165)
	at org.apache.spark.sql.execution.vectorized.ColumnVectorUtils.populate(ColumnVectorUtils.java:70)
	at org.apache.spark.sql.execution.datasources.parquet.VectorizedParquetRecordReader.initBatch(VectorizedParquetRecordReader.java:293)
	at org.apache.spark.sql.execution.datasources.parquet.VectorizedParquetRecordReader.initBatch(VectorizedParquetRecordReader.java:306)
	at org.apache.spark.sql.execution.datasources.parquet.Spark35ParquetReader.doRead(Spark35ParquetReader.scala:180)
```
--
This is an automated message from the Apache Git Service.