vamshikrishnakyatham opened a new issue, #13942:
URL: https://github.com/apache/hudi/issues/13942
### Bug Description
**What happened:**
Reading a Hudi table with nested partition fields (e.g.,
`_meta.partition.year`: `LongType`) and hive-style partitioning fails with
`ClassCastException: UTF8String cannot be cast to Long`. The error occurs
because partition values parsed from hive-style paths are stored as strings,
while the vectorized reader expects values of the schema-declared types.
**What you expected:**
The query should execute successfully, automatically converting string
partition values from hive-style paths to their schema-defined types (e.g.,
`"2022"` → `2022L`).
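The expected coercion can be sketched in plain Python (illustrative only, not Hudi code; the function name and the type-name keys are our own): a hive-style partition string is converted according to the Spark type declared in the table schema.

```python
# Illustrative sketch of the coercion the reader is expected to perform:
# hive-style paths store partition values as strings, but the schema
# declares typed partition fields, so each raw string must be converted.

def coerce_partition_value(raw: str, spark_type: str):
    """Convert a hive-style partition string to its schema-declared type."""
    converters = {
        "LongType": int,       # "2022" -> 2022 (a JVM Long on the Spark side)
        "IntegerType": int,
        "DoubleType": float,
        "BooleanType": lambda s: s.lower() == "true",
        "StringType": str,
    }
    try:
        return converters[spark_type](raw)
    except KeyError:
        raise ValueError(f"unsupported partition type: {spark_type}")
```

Without this step, the raw `UTF8String` reaches `ColumnVectorUtils.populate()` and the unboxing to `Long` fails with the `ClassCastException` shown below.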
**Steps to reproduce:**
1. Create a Hudi table with nested partition fields of non-string types:
   - Schema with `_meta.partition.year` as `LongType`
   - Write with `"hoodie.datasource.write.hive_style_partitioning" -> "true"`
   - Partition fields: `"_meta.partition.year,_meta.partition.month,_meta.partition.day"`
2. Write data to create a hive-style directory structure:
   `table/_meta.partition.year=2022/_meta.partition.month=7/_meta.partition.day=5/file.parquet`
3. Read the table with vectorized reading enabled:
   `spark.read.format("hudi").load(tablePath).collect()` throws a
   `ClassCastException` at `ColumnVectorUtils.populate()`
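The steps above can be modeled end to end with a small Python sketch (hypothetical, not Hudi's implementation; the function and schema-map names are ours): the hive-style path from step 2 is split into `field=value` segments and each value is coerced per the partition schema from step 1.

```python
# Hypothetical illustration of parsing the hive-style path from step 2
# into typed partition values, which is what a successful read requires.

from urllib.parse import unquote

# Declared partition schema from step 1: every nested field is LongType,
# so each raw string value must become an int (a JVM Long in Spark terms).
PARTITION_SCHEMA = {
    "_meta.partition.year": int,
    "_meta.partition.month": int,
    "_meta.partition.day": int,
}

def parse_hive_partition_path(path: str) -> dict:
    """Extract typed partition values from a hive-style relative path."""
    values = {}
    for segment in path.split("/"):
        if "=" not in segment:
            continue  # skip the table root and file-name segments
        field, raw = segment.split("=", 1)
        if field in PARTITION_SCHEMA:
            values[field] = PARTITION_SCHEMA[field](unquote(raw))
    return values

path = ("table/_meta.partition.year=2022/_meta.partition.month=7/"
        "_meta.partition.day=5/file.parquet")
partition_values = parse_hive_partition_path(path)
# Each value is now numeric, matching the LongType fields the vectorized
# reader expects, instead of the raw UTF8String that triggers the crash.
```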
### Environment
**Hudi version:** 1.1.0
**Query engine:** Spark (3.5.x, per `Spark35ParquetReader` in the stack trace)
**Relevant configs:** `hoodie.datasource.write.hive_style_partitioning=true`, vectorized Parquet reading enabled
### Logs and Stack Trace
```
Caused by: java.lang.ClassCastException: class org.apache.spark.unsafe.types.UTF8String cannot be cast to class java.lang.Long (org.apache.spark.unsafe.types.UTF8String is in unnamed module of loader 'app'; java.lang.Long is in module java.base of loader 'bootstrap')
	at scala.runtime.BoxesRunTime.unboxToLong(BoxesRunTime.java:107)
	at org.apache.spark.sql.catalyst.expressions.BaseGenericInternalRow.getLong(rows.scala:41)
	at org.apache.spark.sql.catalyst.expressions.BaseGenericInternalRow.getLong$(rows.scala:41)
	at org.apache.spark.sql.catalyst.expressions.GenericInternalRow.getLong(rows.scala:165)
	at org.apache.spark.sql.execution.vectorized.ColumnVectorUtils.populate(ColumnVectorUtils.java:70)
	at org.apache.spark.sql.execution.datasources.parquet.VectorizedParquetRecordReader.initBatch(VectorizedParquetRecordReader.java:293)
	at org.apache.spark.sql.execution.datasources.parquet.VectorizedParquetRecordReader.initBatch(VectorizedParquetRecordReader.java:306)
	at org.apache.spark.sql.execution.datasources.parquet.Spark35ParquetReader.doRead(Spark35ParquetReader.scala:180)
```
--
This is an automated message from the Apache Git Service.