vamshikrishnakyatham opened a new pull request, #13943:
URL: https://github.com/apache/hudi/pull/13943

   ### Describe the issue this Pull Request addresses
   
   This PR fixes a critical ClassCastException that occurs when reading Hudi 
tables with nested partition fields using hive-style partitioning. The issue 
manifests when partition fields are defined as non-string types (e.g., 
LongType) in nested structures but partition values from hive-style paths are 
incorrectly passed as UTF8String objects to the vectorized Parquet reader.
   
   ### Summary and Changelog
   
   Fixed ClassCastException in nested partition path schema handling by 
implementing proper type conversion for partition values before they are passed 
to the vectorized Parquet reader.
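The following is a minimal sketch of the kind of conversion described above. It is not Hudi's actual code: plain boxed Java types stand in for Spark's `LongType`/`IntegerType`/`StringType`, and the class `PartitionValueConversion`, the enum `PartitionType`, and the helper methods are all hypothetical. The sketch does not know whether a nested field such as `_meta.partition.year` appears in the hive-style path under its full dotted name or under its leaf name, so it accepts either.

```java
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Hypothetical sketch, not Hudi's implementation: hive-style partition values
// arrive as strings in the path and are converted to the declared partition
// types before being handed to a reader that expects typed values.
public class PartitionValueConversion {

    // Simplified stand-in for the Spark partition column types.
    enum PartitionType { STRING, INT, LONG }

    // Parses "year=2024/month=1/day=15" into an ordered map of raw string values.
    static Map<String, String> parseHiveStylePath(String relativePath) {
        Map<String, String> raw = new LinkedHashMap<>();
        for (String segment : relativePath.split("/")) {
            int eq = segment.indexOf('=');
            if (eq <= 0) {
                throw new IllegalArgumentException("Not a hive-style segment: " + segment);
            }
            raw.put(segment.substring(0, eq), segment.substring(eq + 1));
        }
        return raw;
    }

    // Assumption: a nested field may appear in the path under its leaf name.
    static String leafName(String field) {
        return field.substring(field.lastIndexOf('.') + 1);
    }

    // Converts one raw string to a boxed value matching the declared type.
    static Object convert(String raw, PartitionType type) {
        switch (type) {
            case INT:
                return Integer.valueOf(raw);
            case LONG:
                return Long.valueOf(raw);
            default:
                return raw;
        }
    }

    // Returns typed values in the order of the declared partition fields.
    static Object[] toTypedValues(String relativePath, List<String> fields, List<PartitionType> types) {
        Map<String, String> raw = parseHiveStylePath(relativePath);
        Object[] typed = new Object[fields.size()];
        for (int i = 0; i < fields.size(); i++) {
            String field = fields.get(i);
            // Try the full (possibly dotted) name first, then the leaf name.
            String value = raw.containsKey(field) ? raw.get(field) : raw.get(leafName(field));
            if (value == null) {
                throw new IllegalArgumentException("Missing partition value for " + field);
            }
            typed[i] = convert(value, types.get(i));
        }
        return typed;
    }

    public static void main(String[] args) {
        Object[] values = toTypedValues(
            "year=2024/month=1/day=15",
            List.of("_meta.partition.year", "_meta.partition.month", "_meta.partition.day"),
            List.of(PartitionType.LONG, PartitionType.LONG, PartitionType.LONG));
        // Every value is now a java.lang.Long rather than a string.
        for (Object v : values) {
            System.out.println(v.getClass().getSimpleName() + " " + v);
        }
    }
}
```

Converting once, at the point where partition values are extracted from the path, means downstream typed accessors never see a raw string.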
   
   Error Details:

     java.lang.ClassCastException: class org.apache.spark.unsafe.types.UTF8String cannot be cast to class java.lang.Long
         at scala.runtime.BoxesRunTime.unboxToLong(BoxesRunTime.java:107)
         at org.apache.spark.sql.catalyst.expressions.BaseGenericInternalRow.getLong(rows.scala:41)
         at org.apache.spark.sql.execution.vectorized.ColumnVectorUtils.populate(ColumnVectorUtils.java:70)
         at org.apache.spark.sql.execution.datasources.parquet.VectorizedParquetRecordReader.initBatch(VectorizedParquetRecordReader.java:293)
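The failure mode in the trace can be illustrated without Spark. In this hypothetical sketch, `GenericRow` only loosely mimics a generic row that stores values as `Object`, and a `java.lang.String` stands in for Spark's `UTF8String`; casting the stored string to `Long` fails in the same way `BoxesRunTime.unboxToLong` does above.

```java
// Hypothetical, Spark-free illustration of the failure mode in the stack trace:
// a generic row stores values as Object, and a typed accessor unboxes them.
public class UnboxingFailureDemo {

    // Loosely mimics a generic row: values are stored untyped.
    static final class GenericRow {
        private final Object[] values;

        GenericRow(Object... values) {
            this.values = values;
        }

        // Loosely mimics a typed accessor: the cast fails unless the value is a Long.
        long getLong(int ordinal) {
            return (Long) values[ordinal];
        }
    }

    public static void main(String[] args) {
        GenericRow bad = new GenericRow("2024");   // raw string taken from "year=2024"
        GenericRow good = new GenericRow(2024L);   // value converted to the declared type

        System.out.println(good.getLong(0));       // prints 2024
        try {
            bad.getLong(0);
        } catch (ClassCastException e) {
            System.out.println("ClassCastException, as in the reported trace");
        }
    }
}
```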
   
     This occurs specifically with:
     - Nested partition fields (e.g., _meta.partition.year, 
_meta.partition.month, _meta.partition.day)
     - Hive-style partitioning enabled 
(hoodie.datasource.write.hive_style_partitioning=true)
     - Non-string data types in partition schema (especially LongType, 
IntegerType, etc.)
     - Vectorized reading enabled
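As a hedged reproduction aid, the conditions above roughly correspond to the options in this sketch. `hoodie.datasource.write.partitionpath.field` and `spark.sql.parquet.enableVectorizedReader` are not named in this PR; they are assumptions about how one would set up the scenario. Other required write options (table name, record key, and so on) are omitted, and the `_meta.partition.*` columns are assumed to be `LongType` in the written DataFrame. In a Spark job, the first map could be passed to `DataFrameWriter.options(...)` and the second applied to the session configuration.

```java
import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical reproduction setup: option values mirror the conditions listed above.
public class NestedPartitionRepro {

    // Hudi write options: nested partition fields plus hive-style partition paths.
    static Map<String, String> hudiWriteOptions() {
        Map<String, String> opts = new LinkedHashMap<>();
        opts.put("hoodie.datasource.write.partitionpath.field",
                 "_meta.partition.year,_meta.partition.month,_meta.partition.day");
        opts.put("hoodie.datasource.write.hive_style_partitioning", "true");
        return opts;
    }

    // Spark SQL setting for the vectorized Parquet reader (true is Spark's default).
    static Map<String, String> sparkReadConf() {
        return Map.of("spark.sql.parquet.enableVectorizedReader", "true");
    }

    public static void main(String[] args) {
        hudiWriteOptions().forEach((k, v) -> System.out.println(k + "=" + v));
        sparkReadConf().forEach((k, v) -> System.out.println(k + "=" + v));
    }
}
```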
   
   ### Impact
   
   User-Facing Impact:
     - Users can now successfully read Hudi tables with nested partition fields 
using hive-style partitioning without encountering ClassCastException
   
   Internal partition value handling is improved, and no negative performance impact is expected.
   
   ### Risk Level
   
   None. This is a bug fix that doesn't introduce new features, configurations, or user-facing API changes; it restores the expected functionality for nested partition fields with hive-style partitioning.
   
   ### Documentation Update
   
   The existing 1.1 documentation should already address this.
   
   ### Contributor's checklist
   
   - [x] Read through [contributor's 
guide](https://hudi.apache.org/contribute/how-to-contribute)
   - [x] Enough context is provided in the sections above
   - [x] Adequate tests were added if applicable
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.