vamshikrishnakyatham opened a new pull request, #13943:
URL: https://github.com/apache/hudi/pull/13943
### Describe the issue this Pull Request addresses
This PR fixes a critical ClassCastException that occurs when reading Hudi
tables with nested partition fields using hive-style partitioning. The issue
appears when a partition field inside a nested structure has a non-string type
(e.g., LongType), but the partition value parsed from the hive-style path is
passed to the vectorized Parquet reader as a UTF8String instead of being
converted to that type.
### Summary and Changelog
Fixed a ClassCastException in nested partition path schema handling: partition
values parsed from hive-style paths are now converted to the data types declared
in the partition schema before they are passed to the vectorized Parquet reader.
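A minimal sketch of the conversion idea, in plain Java without Spark: each hive-style `name=value` path segment is parsed and its value is converted to the type the partition schema declares, rather than being passed through as a string. `PartitionType`, `PartitionValueConverter`, and the assumption that each segment is keyed by the partition field name are hypothetical simplifications; this is not the code changed in this PR, and it does not use Spark types such as `UTF8String`.

```java
import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical stand-in for a partition schema entry's declared type.
enum PartitionType { STRING, INT, LONG }

// Hypothetical helper, not Hudi's or Spark's API.
final class PartitionValueConverter {
    // Parses a hive-style relative path such as "year=2024/month=3/day=15"
    // (assumed layout) and converts each value to its declared type, so code
    // that later expects a Long receives a Long instead of a string.
    static Map<String, Object> parse(String relativePath, Map<String, PartitionType> schema) {
        Map<String, Object> values = new LinkedHashMap<>();
        for (String segment : relativePath.split("/")) {
            int eq = segment.indexOf('=');
            if (eq < 0) {
                throw new IllegalArgumentException("Not a hive-style segment: " + segment);
            }
            String name = segment.substring(0, eq);
            String raw = segment.substring(eq + 1);
            PartitionType type = schema.get(name);
            if (type == null) {
                throw new IllegalArgumentException("Unknown partition field: " + name);
            }
            values.put(name, convert(raw, type));
        }
        return values;
    }

    // Converts one raw path value to the declared type; STRING is returned as-is.
    static Object convert(String raw, PartitionType type) {
        switch (type) {
            case LONG:
                return Long.parseLong(raw);
            case INT:
                return Integer.parseInt(raw);
            default:
                return raw;
        }
    }
}
```

With `year` declared as `LONG`, the parsed `year` value is a `java.lang.Long`, which a typed getter such as `getLong` can unbox. Failing fast on malformed segments or unknown fields is a design choice of this sketch.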
Error Details:
java.lang.ClassCastException: class org.apache.spark.unsafe.types.UTF8String cannot be cast to class java.lang.Long
at scala.runtime.BoxesRunTime.unboxToLong(BoxesRunTime.java:107)
at org.apache.spark.sql.catalyst.expressions.BaseGenericInternalRow.getLong(rows.scala:41)
at org.apache.spark.sql.execution.vectorized.ColumnVectorUtils.populate(ColumnVectorUtils.java:70)
at org.apache.spark.sql.execution.datasources.parquet.VectorizedParquetRecordReader.initBatch(VectorizedParquetRecordReader.java:293)
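The first two frames show the failing step: a generic row holds each value as an `Object`, and `getLong` unboxes it with a cast. The hypothetical `GenericRowModel` below is a simplified plain-Java model of that behavior, not Spark's `BaseGenericInternalRow`; it shows that a string stored where a long is expected is accepted when the row is built and only fails when the typed getter runs.

```java
// Hypothetical, simplified model of a generic row (not Spark's class).
final class GenericRowModel {
    // Values are stored untyped, so nothing checks them at construction time.
    private final Object[] values;

    GenericRowModel(Object... values) {
        this.values = values;
    }

    // Typed getter: the cast throws ClassCastException if the slot holds a
    // String (standing in for UTF8String) instead of a Long.
    long getLong(int ordinal) {
        return (Long) values[ordinal];
    }
}
```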
The exception occurs with the following combination:
- Nested partition fields (e.g., `_meta.partition.year`, `_meta.partition.month`, `_meta.partition.day`)
- Hive-style partitioning enabled (`hoodie.datasource.write.hive_style_partitioning=true`)
- Non-string data types in the partition schema (e.g., LongType, IntegerType)
- Vectorized Parquet reading enabled
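A sketch of settings matching these conditions, assuming a Java Spark job. `hoodie.datasource.write.partitionpath.field` and `spark.sql.parquet.enableVectorizedReader` are existing Hudi and Spark settings that this PR does not name; `ReproConfig` is a hypothetical helper, and the rest of a real write configuration (record key, table name, and so on) is omitted.

```java
import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical helper collecting the settings for the failing combination.
final class ReproConfig {
    // Hudi write options: nested partition fields (example names from this PR)
    // with hive-style partitioning. Other required write options are omitted.
    static Map<String, String> hudiWriteOptions() {
        Map<String, String> opts = new LinkedHashMap<>();
        opts.put("hoodie.datasource.write.partitionpath.field",
                "_meta.partition.year,_meta.partition.month,_meta.partition.day");
        opts.put("hoodie.datasource.write.hive_style_partitioning", "true");
        return opts;
    }

    // Spark SQL setting that keeps the vectorized Parquet reader on
    // ("true" is already Spark's default).
    static Map<String, String> sparkReadConf() {
        Map<String, String> conf = new LinkedHashMap<>();
        conf.put("spark.sql.parquet.enableVectorizedReader", "true");
        return conf;
    }
}
```

The first map can be passed to `DataFrameWriter.options(...)` on a `format("hudi")` write, and each entry of the second set with `spark.conf().set(...)`; the nested partition fields themselves must be declared with a non-string type such as `LongType` in the data schema.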
### Impact
User-Facing Impact:
- Users can now read Hudi tables with nested partition fields using hive-style partitioning without encountering a ClassCastException.

Internal Impact:
- Partition values are now converted to their declared types internally, with no negative performance impact.
### Risk Level
None. This is a bug fix that introduces no new features or configurations; it
restores the expected read behavior for nested partition fields with
hive-style partitioning.
### Documentation Update
No update needed; the existing 1.1 documentation should already cover this.
### Contributor's checklist
- [x] Read through [contributor's guide](https://hudi.apache.org/contribute/how-to-contribute)
- [x] Enough context is provided in the sections above
- [x] Adequate tests were added if applicable