[
https://issues.apache.org/jira/browse/HUDI-6729?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]
ASF GitHub Bot updated HUDI-6729:
---------------------------------
Labels: pull-request-available (was: )
> Fix get partition values from path for non-string type partition column
> -----------------------------------------------------------------------
>
> Key: HUDI-6729
> URL: https://issues.apache.org/jira/browse/HUDI-6729
> Project: Apache Hudi
> Issue Type: Bug
> Components: hudi-utilities
> Reporter: Wechar
> Priority: Major
> Labels: pull-request-available
>
> When we enable {{hoodie.datasource.read.extract.partition.values.from.path}}
> to get partition values from path instead of data file, the exception throw
> if partition column is not string type:
> {code:bash}
> Caused by: java.lang.ClassCastException:
> org.apache.spark.unsafe.types.UTF8String cannot be cast to java.lang.Integer
> at scala.runtime.BoxesRunTime.unboxToInt(BoxesRunTime.java:103)
> at
> org.apache.spark.sql.catalyst.expressions.BaseGenericInternalRow.getInt(rows.scala:41)
> at
> org.apache.spark.sql.catalyst.expressions.BaseGenericInternalRow.getInt$(rows.scala:41)
> at
> org.apache.spark.sql.catalyst.expressions.GenericInternalRow.getInt(rows.scala:195)
> at
> org.apache.spark.sql.execution.vectorized.ColumnVectorUtils.populate(ColumnVectorUtils.java:97)
> at
> org.apache.spark.sql.execution.datasources.parquet.VectorizedParquetRecordReader.initBatch(VectorizedParquetRecordReader.java:245)
> at
> org.apache.spark.sql.execution.datasources.parquet.VectorizedParquetRecordReader.initBatch(VectorizedParquetRecordReader.java:264)
> at
> org.apache.spark.sql.execution.datasources.parquet.Spark32LegacyHoodieParquetFileFormat.$anonfun$buildReaderWithPartitionValues$2(Spark32LegacyHoodieParquetFileFormat.scala:314)
> at
> org.apache.hudi.HoodieDataSourceHelper$.$anonfun$buildHoodieParquetReader$1(HoodieDataSourceHelper.scala:67)
> at
> org.apache.hudi.HoodieBaseRelation.$anonfun$createBaseFileReader$2(HoodieBaseRelation.scala:602)
> at
> org.apache.hudi.HoodieBaseRelation$BaseFileReader.apply(HoodieBaseRelation.scala:680)
> at
> org.apache.hudi.HoodieBaseRelation$.$anonfun$projectReader$1(HoodieBaseRelation.scala:706)
> at
> org.apache.hudi.HoodieBaseRelation$.$anonfun$projectReader$2(HoodieBaseRelation.scala:711)
> at
> org.apache.hudi.HoodieBaseRelation$BaseFileReader.apply(HoodieBaseRelation.scala:680)
> at
> org.apache.hudi.HoodieMergeOnReadRDD.compute(HoodieMergeOnReadRDD.scala:96)
> at org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:373)
> at org.apache.spark.rdd.RDD.iterator(RDD.scala:337)
> at
> org.apache.spark.rdd.MapPartitionsRDD.compute(MapPartitionsRDD.scala:52)
> at org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:373)
> at org.apache.spark.rdd.RDD.iterator(RDD.scala:337)
> at
> org.apache.spark.rdd.MapPartitionsRDD.compute(MapPartitionsRDD.scala:52)
> at org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:373)
> at org.apache.spark.rdd.RDD.iterator(RDD.scala:337)
> at org.apache.spark.scheduler.ResultTask.runTask(ResultTask.scala:90)
> at org.apache.spark.scheduler.Task.run(Task.scala:131)
> at
> org.apache.spark.executor.Executor$TaskRunner.$anonfun$run$3(Executor.scala:506)
> at org.apache.spark.util.Utils$.tryWithSafeFinally(Utils.scala:1491)
> at org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:509)
> {code}
--
This message was sent by Atlassian Jira
(v8.20.10#820010)