forest455 opened a new issue #4200:
URL: https://github.com/apache/hudi/issues/4200
HoodieDeltaStreamer config (excerpt):
hoodie.datasource.write.recordkey.field=seq_no
hoodie.datasource.write.partitionpath.field=tran_date
hoodie.datasource.write.precombine.field=
hoodie.datasource.write.keygenerator.class=org.apache.hudi.keygen.TimestampBasedKeyGenerator
hoodie.datasource.hive_sync.database=ods
hoodie.datasource.hive_sync.table=hudi_ods_tran
hoodie.datasource.hive_sync.partition_fields=tran_date_str
hoodie.deltastreamer.keygen.timebased.timestamp.type=SCALAR
hoodie.deltastreamer.keygen.timebased.timestamp.scalar.time.unit=days
hoodie.deltastreamer.keygen.timebased.output.dateformat=yyyy-MM-dd
hoodie.datasource.hive_sync.partition_extractor_class=org.apache.hudi.hive.MultiPartKeysValueExtractor
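For context, here is a minimal sketch (my own illustration, not Hudi's actual TimestampBasedKeyGenerator code) of what the SCALAR/days settings above describe: a numeric day count since the Unix epoch in the partition field is rendered with the configured `output.dateformat` to build the partition path.

```scala
import java.time.LocalDate
import java.time.format.DateTimeFormatter

// Sketch only: with timestamp.type=SCALAR, time.unit=days, and
// output.dateformat=yyyy-MM-dd, a scalar day count since 1970-01-01
// becomes a formatted date string used as the partition path.
object ScalarDaysDemo {
  private val fmt = DateTimeFormatter.ofPattern("yyyy-MM-dd")

  def partitionPath(daysSinceEpoch: Long): String =
    LocalDate.ofEpochDay(daysSinceEpoch).format(fmt)

  def main(args: Array[String]): Unit = {
    // 18631 days after the epoch is 2021-01-04, the value seen in the error below.
    println(partitionPath(18631L))
  }
}
```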
After DeltaStreamer finished, the Hudi table metadata was synced to Hive. As the excerpt below shows, the partition field `tran_date_str` is correct:
CREATE EXTERNAL TABLE `hudi_ods_tran`(
...
`_hoodie_is_deleted` boolean)
PARTITIONED BY (
`tran_date_str` string)
ROW FORMAT SERDE
'org.apache.hadoop.hive.ql.io.parquet.serde.ParquetHiveSerDe'
WITH SERDEPROPERTIES
...
But when I try to query this table with Spark SQL, using code like the following:
val spark = SparkSession.builder
  .config(jssc.getConf)
  .config("spark.sql.catalogImplementation", "hive")
  .enableHiveSupport()
  .getOrCreate()
spark.sql("select count(*) from ods.hudi_ods_tran").show()
I get an error like this:
...
Exception in thread "main" org.sparkproject.guava.util.concurrent.UncheckedExecutionException: java.lang.RuntimeException: Failed to cast value `2021-01-04` to `IntegerType` for partition column `tran_date`
    at org.sparkproject.guava.cache.LocalCache$Segment.get(LocalCache.java:2263)
    at org.sparkproject.guava.cache.LocalCache.get(LocalCache.java:4000)
    at org.sparkproject.guava.cache.LocalCache$LocalManualCache.get(LocalCache.java:4789)
    at org.apache.spark.sql.catalyst.catalog.SessionCatalog.getCachedPlan(SessionCatalog.scala:155)
    at org.apache.spark.sql.execution.datasources.FindDataSourceTable.org$apache$spark$sql$execution$datasources$FindDataSourceTable$$readDataSourceTable(DataSourceStrategy.scala:249)
    at org.apache.spark.sql.execution.datasources.FindDataSourceTable$$anonfun$apply$2.applyOrElse(DataSourceStrategy.scala:288)
    at org.apache.spark.sql.execution.datasources.FindDataSourceTable$$anonfun$apply$2.applyOrElse(DataSourceStrategy.scala:278)
    at org.apache.spark.sql.catalyst.plans.logical.AnalysisHelper.$anonfun$resolveOperatorsDown$2(AnalysisHelper.scala:108)
    at org.apache.spark.sql.catalyst.trees.CurrentOrigin$.withOrigin(TreeNode.scala:74)
    at org.apache.spark.sql.catalyst.plans.logical.AnalysisHelper.$anonfun$resolveOperatorsDown$1(AnalysisHelper.scala:108)
    at org.apache.spark.sql.catalyst.plans.logical.AnalysisHelper$.allowInvokingTransformsInAnalyzer(AnalysisHelper.scala:221)
    at org.apache.spark.sql.catalyst.plans.logical.AnalysisHelper.resolveOperatorsDown(AnalysisHelper.scala:106)
    at org.apache.spark.sql.catalyst.plans.logical.AnalysisHelper.resolveOperatorsDown$(AnalysisHelper.scala:104)
    at org.apache.spark.sql.c
...
Caused by: java.lang.RuntimeException: Failed to cast value `2021-01-04` to `IntegerType` for partition column `tran_date`
    at org.apache.spark.sql.execution.datasources.PartitioningUtils$.parsePartitionColumn(PartitioningUtils.scala:313)
    at org.apache.spark.sql.execution.datasources.PartitioningUtils$.parsePartition(PartitioningUtils.scala:251)
    at org.apache.spark.sql.execution.datasources.Spark3ParsePartitionUtil.parsePartition(Spark3ParsePartitionUtil.scala:39)
    at org.apache.hudi.HoodieFileIndex.$anonfun$getAllQueryPartitionPaths$3(HoodieFileIndex.scala:486)
    at scala.collection.TraversableLike.$anonfun$map$1(TraversableLike.scala:238)
    at scala.collection.mutable.ResizableArray.foreach(ResizableArray.scala:62)
    at scala.collection.mutable.ResizableArray.foreach$(ResizableArray.scala:55)
    at scala.collection.mutable.ArrayBuffer.foreach(ArrayBuffer.scala:49)
    at scala.collection.TraversableLike.map(TraversableLike.scala:238)
I have checked other tables without a partition field, and they work.
I would really appreciate it if anyone can help solve this problem.
My environment is Spark 3.1.2, Hadoop 3.3, Hive 3.1.2.
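For what it's worth, the failure mode in the trace can be shown in isolation: partition parsing tries to cast each directory value to the column's schema type, and a yyyy-MM-dd string is not a valid integer. A simplified stand-in for that mechanism (my own sketch, not Spark's actual PartitioningUtils code):

```scala
// Sketch of the partition-value cast that fails in the trace above:
// the schema says tran_date is IntegerType, but the partition
// directory holds the formatted date string.
object PartitionCastDemo {
  def castToInt(raw: String): Either[String, Int] =
    scala.util.Try(raw.toInt).toOption
      .toRight(s"Failed to cast value `$raw` to `IntegerType`")

  def main(args: Array[String]): Unit = {
    println(castToInt("18631"))      // a scalar day count casts fine
    println(castToInt("2021-01-04")) // a formatted date string does not
  }
}
```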
--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
To unsubscribe, e-mail: [email protected]
For queries about this service, please contact Infrastructure at:
[email protected]