Alexey Kudinkin created HUDI-4818:
-------------------------------------

             Summary: Using CustomKeyGenerator fails w/ 
SparkHoodieTableFileIndex
                 Key: HUDI-4818
                 URL: https://issues.apache.org/jira/browse/HUDI-4818
             Project: Apache Hudi
          Issue Type: Bug
            Reporter: Alexey Kudinkin
            Assignee: Alexey Kudinkin
             Fix For: 0.13.0


Currently using `CustomKeyGenerator` with the partition-path config 
\{hoodie.datasource.write.partitionpath.field=ts:timestamp} fails w/
{code:java}
Caused by: java.lang.RuntimeException: Failed to cast value `2022-05-11` to 
`LongType` for partition column `ts_ms`
        at 
org.apache.spark.sql.execution.datasources.Spark3ParsePartitionUtil.$anonfun$parsePartition$2(Spark3ParsePartitionUtil.scala:72)
        at 
scala.collection.TraversableLike.$anonfun$map$1(TraversableLike.scala:286)
        at 
scala.collection.mutable.ResizableArray.foreach(ResizableArray.scala:62)
        at 
scala.collection.mutable.ResizableArray.foreach$(ResizableArray.scala:55)
        at scala.collection.mutable.ArrayBuffer.foreach(ArrayBuffer.scala:49)
        at scala.collection.TraversableLike.map(TraversableLike.scala:286)
        at scala.collection.TraversableLike.map$(TraversableLike.scala:279)
        at scala.collection.AbstractTraversable.map(Traversable.scala:108)
        at 
org.apache.spark.sql.execution.datasources.Spark3ParsePartitionUtil.$anonfun$parsePartition$1(Spark3ParsePartitionUtil.scala:65)
        at scala.Option.map(Option.scala:230)
        at 
org.apache.spark.sql.execution.datasources.Spark3ParsePartitionUtil.parsePartition(Spark3ParsePartitionUtil.scala:63)
        at 
org.apache.hudi.SparkHoodieTableFileIndex.parsePartitionPath(SparkHoodieTableFileIndex.scala:274)
        at 
org.apache.hudi.SparkHoodieTableFileIndex.parsePartitionColumnValues(SparkHoodieTableFileIndex.scala:258)
        at 
org.apache.hudi.BaseHoodieTableFileIndex.lambda$getAllQueryPartitionPaths$3(BaseHoodieTableFileIndex.java:190)
        at 
java.util.stream.ReferencePipeline$3$1.accept(ReferencePipeline.java:193)
        at 
java.util.ArrayList$ArrayListSpliterator.forEachRemaining(ArrayList.java:1384)
        at java.util.stream.AbstractPipeline.copyInto(AbstractPipeline.java:482)
        at 
java.util.stream.AbstractPipeline.wrapAndCopyInto(AbstractPipeline.java:472)
        at 
java.util.stream.ReduceOps$ReduceOp.evaluateSequential(ReduceOps.java:708)
        at java.util.stream.AbstractPipeline.evaluate(AbstractPipeline.java:234)
        at 
java.util.stream.ReferencePipeline.collect(ReferencePipeline.java:566)
        at 
org.apache.hudi.BaseHoodieTableFileIndex.getAllQueryPartitionPaths(BaseHoodieTableFileIndex.java:193)
 {code}
 

This occurs b/c SparkHoodieTableFileIndex produces incorrect partition schema 
at XXX

where it properly handles only `TimestampBasedKeyGenerator`s but not the other 
key-generators that might be changing the data-type of the partition-value as 
compared to the source partition-column (in this case it has `ts` as a long in 
the source table schema, but it produces partition-value as string)



--
This message was sent by Atlassian Jira
(v8.20.10#820010)

Reply via email to