[GitHub] [hudi] praneethh opened a new issue, #7475: [SUPPORT]Exception when reading timestamp column from Hive

GitBox Thu, 15 Dec 2022 15:05:38 -0800


praneethh opened a new issue, #7475:
URL: https://github.com/apache/hudi/issues/7475


   I have created a Hudi table in Hive. When I read the timestamp column from Hive, I get the exception below:
   `java.lang.ClassCastException: org.apache.hadoop.io.LongWritable cannot be cast to org.apache.hadoop.hive.serde2.io.TimestampWritableV2`
   
   Reading the same column from spark-shell works fine. How can I resolve this error when reading from Hive?
   
   Steps to reproduce the behavior:
   
   ```
   import java.sql.{Date, Timestamp}
   // spark-shell imports spark.implicits._ automatically; toDF() and as[SimpleData] depend on it
   case class SimpleData(ts: Timestamp, name: String, email: String, src_recv_dt: Date, recvd_dt: Date)
   
   val df1 = List(
     SimpleData(Timestamp.valueOf("2022-12-02 09:47:00"), "Fake Name 5", "[email protected]", Date.valueOf("2022-12-02"), Date.valueOf("2022-12-03")),
     SimpleData(Timestamp.valueOf("2022-12-29 09:47:00"), "Fake Name 4", "[email protected]", Date.valueOf("2022-12-29"), Date.valueOf("2022-12-03"))
   ).toDF().as[SimpleData]
   
   
   df1.show
   +-------------------+-----------+-------------------+-----------+----------+
   |                 ts|       name|              email|src_recv_dt|  recvd_dt|
   +-------------------+-----------+-------------------+-----------+----------+
   |2022-12-02 09:47:00|Fake Name 5|[email protected]| 2022-12-02|2022-12-03|
   |2022-12-29 09:47:00|Fake Name 4|[email protected]| 2022-12-29|2022-12-03|
   +-------------------+-----------+-------------------+-----------+----------+
   
   df1.write.format("hudi").options(Map(
     "hoodie.table.name" -> "rx",
     "hoodie.datasource.write.recordkey.field" -> "name",
     "hoodie.datasource.write.partitionpath.field" -> "recvd_dt",
     "hoodie.datasource.write.operation" -> "upsert",
     "hoodie.payload.ordering.field" -> "ts",
     "hoodie.index.type" -> "GLOBAL_SIMPLE",
     "hoodie.upsert.shuffle.parallelism" -> "1",
     "hoodie.simple.index.update.partition.path" -> "false",
     "hoodie.datasource.write.hive_style_partitioning" -> "true",
     "hoodie.datasource.write.payload.class" -> "org.apache.hudi.common.model.DefaultHoodieRecordPayload",
     "hoodie.datasource.hive_sync.database" -> "stg_ww",
     "hoodie.datasource.hive_sync.table" -> "rx",
     "hoodie.datasource.hive_sync.enable" -> "true",
     "hoodie.datasource.hive_sync.partition_fields" -> "recvd_dt",
     "hoodie.datasource.hive_sync.mode" -> "hms",
     "hoodie.datasource.hive_sync.use_jdbc" -> "false",
     "hoodie.datasource.write.precombine.field" -> "ts",
     "hoodie.schema.on.read.enable" -> "true",
     // sync INT64 TIMESTAMP_MICROS columns to Hive as TIMESTAMP instead of BIGINT (default: false)
     "hoodie.datasource.hive_sync.support_timestamp" -> "true",
     "hoodie.datasource.write.keygenerator.consistent.logical.timestamp.enabled" -> "true"
   )).mode("append").save("gs://....rx")
   
   
   hive> select ts from stg_ww.rx;
   
   Caused by: java.lang.ClassCastException: org.apache.hadoop.io.LongWritable cannot be cast to org.apache.hadoop.hive.serde2.io.TimestampWritableV2
        at org.apache.hadoop.hive.serde2.objectinspector.primitive.WritableTimestampObjectInspector.getPrimitiveWritableObject(WritableTimestampObjectInspector.java:34)
        at org.apache.hadoop.hive.serde2.lazy.LazyUtils.writePrimitiveUTF8(LazyUtils.java:308)
        at org.apache.hadoop.hive.serde2.lazy.LazySimpleSerDe.serialize(LazySimpleSerDe.java:292)
        at org.apache.hadoop.hive.serde2.lazy.LazySimpleSerDe.serializeField(LazySimpleSerDe.java:247)
        at org.apache.hadoop.hive.serde2.lazy.LazySimpleSerDe.doSerialize(LazySimpleSerDe.java:231)
        at org.apache.hadoop.hive.serde2.AbstractEncodingAwareSerDe.serialize(AbstractEncodingAwareSerDe.java:55)
        at org.apache.hadoop.hive.ql.exec.FileSinkOperator.process(FileSinkOperator.java:951)
        at org.apache.hadoop.hive.ql.exec.Operator.baseForward(Operator.java:995)
        at org.apache.hadoop.hive.ql.exec.Operator.forward(Operator.java:941)
        at org.apache.hadoop.hive.ql.exec.Operator.forward(Operator.java:928)
        at org.apache.hadoop.hive.ql.exec.LimitOperator.process(LimitOperator.java:63)
        at org.apache.hadoop.hive.ql.exec.Operator.baseForward(Operator.java:995)
        at org.apache.hadoop.hive.ql.exec.Operator.forward(Operator.java:941)
        at org.apache.hadoop.hive.ql.exec.Operator.forward(Operator.java:928)
        at org.apache.hadoop.hive.ql.exec.SelectOperator.process(SelectOperator.java:95)
        at org.apache.hadoop.hive.ql.exec.Operator.baseForward(Operator.java:995)
        at org.apache.hadoop.hive.ql.exec.Operator.forward(Operator.java:941)
        at org.apache.hadoop.hive.ql.exec.TableScanOperator.process(TableScanOperator.java:125)
        at org.apache.hadoop.hive.ql.exec.MapOperator$MapOpCtx.forward(MapOperator.java:153)
        at org.apache.hadoop.hive.ql.exec.MapOperator.process(MapOperator.java:555)
   ```
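   The trace shows Hive's `WritableTimestampObjectInspector` being handed a `LongWritable` for `ts`, i.e. a raw 64-bit integer instead of a timestamp object. As a minimal sketch of what that integer most likely encodes, assuming the column was written as an INT64 holding microseconds since the Unix epoch (Avro `timestamp-micros`, the usual mapping for Spark's `TimestampType`), the hypothetical helper below converts between the raw value and a UTC `Instant`. `TimestampMicros`, `toInstant` and `fromInstant` are illustrative names, not Hudi or Hive APIs.

```scala
import java.time.Instant
import java.time.temporal.ChronoUnit

// Hypothetical helper for illustration only; assumes the stored INT64 is
// microseconds since 1970-01-01T00:00:00Z (timestamp-micros encoding).
object TimestampMicros {
  // Interpret a raw epoch-microseconds value as a UTC instant.
  def toInstant(micros: Long): Instant =
    Instant.EPOCH.plus(micros, ChronoUnit.MICROS)

  // Inverse conversion; any sub-microsecond precision in `i` is truncated.
  def fromInstant(i: Instant): Long =
    ChronoUnit.MICROS.between(Instant.EPOCH, i)
}
```

   For example, `TimestampMicros.toInstant(1669974420000000L)` gives `2022-12-02T09:47:00Z`, which matches the first `ts` value in the sample data if the writing JVM ran in UTC (`Timestamp.valueOf` uses the JVM's default time zone).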
   
   **Expected behavior**
   
   The query should return the timestamp values.
   
   **Environment Description**
   
   * Hudi version : 0.12.0
   
   * Spark version : 3.1.3
   
   * Hive version : 3.1.2
   
   * Hadoop version :  3.2.3
   
   * Storage (HDFS/S3/GCS..) : GCS
   
   * Running on Docker? (yes/no) : No
   
   

