[ https://issues.apache.org/jira/browse/HUDI-1662?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Gary Li resolved HUDI-1662.
---------------------------
Resolution: Fixed
> Failed to query real-time view using hive/spark-sql when hudi mor table contains dateType
> ----------------------------------------------------------------------------------------
>
> Key: HUDI-1662
> URL: https://issues.apache.org/jira/browse/HUDI-1662
> Project: Apache Hudi
> Issue Type: Bug
> Components: Hive Integration
> Affects Versions: 0.7.0
> Environment: Hive 3.1.1, Spark 2.4.5, Hadoop 3.1.1, SUSE OS
> Reporter: tao meng
> Priority: Major
> Labels: pull-request-available
> Original Estimate: 24h
> Remaining Estimate: 24h
>
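> Step 1: prepare a raw DataFrame with a DateType column and bulk-insert it
> into the Hudi MOR table:
>   import java.sql.Date
>   import org.apache.spark.sql.functions.lit
>   val df_raw_dated = df_raw.withColumn("date", lit(Date.valueOf("2020-11-10")))
>   merge(df_raw_dated, "bulk_insert", "huditest.bulkinsert_mor_10g") // merge: reporter's own write helper, not shown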
> Step 2: prepare an update DataFrame with a different DateType value and
> upsert it into the same table (on a MOR table the upsert is written to Avro
> log files):
>   val df_update = sql("select * from huditest.bulkinsert_mor_10g_rt")
>     .withColumn("date", lit(Date.valueOf("2020-11-11")))
>   merge(df_update, "upsert", "huditest.bulkinsert_mor_10g")
>
> Step 3: query the MOR _rt table with Hive Beeline or Spark SQL:
>   select * from huditest.bulkinsert_mor_10g_rt where primary_key = 10000000;
> The following error then occurs:
> java.lang.ClassCastException: org.apache.hadoop.io.IntWritable cannot be
> cast to org.apache.hadoop.hive.serde2.io.DateWritableV2
>
>
> Root cause analysis:
> Hudi stores log files in Avro format, and Avro stores a DateType column as
> an INT (the physical type is int, the logicalType is date).
> When Hudi reads a log file and converts an Avro INT value to a Writable, the
> logicalType is ignored, so the date value becomes an IntWritable, which then
> cannot be cast to the DateWritableV2 that Hive expects for a date column.
> See:
> [https://github.com/apache/hudi/blob/master/hudi-hadoop-mr/src/main/java/org/apache/hudi/hadoop/utils/HoodieRealtimeRecordReaderUtils.java#L169]
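For context, Avro's `date` logical type annotates an `int` that counts days since the Unix epoch (1970-01-01); only the schema's logicalType says the number is a date. The minimal sketch below uses just `java.time` to show that the value written for the reproduction's date is a plain day count. `AvroDateEncoding`, `toAvroDateInt` and `fromAvroDateInt` are hypothetical names for illustration, not Hudi or Avro APIs.

```java
import java.time.LocalDate;

/** Hypothetical helper illustrating how Avro's `date` logical type encodes a date. */
public class AvroDateEncoding {

    /** Encode a calendar date as Avro's `date` logical type does: days since 1970-01-01. */
    static int toAvroDateInt(LocalDate date) {
        return Math.toIntExact(date.toEpochDay());
    }

    /** Interpret the raw int as a date; this only makes sense if the logicalType is respected. */
    static LocalDate fromAvroDateInt(int daysSinceEpoch) {
        return LocalDate.ofEpochDay(daysSinceEpoch);
    }

    public static void main(String[] args) {
        LocalDate inserted = LocalDate.parse("2020-11-10");
        int raw = toAvroDateInt(inserted);
        // A reader that ignores the logicalType sees only this integer.
        System.out.println("raw INT: " + raw);                     // raw INT: 18576
        System.out.println("as date: " + fromAvroDateInt(raw));    // as date: 2020-11-10
    }
}
```

Hive's `DateWritable` also represents a date as days since the epoch, which is why the plan that follows can pass the Avro `Integer` through unchanged and only needs to pick the right Writable class.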
>
> Modification plan: when converting an Avro INT to a Writable, the logicalType
> must be considered:
>   case INT:
>     // Avro `date` is an int of days since 1970-01-01, the same unit DateWritable uses.
>     if (schema.getLogicalType() != null
>         && "date".equals(schema.getLogicalType().getName())) {
>       return new DateWritable((Integer) value);
>     }
>     return new IntWritable((Integer) value);
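The plan amounts to checking the logical type before falling back to the physical type. The self-contained sketch below mimics that dispatch so it runs without Avro, Hadoop or Hive on the classpath; `SimpleSchema`, `IntValue` and `DateValue` are hypothetical stand-ins for Avro's `Schema` and for `IntWritable`/`DateWritable`, not the real classes.

```java
import java.time.LocalDate;

public class LogicalTypeDispatch {

    /** Hypothetical stand-in for an Avro schema: physical type plus optional logical type name. */
    record SimpleSchema(String type, String logicalType) {}

    /** Stand-ins for IntWritable and DateWritable; both wrap the same int value. */
    record IntValue(int value) {}

    record DateValue(int daysSinceEpoch) {
        LocalDate toLocalDate() {
            return LocalDate.ofEpochDay(daysSinceEpoch);
        }
    }

    /** Mirrors the proposed `case INT` branch: the logical type is checked first. */
    static Object convertInt(SimpleSchema schema, Object value) {
        if (!"int".equals(schema.type())) {
            throw new IllegalArgumentException("expected an int schema, got " + schema.type());
        }
        int raw = (Integer) value;
        // "date".equals(null) is false, so a missing logical type falls through to IntValue.
        if ("date".equals(schema.logicalType())) {
            return new DateValue(raw);
        }
        return new IntValue(raw);
    }

    public static void main(String[] args) {
        SimpleSchema dateSchema = new SimpleSchema("int", "date");
        SimpleSchema plainInt = new SimpleSchema("int", null);
        System.out.println(convertInt(dateSchema, 18576));    // DateValue[daysSinceEpoch=18576]
        System.out.println(convertInt(plainInt, 10000000));   // IntValue[value=10000000]
    }
}
```

Checking the logical type first and falling back to the plain INT keeps ordinary integer columns, such as `primary_key` in the reproduction, converted exactly as before.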
--
This message was sent by Atlassian Jira
(v8.3.4#803005)