[
https://issues.apache.org/jira/browse/HUDI-1894?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17343526#comment-17343526
]
Eldhose Paul commented on HUDI-1894:
------------------------------------
[~vinoth] I know it's too early to ask, but will this fix be part of the next
release? Falling back to read-optimized queries in prod has an impact for us,
because the data is not as fresh as with snapshot queries.
> NULL values in timestamp column defaulted
> ------------------------------------------
>
> Key: HUDI-1894
> URL: https://issues.apache.org/jira/browse/HUDI-1894
> Project: Apache Hudi
> Issue Type: Bug
> Components: Spark Integration
> Reporter: Eldhose Paul
> Priority: Major
>
> Reading a timestamp column through Hudi and directly from the underlying
> Parquet files in Spark gives different results.
> *hudi properties:*
> {code:java}
> hdfs dfs -cat /user/hive/warehouse/jira_expl.db/jiraissue_events/.hoodie/hoodie.properties
> #Properties saved on Tue May 11 17:17:22 EDT 2021
> #Tue May 11 17:17:22 EDT 2021
> hoodie.compaction.payload.class=org.apache.hudi.common.model.OverwriteWithLatestAvroPayload
> hoodie.table.name=jiraissue
> hoodie.archivelog.folder=archived
> hoodie.table.type=MERGE_ON_READ
> hoodie.table.version=1
> hoodie.timeline.layout.version=1
> {code}
>
> *Reading directly from parquet using Spark:*
> {code:java}
> scala> val ji = spark.read.format("parquet").load("/user/hive/warehouse/jira_expl.db/jiraissue_events/*.parquet")
> ji: org.apache.spark.sql.DataFrame = [_hoodie_commit_time: string, _hoodie_commit_seqno: string ... 49 more fields]
>
> scala> ji.filter($"id" === 1237858).withColumn("inputfile", input_file_name()).select($"_hoodie_commit_time", $"_hoodie_commit_seqno", $"_hoodie_record_key", $"_hoodie_partition_path", $"_hoodie_file_name", $"resolutiondate", $"archiveddate", $"inputfile").show(false)
> +-------------------+----------------------+------------------+----------------------+-----------------------------------------------------------------------+--------------+------------+------------------------------------------------------------------------------------------------------------------------------------------------+
> |_hoodie_commit_time|_hoodie_commit_seqno  |_hoodie_record_key|_hoodie_partition_path|_hoodie_file_name                                                      |resolutiondate|archiveddate|inputfile                                                                                                                                       |
> +-------------------+----------------------+------------------+----------------------+-----------------------------------------------------------------------+--------------+------------+------------------------------------------------------------------------------------------------------------------------------------------------+
> |20210511171722     |20210511171722_7_13718|1237858.0         |                      |832cf07f-637b-4a4c-ab08-6929554f003a-0_7-98-5106_20210511171722.parquet|null          |null        |hdfs://nameservice1/user/hive/warehouse/jira_expl.db/jiraissue_events/832cf07f-637b-4a4c-ab08-6929554f003a-0_7-98-5106_20210511171722.parquet   |
> |20210511171722     |20210511171722_7_13718|1237858.0         |                      |832cf07f-637b-4a4c-ab08-6929554f003a-0_7-98-5106_20210511171722.parquet|null          |null        |hdfs://nameservice1/user/hive/warehouse/jira_expl.db/jiraissue_events/832cf07f-637b-4a4c-ab08-6929554f003a-0_8-1610-78711_20210511173615.parquet|
> +-------------------+----------------------+------------------+----------------------+-----------------------------------------------------------------------+--------------+------------+------------------------------------------------------------------------------------------------------------------------------------------------+
> {code}
> *Reading `hudi` using Spark:*
> {code:java}
> scala> val jih = spark.read.format("org.apache.hudi").load("/user/hive/warehouse/jira_expl.db/jiraissue_events")
> jih: org.apache.spark.sql.DataFrame = [_hoodie_commit_time: string, _hoodie_commit_seqno: string ... 49 more fields]
>
> scala> jih.filter($"id" === 1237858).select($"_hoodie_commit_time", $"_hoodie_commit_seqno", $"_hoodie_record_key", $"_hoodie_partition_path", $"_hoodie_file_name", $"resolutiondate", $"archiveddate").show(false)
> +-------------------+----------------------+------------------+----------------------+-----------------------------------------------------------------------+-------------------+-------------------+
> |_hoodie_commit_time|_hoodie_commit_seqno  |_hoodie_record_key|_hoodie_partition_path|_hoodie_file_name                                                      |resolutiondate     |archiveddate       |
> +-------------------+----------------------+------------------+----------------------+-----------------------------------------------------------------------+-------------------+-------------------+
> |20210511171722     |20210511171722_7_13718|1237858.0         |                      |832cf07f-637b-4a4c-ab08-6929554f003a-0_7-98-5106_20210511171722.parquet|2018-07-30 14:58:52|1969-12-31 19:00:00|
> +-------------------+----------------------+------------------+----------------------+-----------------------------------------------------------------------+-------------------+-------------------+
> {code}
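
A possible reading of the `archiveddate` value in the Hudi output, offered as an assumption rather than a confirmed diagnosis: `1969-12-31 19:00:00` is exactly what epoch 0 looks like when rendered in US Eastern time, which would fit a null being replaced by a default of 0 somewhere on the read path. The `EDT` stamps in hoodie.properties suggest the cluster runs in that zone, but the session time zone is not stated in this issue. The sketch below uses only `java.time`; the class name `EpochZeroDemo` and the zone ID `America/New_York` are illustrative choices, not taken from the report. It does not explain the `resolutiondate` value.

```java
import java.time.Instant;
import java.time.ZoneId;
import java.time.format.DateTimeFormatter;

// Hypothetical helper, not part of Hudi or Spark.
public class EpochZeroDemo {
    // Same layout as the timestamps printed by show(false) in the report.
    private static final DateTimeFormatter FMT =
            DateTimeFormatter.ofPattern("yyyy-MM-dd HH:mm:ss");

    /** Renders an epoch-millisecond value as wall-clock time in the given zone. */
    public static String render(long epochMillis, String zoneId) {
        return Instant.ofEpochMilli(epochMillis)
                .atZone(ZoneId.of(zoneId))
                .format(FMT);
    }

    public static void main(String[] args) {
        // Assumed zone: US Eastern (UTC-5 in December 1969).
        System.out.println(render(0L, "America/New_York")); // 1969-12-31 19:00:00
        System.out.println(render(0L, "UTC"));              // 1970-01-01 00:00:00
    }
}
```

If the same record shows `1970-01-01 00:00:00` when the Spark session time zone is set to UTC, that would support the "null defaulted to 0" reading.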