[ https://issues.apache.org/jira/browse/HUDI-1894?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17343876#comment-17343876 ]

sivabalan narayanan edited comment on HUDI-1894 at 5/13/21, 1:56 PM:
---------------------------------------------------------------------

[~epaul]: If you can give us the steps to reproduce, it would be easier for us to work on the fix.

Also, can you confirm the following:
 * you don't see any issues with COW
 * you don't see issues with MOR when using the read-optimized query (I guess you have already confirmed this).
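For the read-optimized check, the Hudi Spark datasource picks the query type per read through the option {{hoodie.datasource.query.type}}: {{snapshot}} (the default) merges base files with log files, while {{read_optimized}} reads only the compacted base parquet files. A minimal sketch, assuming Hudi 0.6 or later where that option key exists; {{HudiQueryOptions}} is a hypothetical helper, not part of Hudi:

```scala
// Hypothetical helper (not a Hudi API): reader options for the two ways of
// querying a Hudi MERGE_ON_READ table through the Spark datasource.
object HudiQueryOptions {
  // Datasource option that selects the query type.
  val QueryTypeKey: String = "hoodie.datasource.query.type"

  // Snapshot query: base parquet files merged with the delta log files.
  def snapshot: Map[String, String] = Map(QueryTypeKey -> "snapshot")

  // Read-optimized query: only the compacted base parquet files are read.
  def readOptimized: Map[String, String] = Map(QueryTypeKey -> "read_optimized")
}
```

In spark-shell this would be used as {{spark.read.format("org.apache.hudi").options(HudiQueryOptions.readOptimized).load("/user/hive/warehouse/jira_expl.db/jiraissue_events")}}; comparing that result for {{id}} 1237858 with the default snapshot read would help show whether the defaulting only happens when log files are merged.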

I'm not sure whether you are in the POC phase or in prod, but one option to unblock you for now is to trigger compaction on every commit. Alternatively, you might as well move from MOR to COW rather than triggering compaction. Just giving you options in case you are blocked.
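For the compaction workaround, Hudi can run compaction inline on the writer once a given number of delta commits has accumulated. A sketch assuming the Hudi 0.x config keys {{hoodie.compact.inline}} and {{hoodie.compact.inline.max.delta.commits}}; {{CompactionWorkaround}} is a hypothetical helper, and the other required write options (table name, record key, precombine field) are left out:

```scala
// Hypothetical helper (not a Hudi API): writer options that make a
// MERGE_ON_READ table compact inline after every N delta commits.
object CompactionWorkaround {
  def inlineCompactionEvery(deltaCommits: Int): Map[String, String] = {
    // A threshold below 1 would be meaningless, so reject it early.
    require(deltaCommits >= 1, "deltaCommits must be at least 1")
    Map(
      "hoodie.datasource.write.table.type"      -> "MERGE_ON_READ",
      // Run compaction as part of the write instead of asynchronously.
      "hoodie.compact.inline"                   -> "true",
      // Number of delta commits that triggers the inline compaction.
      "hoodie.compact.inline.max.delta.commits" -> deltaCommits.toString
    )
  }
}
```

With a threshold of 1, e.g. {{df.write.format("org.apache.hudi").options(CompactionWorkaround.inlineCompactionEvery(1))}} plus the usual table options, every delta commit is followed by a compaction. For a new table, setting {{hoodie.datasource.write.table.type}} to {{COPY_ON_WRITE}} is the COW alternative.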

 


was (Author: shivnarayan):
[~epaul]: If you can give us the steps to reproduce, it would be easier for us to work on the fix.

> NULL values in timestamp column defaulted 
> ------------------------------------------
>
>                 Key: HUDI-1894
>                 URL: https://issues.apache.org/jira/browse/HUDI-1894
>             Project: Apache Hudi
>          Issue Type: Bug
>          Components: Spark Integration
>            Reporter: Eldhose Paul
>            Assignee: sivabalan narayanan
>            Priority: Major
>              Labels: sev:critical
>
> Reading a timestamp column through Hudi and directly from the underlying parquet files in Spark gives different results.
> *Hudi properties:*
> {code:java}
> hdfs dfs -cat /user/hive/warehouse/jira_expl.db/jiraissue_events/.hoodie/hoodie.properties
> #Properties saved on Tue May 11 17:17:22 EDT 2021
> #Tue May 11 17:17:22 EDT 2021
> hoodie.compaction.payload.class=org.apache.hudi.common.model.OverwriteWithLatestAvroPayload
> hoodie.table.name=jiraissue
> hoodie.archivelog.folder=archived
> hoodie.table.type=MERGE_ON_READ
> hoodie.table.version=1
> hoodie.timeline.layout.version=1
> {code}
>  
> *Reading directly from parquet using Spark:*
> {code:java}
> scala> val ji = spark.read.format("parquet").load("/user/hive/warehouse/jira_expl.db/jiraissue_events/*.parquet")
> ji: org.apache.spark.sql.DataFrame = [_hoodie_commit_time: string, _hoodie_commit_seqno: string ... 49 more fields]
>
> scala> ji.filter($"id" === 1237858).withColumn("inputfile", input_file_name()).select($"_hoodie_commit_time", $"_hoodie_commit_seqno", $"_hoodie_record_key", $"_hoodie_partition_path", $"_hoodie_file_name", $"resolutiondate", $"archiveddate", $"inputfile").show(false)
> +-------------------+----------------------+------------------+----------------------+-----------------------------------------------------------------------+--------------+------------+------------------------------------------------------------------------------------------------------------------------------------------------+
> |_hoodie_commit_time|_hoodie_commit_seqno  |_hoodie_record_key|_hoodie_partition_path|_hoodie_file_name                                                      |resolutiondate|archiveddate|inputfile                                                                                                                                       |
> +-------------------+----------------------+------------------+----------------------+-----------------------------------------------------------------------+--------------+------------+------------------------------------------------------------------------------------------------------------------------------------------------+
> |20210511171722     |20210511171722_7_13718|1237858.0         |                      |832cf07f-637b-4a4c-ab08-6929554f003a-0_7-98-5106_20210511171722.parquet|null          |null        |hdfs://nameservice1/user/hive/warehouse/jira_expl.db/jiraissue_events/832cf07f-637b-4a4c-ab08-6929554f003a-0_7-98-5106_20210511171722.parquet   |
> |20210511171722     |20210511171722_7_13718|1237858.0         |                      |832cf07f-637b-4a4c-ab08-6929554f003a-0_7-98-5106_20210511171722.parquet|null          |null        |hdfs://nameservice1/user/hive/warehouse/jira_expl.db/jiraissue_events/832cf07f-637b-4a4c-ab08-6929554f003a-0_8-1610-78711_20210511173615.parquet|
> +-------------------+----------------------+------------------+----------------------+-----------------------------------------------------------------------+--------------+------------+------------------------------------------------------------------------------------------------------------------------------------------------+
> {code}
> *Reading via Hudi using Spark:*
> {code:java}
> scala> val jih = spark.read.format("org.apache.hudi").load("/user/hive/warehouse/jira_expl.db/jiraissue_events")
> jih: org.apache.spark.sql.DataFrame = [_hoodie_commit_time: string, _hoodie_commit_seqno: string ... 49 more fields]
>
> scala> jih.filter($"id" === 1237858).select($"_hoodie_commit_time", $"_hoodie_commit_seqno", $"_hoodie_record_key", $"_hoodie_partition_path", $"_hoodie_file_name", $"resolutiondate", $"archiveddate").show(false)
> +-------------------+----------------------+------------------+----------------------+-----------------------------------------------------------------------+-------------------+-------------------+
> |_hoodie_commit_time|_hoodie_commit_seqno  |_hoodie_record_key|_hoodie_partition_path|_hoodie_file_name                                                      |resolutiondate     |archiveddate       |
> +-------------------+----------------------+------------------+----------------------+-----------------------------------------------------------------------+-------------------+-------------------+
> |20210511171722     |20210511171722_7_13718|1237858.0         |                      |832cf07f-637b-4a4c-ab08-6929554f003a-0_7-98-5106_20210511171722.parquet|2018-07-30 14:58:52|1969-12-31 19:00:00|
> +-------------------+----------------------+------------------+----------------------+-----------------------------------------------------------------------+-------------------+-------------------+
> {code}
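The defaulted {{archiveddate}} value, {{1969-12-31 19:00:00}}, is exactly the Unix epoch (timestamp 0) rendered in US Eastern time, which suggests the null is being replaced by a zero timestamp somewhere in the snapshot read path. A small check, assuming the session time zone is America/New_York (the {{hoodie.properties}} header above is stamped EDT) and that the value is held as microseconds since the epoch, as Spark's {{TimestampType}} is internally; {{EpochCheck}} is a hypothetical helper:

```scala
import java.time.{Instant, LocalDateTime, ZoneId}

// Hypothetical helper: renders a microseconds-since-epoch timestamp as local
// wall-clock time, the way it would appear in show() output.
object EpochCheck {
  // Assumption: the cluster renders timestamps in US Eastern time.
  val AssumedZone: ZoneId = ZoneId.of("America/New_York")

  def renderMicros(micros: Long, zone: ZoneId = AssumedZone): LocalDateTime = {
    // Split into whole seconds and the remaining microseconds; floor
    // division keeps negative (pre-1970) values correct.
    val seconds = Math.floorDiv(micros, 1000000L)
    val microsOfSecond = Math.floorMod(micros, 1000000L)
    Instant.ofEpochSecond(seconds, microsOfSecond * 1000L).atZone(zone).toLocalDateTime
  }
}
```

{{EpochCheck.renderMicros(0L)}} yields 1969-12-31T19:00, matching the defaulted column, while the same call with {{ZoneId.of("UTC")}} yields 1970-01-01T00:00.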



--
This message was sent by Atlassian Jira
(v8.3.4#803005)
