[jira] [Updated] (HUDI-4091) Timestamp micro handling

Istvan Darvas (Jira) Thu, 12 May 2022 08:15:07 -0700


     [ https://issues.apache.org/jira/browse/HUDI-4091?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]


Istvan Darvas updated HUDI-4091:
--------------------------------
    Description: 
Hi Guys!
 
I am not able to save timestamp columns with microsecond precision using Hudi.
I would like to keep microsecond granularity, but only milliseconds are kept.
 
I have set this:
--conf spark.sql.parquet.outputTimestampType=TIMESTAMP_MICROS \
and also this in the Hudi options:
"hoodie.parquet.outputtimestamptype": "TIMESTAMP_MICROS",
but when I read the data back (with PySpark, using the load API), it has only
millisecond precision. Unfortunately, I need microseconds in some cases, because
with only millisecond precision I run into a Schrödinger's cat situation 😄
An entity has more than one state at the same time 😄
Can someone enlighten me as to what I should do?
 
Before saving, everything is fine! (See the "ts" column.)

Darvi
SLACK Thread: 
[https://apache-hudi.slack.com/archives/C4D716NPQ/p1652347742173779]
 

  was:
Hi Guys!
 
I am not able to save timestamp columns with microsecond precision using Hudi.
I would like to keep microsecond granularity, but only milliseconds are kept.
 
I have set this:
--conf spark.sql.parquet.outputTimestampType=TIMESTAMP_MICROS \
and also this in the Hudi options:
"hoodie.parquet.outputtimestamptype": "TIMESTAMP_MICROS",
but when I read the data back (with PySpark, using the load API), it has only
millisecond precision. Unfortunately, I need microseconds in some cases, because
with only millisecond precision I run into a Schrödinger's cat situation 😄
An entity has more than one state at the same time 😄
Can someone enlighten me as to what I should do?
 
Before saving, everything is fine!

Darvi
SLACK Thread: https://apache-hudi.slack.com/archives/C4D716NPQ/p1652347742173779
 


> Timestamp micro handling
> ------------------------
>
>                 Key: HUDI-4091
>                 URL: https://issues.apache.org/jira/browse/HUDI-4091
>             Project: Apache Hudi
>          Issue Type: Bug
>    Affects Versions: 0.10.1
>         Environment: AWS EMR
>            Reporter: Istvan Darvas
>            Priority: Critical
>         Attachments: 
> b97b9e55-58a4-417b-b71c-f6b2d3860da0-0_0-26-1663_20220512111505310.parquet, 
> before-save.png, example-code.txt
>
>
> Hi Guys!
>  
> I am not able to save timestamp columns with microsecond precision using Hudi.
> I would like to keep microsecond granularity, but only milliseconds are kept.
>  
> I have set this:
> --conf spark.sql.parquet.outputTimestampType=TIMESTAMP_MICROS \
> and also this in the Hudi options:
> "hoodie.parquet.outputtimestamptype": "TIMESTAMP_MICROS",
> but when I read the data back (with PySpark, using the load API), it has only
> millisecond precision. Unfortunately, I need microseconds in some cases, because
> with only millisecond precision I run into a Schrödinger's cat situation 😄
> An entity has more than one state at the same time 😄
> Can someone enlighten me as to what I should do?
>  
> Before saving, everything is fine! (See the "ts" column.)
> Darvi
> SLACK Thread: 
> [https://apache-hudi.slack.com/archives/C4D716NPQ/p1652347742173779]
>  
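
The "more than one state at the same time" problem described in the report can be illustrated without Spark or Hudi. The following is only a minimal sketch in plain Python. The entity name, the states, and the timestamps are made up for illustration, and it assumes that a millisecond-precision store truncates (rather than rounds) the sub-millisecond part. It does not reproduce the actual Hudi write path.

```python
from datetime import datetime


def truncate_to_millis(ts: datetime) -> datetime:
    """Drop the sub-millisecond digits (assumed truncation, not rounding)."""
    return ts.replace(microsecond=(ts.microsecond // 1000) * 1000)


# Hypothetical data: two state changes of one entity, 250 microseconds apart.
events = [
    ("entity-1", "CREATED", datetime(2022, 5, 12, 11, 15, 5, 310_100)),
    ("entity-1", "UPDATED", datetime(2022, 5, 12, 11, 15, 5, 310_350)),
]


def states_per_timestamp(rows, normalize=lambda ts: ts):
    """Group states by (entity, timestamp) after normalizing the timestamp."""
    grouped = {}
    for entity, state, ts in rows:
        grouped.setdefault((entity, normalize(ts)), []).append(state)
    return grouped


# With microsecond precision each state has its own timestamp.
micros = states_per_timestamp(events)
# With millisecond precision both states collapse onto one timestamp.
millis = states_per_timestamp(events, truncate_to_millis)

print(len(micros))  # number of distinct (entity, ts) keys at microsecond precision
print(len(millis))  # number of distinct (entity, ts) keys at millisecond precision
```

If the "ts" column is used to order or deduplicate records, the second grouping shows why the loss of precision matters: both states share one key, so there is no way to tell which came last.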



--
This message was sent by Atlassian Jira
(v8.20.7#820007)
