[
https://issues.apache.org/jira/browse/HUDI-4091?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]
Istvan Darvas updated HUDI-4091:
--------------------------------
Description:
Hi Guys!
I am not able to save timestamp columns with microsecond precision using Hudi.
I would like to keep microsecond granularity when saving, but only milliseconds are kept.
I have set this:
--conf spark.sql.parquet.outputTimestampType=TIMESTAMP_MICROS \
and also this in the hoodie:
"hoodie.parquet.outputtimestamptype": "TIMESTAMP_MICROS",
but when I read it back (with pyspark, load API), it has only millisecond
precision. Unfortunately, I need microseconds in some cases, because with
millisecond precision I run into a Schrödinger's cat situation: an entity
has more than one state at the same time.
Can someone enlighten me as to what I should do?
Before saving, everything is fine! ("ts" column)
Darvi
SLACK Thread:
[https://apache-hudi.slack.com/archives/C4D716NPQ/p1652347742173779]
was:
Hi Guys!
I am not able to save timestamp columns with microsecond precision using Hudi.
I would like to keep microsecond granularity when saving, but only milliseconds are kept.
I have set this:
--conf spark.sql.parquet.outputTimestampType=TIMESTAMP_MICROS \
and also this in the hoodie:
"hoodie.parquet.outputtimestamptype": "TIMESTAMP_MICROS",
but when I read it back (with pyspark, load API), it has only millisecond
precision. Unfortunately, I need microseconds in some cases, because with
millisecond precision I run into a Schrödinger's cat situation: an entity
has more than one state at the same time.
Can someone enlighten me as to what I should do?
Before saving, everything is fine!
Darvi
SLACK Thread: https://apache-hudi.slack.com/archives/C4D716NPQ/p1652347742173779
> Timestamp micro handling
> ------------------------
>
> Key: HUDI-4091
> URL: https://issues.apache.org/jira/browse/HUDI-4091
> Project: Apache Hudi
> Issue Type: Bug
> Affects Versions: 0.10.1
> Environment: AWS EMR
> Reporter: Istvan Darvas
> Priority: Critical
> Attachments:
> b97b9e55-58a4-417b-b71c-f6b2d3860da0-0_0-26-1663_20220512111505310.parquet,
> before-save.png, example-code.txt
>
>
> Hi Guys!
>
> I am not able to save timestamp columns with microsecond precision using Hudi.
> I would like to keep microsecond granularity when saving, but only
> milliseconds are kept.
>
> I have set this:
> --conf spark.sql.parquet.outputTimestampType=TIMESTAMP_MICROS \
> and also this in the hoodie:
> "hoodie.parquet.outputtimestamptype": "TIMESTAMP_MICROS",
> but when I read it back (with pyspark, load API), it has only millisecond
> precision. Unfortunately, I need microseconds in some cases, because with
> millisecond precision I run into a Schrödinger's cat situation: an entity
> has more than one state at the same time.
> Can someone enlighten me as to what I should do?
>
> Before saving, everything is fine! ("ts" column)
> Darvi
> SLACK Thread:
> [https://apache-hudi.slack.com/archives/C4D716NPQ/p1652347742173779]
>
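The collision described in the report can be illustrated without Spark or Hudi. The following is only a minimal sketch in plain Python: the sample rows, the state names, and the helpers `truncate_to_millis` and `states_by_key` are hypothetical, and it assumes the precision loss behaves like simple truncation to milliseconds. It says nothing about where in the write or read path the precision is actually lost.

```python
from datetime import datetime


def truncate_to_millis(ts: datetime) -> datetime:
    """Drop the sub-millisecond part (assumed behaviour: truncation, not rounding)."""
    return ts.replace(microsecond=(ts.microsecond // 1000) * 1000)


def states_by_key(rows, normalize=lambda ts: ts):
    """Group states by (entity, timestamp) after applying `normalize` to the timestamp."""
    grouped = {}
    for entity, state, ts in rows:
        grouped.setdefault((entity, normalize(ts)), []).append(state)
    return grouped


# Hypothetical sample: two states of one entity, 300 microseconds apart.
events = [
    ("entity-1", "CREATED", datetime(2022, 5, 12, 11, 15, 5, 310_100)),
    ("entity-1", "UPDATED", datetime(2022, 5, 12, 11, 15, 5, 310_400)),
]

micros = states_by_key(events)                      # full microsecond precision
millis = states_by_key(events, truncate_to_millis)  # after losing the microseconds

print(len(micros))  # 2 distinct (entity, ts) keys
print(len(millis))  # 1 key: both states now share the same timestamp
```

With microseconds intact, each state has its own timestamp; after truncation, both states map to the same (entity, ts) pair, which is the "more than one state at the same time" situation from the report.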
--
This message was sent by Atlassian Jira
(v8.20.7#820007)