[I] [SUPPORT] timestamp with logical type is timestamp-mills will cause data inconsistencies [hudi]

via GitHub Wed, 18 Oct 2023 21:05:31 -0700


KnightChess opened a new issue, #9884:
URL: https://github.com/apache/hudi/issues/9884

**Describe the problem you faced**

The table has a column `ts` of type timestamp, and it is the precombine key.

Background:
A Flink streaming job loads the table, and Spark syncs it to a Hive partitioned table every day.

Question:
When I query the table with Spark, `ts` is shown as `55758-12-02
03:30:01.0`. If I use Spark to read the table and sync it to another Hive table,
updated records are lost: the new data has been written to the log file, but
the Hive table contains only the old values after the sync. After compaction, if
I sync to Hive again, the result is correct.

Analysis:
- In the commit instant and in hoodie.properties, the logical type is
`timestamp-millis` everywhere.
- In the Spark code, the conversion from StructType to an Avro type cannot
distinguish the precision, so it uses `timestamp-micros`.

![image](https://github.com/apache/hudi/assets/20125927/9ed31b40-6eda-4c40-bfb8-e85cb3ba6da2)

- So, when Spark's merging file iterator is used, the log file uses
`timestamp-millis` and the base file uses `timestamp-micros`.

![image](https://github.com/apache/hudi/assets/20125927/6c62bc33-4654-406e-82ff-c28debefecd9)

![image](https://github.com/apache/hudi/assets/20125927/f2bfb45e-fc47-46e4-854f-d287d944e1ee)
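The schema mismatch described in the analysis can be illustrated with a small sketch. The two schemas below are hypothetical, modelled only on what the report states (log file with `timestamp-millis`, base file read with `timestamp-micros`); the record name, the `id` field, and the helper functions are assumptions made for illustration and are not Hudi APIs.

```python
# Hypothetical Avro schemas modelled on the report; names are illustrative only.
LOG_FILE_SCHEMA = {
    "type": "record",
    "name": "example_record",
    "fields": [
        {"name": "id", "type": "string"},
        {"name": "ts", "type": ["null", {"type": "long", "logicalType": "timestamp-millis"}]},
    ],
}

BASE_FILE_SCHEMA = {
    "type": "record",
    "name": "example_record",
    "fields": [
        {"name": "id", "type": "string"},
        {"name": "ts", "type": ["null", {"type": "long", "logicalType": "timestamp-micros"}]},
    ],
}


def logical_type_of(field_type):
    """Return the Avro logicalType of a field type, looking inside nullable unions."""
    if isinstance(field_type, dict):
        return field_type.get("logicalType")
    if isinstance(field_type, list):
        for branch in field_type:
            found = logical_type_of(branch)
            if found is not None:
                return found
    return None


def find_logical_type_mismatches(schema_a, schema_b):
    """Map each shared field name to (logical type in a, logical type in b) when they differ."""
    types_b = {f["name"]: logical_type_of(f["type"]) for f in schema_b["fields"]}
    mismatches = {}
    for field in schema_a["fields"]:
        name = field["name"]
        type_a = logical_type_of(field["type"])
        if name in types_b and type_a != types_b[name]:
            mismatches[name] = (type_a, types_b[name])
    return mismatches


mismatches = find_logical_type_mismatches(LOG_FILE_SCHEMA, BASE_FILE_SCHEMA)
print(mismatches)
```

A check of this kind only detects the disagreement; it does not decide which precision is the correct one for the table.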

So, if the long value of `ts` is `1697609536683`, the base file yields
`1697609536683000` while the log file yields `1697609536683`.
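The arithmetic behind these two values can be sketched as follows. This is a rough illustration rather than Hudi code: the helper names and the conversion table are hypothetical, and the year estimate uses a mean Gregorian year, so it is only approximate. It shows that both values denote the same instant once each is interpreted with its own logical type, and that reading the micros value as millis lands tens of thousands of years in the future, like the `55758-12-02` value in the report.

```python
SECONDS_PER_YEAR = 31_556_952  # mean Gregorian year, good enough for an estimate

# Hypothetical conversion factors to microseconds, keyed by Avro logical type.
MICROS_PER_UNIT = {"timestamp-millis": 1_000, "timestamp-micros": 1}


def to_micros(value, logical_type):
    """Normalize a raw long to microseconds according to its logical type."""
    return value * MICROS_PER_UNIT[logical_type]


def approx_year_if_read_as_millis(value):
    """Estimate the calendar year obtained when a raw long is treated as epoch millis."""
    return 1970 + (value / 1_000) / SECONDS_PER_YEAR


log_value = 1697609536683        # stored with timestamp-millis
base_value = 1697609536683000    # same instant, produced as timestamp-micros

# Interpreted with the right logical type, both values are the same instant.
same_instant = to_micros(log_value, "timestamp-millis") == to_micros(base_value, "timestamp-micros")
print(same_instant)

# Compared as raw longs, the base file value always looks 1000x "newer".
print(base_value > log_value)

# Reading each raw value as epoch millis gives very different years.
print(int(approx_year_if_read_as_millis(log_value)))
print(int(approx_year_if_read_as_millis(base_value)))
```

Since `ts` is the precombine key, a raw comparison of the two longs would favor the base file value, which is consistent with the lost updates described above.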

Spark's TimestampType does not appear to distinguish millis from micros, so if
we convert StructType directly to an Avro type, data quality problems will occur.

@YannByron @yihua @wzx140 @danny0405


