[
https://issues.apache.org/jira/browse/HUDI-4133?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]
Shuo Cheng resolved HUDI-4133.
------------------------------
> Spark query mor by snapshot query lost data
> ---------------------------------------------
>
> Key: HUDI-4133
> URL: https://issues.apache.org/jira/browse/HUDI-4133
> Project: Apache Hudi
> Issue Type: Bug
> Components: core, flink, spark-sql
> Reporter: loukey_j
> Assignee: loukey_j
> Priority: Major
> Labels: pull-request-available
>
> Suppose two non-overlapping batches of data are written in turn by Flink to a
> new, non-partitioned Hudi MOR table.
> The Hoodie timeline and log files are as follows:
>
> hdfs dfs -ls hdfs://xxx/mor_test/.hoodie
> 0 2022-05-21 16:41 hdfs://xxx/mor_test/.hoodie/.aux
> 0 2022-05-21 16:41 hdfs://xxx/mor_test/.hoodie/.schema
> 0 2022-05-21 16:42 hdfs://xxx/mor_test/.hoodie/.temp
> 5291 2022-05-21 16:42 hdfs://xxx/mor_test/.hoodie/20220521164201245.deltacommit
> 0 2022-05-21 16:42 hdfs://xxx/mor_test/.hoodie/20220521164201245.deltacommit.inflight
> 0 2022-05-21 16:42 hdfs://xxx/mor_test/.hoodie/20220521164201245.deltacommit.requested
> 5291 2022-05-21 16:42 hdfs://xxx/mor_test/.hoodie/20220521164214473.deltacommit
> 0 2022-05-21 16:42 hdfs://xxx/mor_test/.hoodie/20220521164214473.deltacommit.inflight
> 0 2022-05-21 16:42 hdfs://xxx/mor_test/.hoodie/20220521164214473.deltacommit.requested
> 0 2022-05-21 16:41 hdfs://xxx/mor_test/.hoodie/archived
> 798 2022-05-21 16:41 hdfs://xxx/mor_test/.hoodie/hoodie.properties
> hdfs dfs -ls hdfs://xxx/mor_test/
> 13316 2022-05-21 16:42 hdfs://xxx/mor_test/.00000000-1dd6-4395-9c90-53f8a6c6eed3_20220521164201245.log.1_0-2-0
> 28395 2022-05-21 16:42 hdfs://xxx/mor_test/.00000000-1dd6-4395-9c90-53f8a6c6eed3_20220521164214473.log.1_0-2-0
> 0 2022-05-21 16:42 hdfs://xxx/mor_test/.hoodie
> 100 2022-05-21 16:42 hdfs://xxx/mor_test/.hoodie_partition_metadata
>
> Run the following SQL as a Spark snapshot query: 'select distinct
> _hoodie_commit_time from mor_test_rt'.
> The expected result is both 20220521164201245 and 20220521164214473, but the
> actual result is only 20220521164214473.
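
The expectation in the report can be checked mechanically against the timeline listing. The following is only a minimal sketch, not part of Hudi: it assumes that a timeline file named exactly `<instant>.deltacommit` marks a completed delta commit, and that, because the two batches do not overlap, every completed instant should show up as a `_hoodie_commit_time` value. The helper names `completed_instants` and `missing_commit_times` are hypothetical.

```python
import re

# Timeline file names copied from the .hoodie listing in the report.
TIMELINE_FILES = [
    "20220521164201245.deltacommit",
    "20220521164201245.deltacommit.inflight",
    "20220521164201245.deltacommit.requested",
    "20220521164214473.deltacommit",
    "20220521164214473.deltacommit.inflight",
    "20220521164214473.deltacommit.requested",
]

# Assumption: "<instant>.deltacommit" (no further suffix) means the delta
# commit completed; ".inflight" and ".requested" are earlier states.
COMPLETED = re.compile(r"^(\d+)\.deltacommit$")


def completed_instants(file_names):
    """Return the set of instants that have a completed delta commit file."""
    instants = set()
    for name in file_names:
        match = COMPLETED.match(name)
        if match:
            instants.add(match.group(1))
    return instants


def missing_commit_times(file_names, query_result):
    """Return completed instants that are absent from the query result."""
    return sorted(completed_instants(file_names) - set(query_result))


# Result reported for:
#   select distinct _hoodie_commit_time from mor_test_rt
actual = ["20220521164214473"]

print(missing_commit_times(TIMELINE_FILES, actual))
```

With the values from the report, the only instant printed is 20220521164201245, that is, the first batch is the one missing from the snapshot query.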