juice411 opened a new issue, #11016:
URL: https://github.com/apache/hudi/issues/11016
Description:
I have created a Hudi table named ods_table_v1 using the following SQL
command:
```sql
CREATE TABLE if not exists test_simulated_data.ods_table_v1(
  id int,
  count_field double,
  write_time timestamp(0),
  _part string,
  proc_time timestamp(3),
  WATERMARK FOR write_time AS write_time
)
PARTITIONED BY (_part)
WITH(
  'connector'='hudi',
  'path'='hdfs://masters/test_simulated_data/ods_table_v1',
  'table.type'='MERGE_ON_READ',
  'hoodie.datasource.write.recordkey.field'='id',
  'hoodie.datasource.write.precombine.field'='write_time',
  'compaction.async.enabled'='true',
  'compaction.schedule.enabled'='true',
  'compaction.trigger.strategy'='time_elapsed',
  'compaction.delta_seconds'='600',
  'compaction.delta_commits'='1',
  'read.streaming.enabled'='true',
  'read.streaming.skip_compaction'='true',
  'read.start-commit'='earliest',
  'changelog.enabled'='true',
  'hive_sync.enable'='true',
  'hive_sync.mode'='hms',
  'hive_sync.metastore.uris'='thrift://h35:9083',
  'hive_sync.db'='test_simulated_data',
  'hive_sync.table'='hive_ods_table'
);
```
Data is written to ods_table_v1 continuously. After three days of continuous
writes, I noticed a problem: when I query the table for all data, the earliest
batch of written records is missing. No matter which filter conditions I add,
I cannot retrieve those earliest records.
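A minimal Flink SQL sketch of the kind of "query all data" check described above. It assumes the table is queried from the Flink SQL client on Flink 1.13 or later. The issue does not say which engine was used, and the Hive-synced table hive_ods_table could be queried instead. Because the DDL sets 'read.streaming.enabled'='true', the sketch switches to batch mode and overrides that option with a dynamic table options hint, so the source reads the latest snapshot once instead of running as an unbounded stream. It then reports the earliest write_time still present and the total row count.

```sql
-- Assumption: Flink 1.13+ SQL client, where dynamic table option hints are enabled by default.
-- Run as a bounded (batch) job instead of a continuous streaming job.
SET 'execution.runtime-mode' = 'batch';

-- Override the table-level streaming setting for this query only,
-- so the Hudi source performs a one-off snapshot read.
SELECT MIN(write_time) AS earliest_write_time,
       COUNT(*)        AS total_rows
FROM test_simulated_data.ods_table_v1
/*+ OPTIONS('read.streaming.enabled'='false') */;
```

If earliest_write_time is later than the time of the first batch that was written, the records are absent from the table snapshot itself rather than being excluded by a query condition.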
I urgently need to understand what caused this data loss. Has anyone
encountered a similar issue with Hudi tables? Is there a known bug, or a
configuration mistake in the table definition above, that could lead to this?
Any help or guidance would be greatly appreciated.
Thank you in advance for your time and assistance.
--
This is an automated message from the Apache Git Service.