juice411 opened a new issue, #11016:
URL: https://github.com/apache/hudi/issues/11016

   Description:
   I have created a Hudi table named ods_table_v1 using the following SQL 
command:
   
   CREATE TABLE IF NOT EXISTS test_simulated_data.ods_table_v1 (
       id int,
       count_field double,
       write_time timestamp(0),
       _part string,
       proc_time timestamp(3),
       WATERMARK FOR write_time AS write_time
   )
   PARTITIONED BY (_part)
   WITH(
       'connector'='hudi',
       'path'='hdfs://masters/test_simulated_data/ods_table_v1',
       'table.type'='MERGE_ON_READ',
       'hoodie.datasource.write.recordkey.field'='id',
       'hoodie.datasource.write.precombine.field'='write_time',
       'compaction.async.enabled'='true',
       'compaction.schedule.enabled'='true',
       'compaction.trigger.strategy'='time_elapsed',
       'compaction.delta_seconds'='600',
       'compaction.delta_commits'='1',
       'read.streaming.enabled'='true',
       'read.streaming.skip_compaction'='true',
       'read.start-commit'='earliest',
       'changelog.enabled'='true',
       'hive_sync.enable'='true',
       'hive_sync.mode'='hms',
       'hive_sync.metastore.uris'='thrift://h35:9083',
       'hive_sync.db'='test_simulated_data',
       'hive_sync.table'='hive_ods_table'
   );
   
   The table ods_table_v1 receives continuous writes. After three days of 
writing, I found that the earliest batch of written data is missing when I 
query the table for all data. No matter what filter conditions I add, I cannot 
retrieve that earliest batch.
   
   I am urgently seeking answers to understand the cause of this data loss. Has 
anyone encountered a similar issue with Hudi tables? Is there a known issue or 
configuration mistake that could have led to this? Any help or guidance would 
be greatly appreciated.
   
   Thank you in advance for your time and assistance.
   

