XXwhite opened a new issue, #6019:
URL: https://github.com/apache/hudi/issues/6019

   
   I use Flink CDC to read data from MongoDB and write it into Hudi, but I found that the total row count in Hudi is always slightly lower than the count in MongoDB (a few records or more), and it never catches up even after a long time. It looks like the last batch of data is not flushed to Hudi right away. This may not be a bug, but I want to verify data consistency — how should I configure this? Below is my CREATE TABLE statement; the compaction options do not seem to take effect.
   ```
   CREATE TABLE hudi_hesuan_box(
   _id STRING,
   gm STRING,
   kg TIMESTAMP(3),
   kr STRING,
   kj STRING,
   rs INT,
   zt INT,
   sg INT,
   v BIGINT,
   uid STRING,
   s INT,
   xm STRING,
   bg TIMESTAMP(3),
   bj STRING,
   br STRING,
   rlb ARRAY<ROW<_id STRING, md STRING, xm STRING, zj STRING, sj STRING, lx STRING, cj TIMESTAMP(3), s INT>>,
   PRIMARY KEY(_id) NOT ENFORCED 
   ) WITH (
   'connector'='hudi',
   'path'= 'hdfs://cdh07:8020/cdc_test/hudi/hesuan_box',
   'hoodie.datasource.write.recordkey.field'= '_id',
   'hoodie.metadata.enable'='false',
   'write.precombine.field'= 'kg',
   'write.tasks'= '1',
   'write.rate.limit'= '2000',
   'table.type'= 'MERGE_ON_READ' ,
   'compaction.tasks'= '1',
   'compaction.async.enabled'= 'true',
   'compaction.trigger.strategy'= 'num_or_time',
   'compaction.delta_commits'= '1',
   'compaction.delta_seconds'= '120',
   'changelog.enabled'= 'true',
   'read.streaming.enabled'= 'true',
   'read.streaming.check-interval'= '3',
   'hive_sync.enable'= 'true',
   'hive_sync.mode'= 'hms',
   'hive_sync.metastore.uris'= 'thrift://cdh06:9083',
   'hive_sync.jdbc_url'= 'jdbc:hive2://cdh11:10000',
   'hive_sync.table'= 'hesuan_box',
   'hive_sync.db'= 'cdc_test',
   'hive_sync.username'= 'hdfs',
   'hive_sync.support_timestamp'= 'true'
   );
   ```
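
   A likely cause of the "missing tail" is that the Hudi Flink writer only commits buffered records when a Flink checkpoint completes, so records received since the last checkpoint are not yet visible in the table. A minimal sketch, assuming the Flink SQL client session (the 30 s interval is an illustrative value, not taken from this issue):

   ```sql
   -- Hedged sketch: Hudi's Flink sink flushes and commits on checkpoint,
   -- so make sure periodic checkpointing is enabled for the session.
   -- The interval below is an example value; tune it for your latency needs.
   SET 'execution.checkpointing.interval' = '30s';
   ```

   With checkpointing enabled, the gap between MongoDB and Hudi should be bounded by one checkpoint interval rather than growing indefinitely.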

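To verify consistency, one option is to take a point-in-time count of the Hudi table in batch mode after a checkpoint (and any pending compaction) has completed, and compare it with the collection count on the MongoDB side. A sketch, reusing the table name from the DDL above:

```sql
-- Hedged sketch: switch the session to batch execution and count the
-- Hudi table snapshot for comparison with MongoDB's collection count.
SET 'execution.runtime-mode' = 'batch';
SELECT COUNT(*) AS hudi_cnt FROM hudi_hesuan_box;
```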
