Zouxxyy commented on issue #6931:
URL: https://github.com/apache/hudi/issues/6931#issuecomment-1280152704

   > @Zouxxyy I set the `hoodie.datasource.write.operation='insert'` and `hoodie.merge.allow.duplicate.on.inserts='true'` properties on the Hudi table. When I use spark-sql to insert records into the table, duplicate records appear, but when I use flink-sql to insert records, no duplicates appear.
   > 
   > spark-sql>
   > ```sql
   > create table if not exists hudi.hudi_merge_test (
   >   uuid string,
   >   name string,
   >   age int,
   >   ts timestamp,
   >   dt string
   > ) using hudi
   > tblproperties (
   >   type = 'mor',
   >   primaryKey = 'uuid',
   >   hoodie.datasource.write.operation = 'insert',
   >   hoodie.cleaner.fileversions.retained = '1',
   >   hoodie.merge.allow.duplicate.on.inserts = 'true',
   >   hive_sync.skip_ro_suffix = 'true',        -- drop the "ro" suffix
   >   write.parquet.max.file.size = '120',      -- max file size, in MB
   >   hoodie.datasource.write.hive_style_partitioning = 'true',
   >   hoodie.archive.merge.enable = 'true',     -- automatic small-file merging
   >   hoodie.cleaner.commits.retained = '1'     -- number of commits to retain
   > ) partitioned by (dt)
   > location 'hdfs://namespace-HA-3/hudi/hudi_merge_test';
   > ```
   > 
   > flink-sql>
   > ```sql
   > CREATE TABLE IF NOT EXISTS hudi_merge_test (
   >   uuid VARCHAR(20),
   >   name VARCHAR(10),
   >   age INT,
   >   ts TIMESTAMP(3),
   >   dt VARCHAR(20)
   > ) PARTITIONED BY (dt) WITH (
   >   'connector' = 'hudi',
   >   'table.type' = 'MERGE_ON_READ',
   >   'write.operation' = 'insert',
   >   'hoodie.datasource.write.recordkey.field' = 'uuid',
   >   'write.precombine.field' = 'ts',
   >   'path' = 'hdfs://namespace-HA-3/hudi/hudi_merge_test',
   >   'write.tasks' = '4',
   >   'compaction.tasks' = '4',
   >   'hoodie.archive.merge.enable' = 'true',                     -- automatic small-file merging
   >   'hoodie.cleaner.commits.retained' = '1',                    -- number of commits to retain
   >   'hoodie.datasource.write.hive_style_partitioning' = 'true', -- use Hive-style partition paths
   >   'hoodie.embed.timeline.server' = 'false',
   >   'hoodie.parquet.small.file.limit' = '0',
   >   'hoodie.merge.allow.duplicate.on.inserts' = 'true',
   >   'hive_sync.enable' = 'true',                                -- Required. Enable Hive sync
   >   'hive_sync.mode' = 'hms',                                   -- Required. Set hive sync mode to hms (default is jdbc)
   >   'hive_sync.metastore.uris' = 'thrift://dev2:9083',          -- Required. Metastore URI
   >   'hive_sync.jdbc_url' = 'jdbc:hive2://dev2:10000',
   >   'hive_sync.skip_ro_suffix' = 'true',                        -- drop the "ro" suffix
   >   'hive_sync.table' = 'hudi_compacte',                        -- required. Hive table name to create
   >   'hive_sync.db' = 'hudi',                                    -- required. Hive database name to create
   >   'hive_sync.username' = 'hive',
   >   'hive_sync.password' = '123456'
   > )
   > ```
   
   Sorry, I don't know much about Flink.
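
   For anyone trying to reproduce the Spark-side behavior, a minimal sketch (hypothetical row values, assuming the `hudi.hudi_merge_test` table from the question already exists) would be:

   ```sql
   -- Insert the same record key twice; with
   -- hoodie.merge.allow.duplicate.on.inserts='true' and write.operation='insert',
   -- the second insert is expected to be kept as a duplicate rather than merged.
   insert into hudi.hudi_merge_test
   values ('id1', 'alice', 30, timestamp '2022-10-01 00:00:00', '2022-10-01');
   insert into hudi.hudi_merge_test
   values ('id1', 'alice', 30, timestamp '2022-10-01 00:00:00', '2022-10-01');

   -- On the Spark side this query should show both copies of the row;
   -- per the report above, the Flink-written table shows only one.
   select * from hudi.hudi_merge_test where uuid = 'id1';
   ```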


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]