zyclove opened a new issue, #10237: URL: https://github.com/apache/hudi/issues/10237
**Describe the problem you faced**

Hudi bulk insert saves the data twice (every record ends up duplicated in the table).

**To Reproduce**

Steps to reproduce the behavior:

1. Create the table and set the write options:

   CREATE TABLE IF NOT EXISTS bi_dw_real.smart_datapoint_report_rw_clear_rt (
     id STRING COMMENT 'id',
     uuid STRING COMMENT 'log uuid',
     data_id STRING COMMENT '',
     dev_id STRING COMMENT '',
     gw_id STRING COMMENT '',
     product_id STRING COMMENT '',
     uid STRING COMMENT '',
     dp_code STRING COMMENT '',
     dp_id STRING COMMENT '',
     dp_mode STRING COMMENT '',
     dp_name STRING COMMENT '',
     dp_time STRING COMMENT '',
     dp_type STRING COMMENT '',
     dp_value STRING COMMENT '',
     gmt_modified BIGINT COMMENT 'ct time',
     dt STRING COMMENT 'time partition field'
   ) USING hudi
   PARTITIONED BY (dt, dp_mode)
   COMMENT ''
   LOCATION '${bi_db_dir}/bi_ods_real/ods_smart_datapoint_report_rw_clear_rt'
   TBLPROPERTIES (
     type = 'mor',
     primaryKey = 'id',
     preCombineField = 'gmt_modified',
     hoodie.combine.before.upsert = 'false',
     hoodie.metadata.record.index.enable = 'true',
     hoodie.datasource.write.operation = 'upsert',
     hoodie.metadata.enable = 'true',
     hoodie.datasource.write.hive_style_partitioning = 'true',
     hoodie.metadata.record.index.min.filegroup.count = '512',
     hoodie.index.type = 'RECORD_INDEX',
     hoodie.compact.inline = 'false',
     hoodie.common.spillable.diskmap.type = 'ROCKS_DB',
     hoodie.datasource.write.partitionpath.field = 'dt,dp_mode',
     hoodie.compaction.payload.class = 'org.apache.hudi.common.model.PartialUpdateAvroPayload'
   );

   set hoodie.write.lock.zookeeper.lock_key=bi_ods_real.smart_datapoint_report_rw_clear_rt;
   set hoodie.storage.layout.type=DEFAULT;
   set hoodie.metadata.record.index.enable=true;
   set hoodie.metadata.enable=true;
   set hoodie.populate.meta.fields=false;
   set hoodie.parquet.compression.codec=snappy;
   set hoodie.memory.merge.max.size=2004857600000;
   set hoodie.write.buffer.limit.bytes=419430400;
   set hoodie.index.type=RECORD_INDEX;

2. Enable bulk insert:

   set hoodie.sql.insert.mode=non-strict;
   set hoodie.sql.bulk.insert.enable=true;

3. Run the bulk insert through spark-sql:
   insert into bi_dw_real.dwd_smart_datapoint_report_rw_clear_rt

**Expected behavior**

Each record should be written exactly once; the bulk insert should not duplicate the data.

**Environment Description**

* Hudi version : 0.14.0
* Spark version : 3.2.1
* Hive version : 3.1.3
* Hadoop version : 3.2.2
* Storage (HDFS/S3/GCS..) : S3
* Running on Docker? (yes/no) : no
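A quick way to confirm the duplication is to group by the configured primary key (`primaryKey = 'id'` in the DDL above) and look for keys that appear more than once. This check is not part of the original report; it is only an illustrative query using the table and column names from the DDL:

```sql
-- If bulk insert wrote every record twice, each id below should show cnt = 2.
SELECT id, COUNT(*) AS cnt
FROM bi_dw_real.smart_datapoint_report_rw_clear_rt
GROUP BY id
HAVING COUNT(*) > 1;
```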
