jjtjiang commented on issue #7057:
URL: https://github.com/apache/hudi/issues/7057#issuecomment-1846470553
@ad1happy2go
I also face this problem.

Version: Hudi 0.12.3

How to reproduce: just run an `insert overwrite` SQL statement that inserts a large table.

Here is my case:

Row count: 1,148,000,000 (with a smaller row count, e.g. 1,000,000, the issue does not reproduce.)
DDL:

```sql
CREATE TABLE temp_db.ods_cis_corp_history_profile_hudi_t1_20231208 (
  `_hoodie_is_deleted` BOOLEAN,
  `t_pre_combine_field` LONG,
  order_type INT,
  order_no INT,
  profile_no INT,
  profile_type STRING,
  profile_cat STRING,
  u_version STRING,
  order_line_no INT,
  profile_c STRING,
  profile_i INT,
  profile_f DECIMAL(20,8),
  profile_d TIMESTAMP,
  active STRING,
  entry_datetime TIMESTAMP,
  entry_id INT,
  h_version INT
)
USING hudi
TBLPROPERTIES (
  'hoodie.write.concurrency.mode' = 'optimistic_concurrency_control',
  'hoodie.cleaner.policy.failed.writes' = 'LAZY',
  'hoodie.write.lock.provider' = 'org.apache.hudi.client.transaction.lock.FileSystemBasedLockProvider',
  'hoodie.write.lock.filesystem.expire' = 5,
  'primaryKey' = 'order_no,profile_type,profile_no,order_type,profile_cat',
  'type' = 'cow',
  'preCombineField' = 't_pre_combine_field'
)
CLUSTERED BY (order_no, profile_type, profile_no, order_type, profile_cat)
INTO 2 BUCKETS;
```
SQL (the same `insert overwrite` statement, executed twice):

```sql
INSERT OVERWRITE TABLE temp_db.ods_cis_corp_history_profile_hudi_t1_20231208
SELECT
  false,
  1,
  order_type,
  order_no,
  profile_no,
  profile_type,
  profile_cat,
  u_version,
  order_line_no,
  profile_c,
  profile_i,
  profile_f,
  profile_d,
  active,
  entry_datetime,
  entry_id,
  h_version
FROM temp_db.ods_cis_dbo_history_profile_tmp;
```
`.hoodie` directory file listing:

```
.hoodie/.aux
.hoodie/.heartbeat
.hoodie/.schema
.hoodie/.temp
.hoodie/20231207055239027.replacecommit
.hoodie/20231207055239027.replacecommit.inflight
.hoodie/20231207055239027.replacecommit.requested
.hoodie/20231207084620796.replacecommit
.hoodie/20231207084620796.replacecommit.inflight
.hoodie/20231207084620796.replacecommit.requested
.hoodie/20231207100918624.rollback
.hoodie/20231207100918624.rollback.inflight
.hoodie/20231207100918624.rollback.requested
.hoodie/20231207100923823.rollback
.hoodie/20231207100923823.rollback.inflight
.hoodie/20231207100923823.rollback.requested
.hoodie/20231207102003686.replacecommit
.hoodie/20231207102003686.replacecommit.inflight
.hoodie/20231207102003686.replacecommit.requested
.hoodie/archived
.hoodie/hoodie.properties
.hoodie/metadata
```
As the listing shows, there is no `20231207071610343.replacecommit.requested` file, but the program expects to find it, so it fails. This is what confuses me.
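As a rough illustration (not Hudi's actual code), the consistency check the listing above implies can be sketched as a script that groups timeline files by instant and reports any instant missing one of its three state files; `find_incomplete_instants` and the partial instant in the example are hypothetical names, not part of Hudi:

```python
from collections import defaultdict

def find_incomplete_instants(files):
    """Group timeline file names by (instant, action) and return the
    state files (requested/inflight/completed) missing for each pair."""
    states = defaultdict(set)
    for name in files:
        parts = name.split(".")
        # Timeline files look like <timestamp>.<action>[.inflight|.requested];
        # a bare <timestamp>.<action> file marks the completed state.
        if parts and parts[0].isdigit():
            instant, action = parts[0], parts[1]
            state = parts[2] if len(parts) > 2 else "completed"
            states[(instant, action)].add(state)
    required = {"requested", "inflight", "completed"}
    return {key: sorted(required - seen)
            for key, seen in states.items() if required - seen}

# Example: a complete instant plus a hypothetical partial one.
listing = [
    "20231207055239027.replacecommit",
    "20231207055239027.replacecommit.inflight",
    "20231207055239027.replacecommit.requested",
    "20231207071610343.replacecommit.inflight",  # hypothetical partial instant
]
print(find_incomplete_instants(listing))
# → {('20231207071610343', 'replacecommit'): ['completed', 'requested']}
```

Run against the full listing from this issue, every instant has all three files, which makes the reference to the missing `20231207071610343` instant all the more puzzling.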
`hoodie.properties`:

```
hoodie.table.precombine.field=t_pre_combine_field
hoodie.datasource.write.drop.partition.columns=false
hoodie.table.type=COPY_ON_WRITE
hoodie.archivelog.folder=archived
hoodie.timeline.layout.version=1
hoodie.table.version=5
hoodie.table.metadata.partitions=files
hoodie.table.recordkey.fields=order_no,profile_type,profile_no,order_type,profile_cat
hoodie.database.name=temp_db
hoodie.datasource.write.partitionpath.urlencode=false
hoodie.table.keygenerator.class=org.apache.hudi.keygen.NonpartitionedKeyGenerator
hoodie.table.name=ods_cis_corp_history_profile_hudi_t1_20231207
hoodie.datasource.write.hive_style_partitioning=true
hoodie.table.checksum=2702244832
hoodie.table.create.schema={"type"\:"record","name"\:"ods_cis_corp_history_profile_hudi_t1_20231207_record","namespace"\:"hoodie.ods_cis_corp_history_profile_hudi_t1_20231207","fields"\:[{"name"\:"_hoodie_commit_time","type"\:["string","null"]},{"name"\:"_hoodie_commit_seqno","type"\:["string","null"]},{"name"\:"_hoodie_record_key","type"\:["string","null"]},{"name"\:"_hoodie_partition_path","type"\:["string","null"]},{"name"\:"_hoodie_file_name","type"\:["string","null"]},{"name"\:"_hoodie_is_deleted","type"\:["boolean","null"]},{"name"\:"t_pre_combine_field","type"\:["long","null"]},{"name"\:"order_type","type"\:["int","null"]},{"name"\:"order_no","type"\:["int","null"]},{"name"\:"profile_no","type"\:["int","null"]},{"name"\:"profile_type","type"\:["string","null"]},{"name"\:"profile_cat","type"\:["string","null"]},{"name"\:"u_version","type"\:["string","null"]},{"name"\:"order_line_no","type"\:["int","null"]},{"name"\:"profile_c","type"\:["string","null"]},{"name"\:"profile_i","type"\:["int","null"]},{"name"\:"profile_f","type"\:[{"type"\:"fixed","name"\:"fixed","namespace"\:"hoodie.ods_cis_corp_history_profile_hudi_t1_20231207.ods_cis_corp_history_profile_hudi_t1_20231207_record.profile_f","size"\:9,"logicalType"\:"decimal","precision"\:20,"scale"\:8},"null"]},{"name"\:"profile_d","type"\:[{"type"\:"long","logicalType"\:"timestamp-micros"},"null"]},{"name"\:"active","type"\:["string","null"]},{"name"\:"entry_datetime","type"\:[{"type"\:"long","logicalType"\:"timestamp-micros"},"null"]},{"name"\:"entry_id","type"\:["int","null"]},{"name"\:"h_version","type"\:["int","null"]}]}
```
More logs are in the attached [hudi.log](https://github.com/apache/hudi/files/13608380/hudi.log).
--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
To unsubscribe, e-mail: [email protected]
For queries about this service, please contact Infrastructure at:
[email protected]