xzwDavid commented on issue #5765:
URL: https://github.com/apache/hudi/issues/5765#issuecomment-1732532455

   > Same issue, loading data from Spark-native Parquet into a MOR table. Five
   > pieces (loads) completed successfully; the sixth failed. The data and
   > structure are the same. Possibly some compaction/cleaning happens and
   > causes this problem?
   > 
   > ```scala
   > incDF.filter("ins_flg='y'")
   >   .write
   >   .format("org.apache.hudi")
   >   .option(DataSourceWriteOptions.OPERATION_OPT_KEY,
   >     DataSourceWriteOptions.BULK_INSERT_OPERATION_OPT_VAL)
   >   .option(TABLE_TYPE_OPT_KEY, "MERGE_ON_READ")
   >   .option(RECORDKEY_FIELD_OPT_KEY, "rn")
   >   .option(PRECOMBINE_FIELD_OPT_KEY, "ss_date_time")
   >   .option(HIVE_SUPPORT_TIMESTAMP_TYPE.key, "true")
   >   .option("hoodie.index.type", "SIMPLE")
   >   .option(TABLE_NAME, "store_sales")
   >   .mode(SaveMode.Append)
   >   .save("/tmp/bench_hudi/store_sales")
   > ```
   > 
   > UPD: yes, it looks like the compaction/cleaner is at work here (I have 2
   > writes per cycle and hoodie.cleaner.commits.retained = 10 by default). I
   > added the following options to disable the cleaner in my test cycle, but
   > the error still appears on the 10th commit:
   > "hoodie.keep.min.commits" -> "40", "hoodie.keep.max.commits" -> "50",
   > "hoodie.cleaner.commits.retained" -> "30", "hoodie.clean.automatic" -> "false"
   
   Hi,
   I am encountering the same issue. Did you fix it?
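   
   For reference, here is an untested sketch that folds the cleaner and archival
   options from the quoted UPD into the same writer call. The option keys and
   values are copied verbatim from the quote; `incDF`, the table path, and the
   import locations are assumptions carried over from the quoted snippet, not
   verified here:
   
   ```scala
   // Sketch only: combines the quoted bulk-insert write with the quoted
   // cleaner/archival overrides. Imports assume the older OPT_KEY-style
   // option names that the quoted code uses.
   import org.apache.hudi.DataSourceWriteOptions
   import org.apache.hudi.DataSourceWriteOptions._
   import org.apache.hudi.config.HoodieWriteConfig.TABLE_NAME
   import org.apache.spark.sql.SaveMode
   
   incDF.filter("ins_flg='y'")
     .write
     .format("org.apache.hudi")
     .option(DataSourceWriteOptions.OPERATION_OPT_KEY,
       DataSourceWriteOptions.BULK_INSERT_OPERATION_OPT_VAL)
     .option(TABLE_TYPE_OPT_KEY, "MERGE_ON_READ")
     .option(RECORDKEY_FIELD_OPT_KEY, "rn")
     .option(PRECOMBINE_FIELD_OPT_KEY, "ss_date_time")
     .option(HIVE_SUPPORT_TIMESTAMP_TYPE.key, "true")
     .option("hoodie.index.type", "SIMPLE")
     // Archival window: keep between 40 and 50 commits on the active timeline.
     .option("hoodie.keep.min.commits", "40")
     .option("hoodie.keep.max.commits", "50")
     // Cleaner: retain 30 commits' worth of old file versions...
     .option("hoodie.cleaner.commits.retained", "30")
     // ...and do not trigger cleaning automatically after each write.
     .option("hoodie.clean.automatic", "false")
     .option(TABLE_NAME, "store_sales")
     .mode(SaveMode.Append)
     .save("/tmp/bench_hudi/store_sales")
   ```
   
   Note that `hoodie.keep.min.commits`/`hoodie.keep.max.commits` control timeline
   archival, while `hoodie.cleaner.commits.retained` and `hoodie.clean.automatic`
   control the cleaner; they are separate mechanisms, so widening one window does
   not by itself disable the other.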
   

