Hi Team,

We are trying to run incremental updates to our MoR Hudi table on S3, and it looks like the table inevitably gets corrupted after 20-30 commits. We do an initial data import, enable incremental upserts, and then verify that the table is readable by running:

hive> select * from table_name_ro limit 1;
However, after letting incremental upserts run for several hours, the select query above starts throwing exceptions like:

Failed with exception java.io.IOException:java.lang.IllegalStateException: Hudi File Id (HoodieFileGroupId{partitionPath='983', fileId='8e9fde92-7515-4f89-a667-ce5c1087e60c-0'}) has more than 1 pending compactions.

Checking the compactions mentioned in the exception message via hudi-cli does indeed verify that the file id is present in both pending compactions.

The upsert settings that we use are:

val hudiOptions = Map[String, String](
  HoodieWriteConfig.TABLE_NAME -> inputTableName,
  "hoodie.consistency.check.enabled" -> "true",
  "hoodie.compact.inline.max.delta.commits" -> "30",
  "hoodie.compact.inline" -> "true",
  "hoodie.clean.automatic" -> "true",
  "hoodie.cleaner.commits.retained" -> "1000",
  "hoodie.keep.min.commits" -> "1001",
  "hoodie.keep.max.commits" -> "1050",
  DataSourceWriteOptions.STORAGE_TYPE_OPT_KEY -> "MERGE_ON_READ",
  DataSourceWriteOptions.RECORDKEY_FIELD_OPT_KEY -> primaryKeys,
  DataSourceWriteOptions.KEYGENERATOR_CLASS_OPT_KEY -> classOf[ComplexKeyGenerator].getName,
  DataSourceWriteOptions.PARTITIONPATH_FIELD_OPT_KEY -> "partition_val_str",
  DataSourceWriteOptions.PRECOMBINE_FIELD_OPT_KEY -> sortKeys,
  DataSourceWriteOptions.HIVE_SYNC_ENABLED_OPT_KEY -> "true",
  DataSourceWriteOptions.HIVE_TABLE_OPT_KEY -> inputTableName,
  DataSourceWriteOptions.HIVE_PARTITION_FIELDS_OPT_KEY -> "partition_val_str",
  DataSourceWriteOptions.HIVE_PARTITION_EXTRACTOR_CLASS_OPT_KEY -> classOf[MultiPartKeysValueExtractor].getName,
  DataSourceWriteOptions.HIVE_URL_OPT_KEY -> s"jdbc:hive2://$hiveServer2URI:10000"
)

Any suggestions on what could cause this issue, or how we might debug it, would help a lot.

Thank you,
Anton Zuyeu
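P.S. For reference, the hudi-cli check mentioned above looked roughly like this (the table path and instant times are placeholders, not our real values):

hudi->connect --path s3://our-bucket/path/to/table_name
hudi:table_name->compactions show all
hudi:table_name->compaction show --instant <instant_time>

Both pending compaction plans listed in "compactions show all" contain the same fileId (8e9fde92-7515-4f89-a667-ce5c1087e60c-0).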
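P.P.S. In case the write path matters, this is a minimal sketch of how we apply the options above on each incremental batch (inputDF, basePath, and the surrounding batch loop are simplified placeholders, not our exact code):

import org.apache.spark.sql.SaveMode

// Each incremental batch is appended through the Hudi Spark datasource
// using the hudiOptions map shown above.
inputDF.write
  .format("org.apache.hudi")
  .options(hudiOptions)
  .mode(SaveMode.Append)
  .save(basePath)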