If you are running inline compaction, it should not cause two pending compactions on the same file group. Along with the above details, can you please open a [SUPPORT] GitHub issue with the full stack trace and also an `ls` of your .hoodie folder if possible?
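If shell access to the bucket is awkward, something along these lines from spark-shell should dump the same listing. This is just a rough sketch, and the base path below is a placeholder for your table's S3 location:

    import org.apache.hadoop.fs.{FileSystem, Path}
    import java.net.URI

    // Placeholder - replace with your table's actual S3 base path
    val basePath = "s3://your-bucket/path/to/table"
    val fs = FileSystem.get(new URI(basePath), spark.sparkContext.hadoopConfiguration)

    // Print everything under .hoodie, which includes the commit timeline
    // and any pending compaction instants
    fs.listStatus(new Path(basePath, ".hoodie"))
      .foreach(status => println(status.getPath.getName))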
Thanks,
Sudha

On Thu, Jun 18, 2020 at 9:57 PM Zuyeu, Anton <zuyan...@amazon.com.invalid> wrote:

> Hi Team,
>
> We are trying to run incremental updates to our MoR Hudi table on S3, and
> it looks like the table inevitably gets corrupted after 20-30 commits. We
> do an initial data import and enable incremental upserts, then we verify
> that the tables are readable by running:
>
> hive> select * from table_name_ro limit 1;
>
> but after letting incremental upserts run for several hours, the
> above-mentioned select query starts throwing exceptions like:
>
> Failed with exception java.io.IOException:java.lang.IllegalStateException:
> Hudi File Id (HoodieFileGroupId{partitionPath='983',
> fileId='8e9fde92-7515-4f89-a667-ce5c1087e60c-0'}) has more than 1 pending
> compactions.
>
> Checking the compactions mentioned in the exception message via hudi-cli
> does indeed verify that the fileId is present in both compactions. The
> upsert settings that we use are:
>
> hudiOptions = Map[String, String](
>   HoodieWriteConfig.TABLE_NAME -> inputTableName,
>   "hoodie.consistency.check.enabled" -> "true",
>   "hoodie.compact.inline.max.delta.commits" -> "30",
>   "hoodie.compact.inline" -> "true",
>   "hoodie.clean.automatic" -> "true",
>   "hoodie.cleaner.commits.retained" -> "1000",
>   "hoodie.keep.min.commits" -> "1001",
>   "hoodie.keep.max.commits" -> "1050",
>   DataSourceWriteOptions.STORAGE_TYPE_OPT_KEY -> "MERGE_ON_READ",
>   DataSourceWriteOptions.RECORDKEY_FIELD_OPT_KEY -> primaryKeys,
>   DataSourceWriteOptions.KEYGENERATOR_CLASS_OPT_KEY ->
>     classOf[ComplexKeyGenerator].getName,
>   DataSourceWriteOptions.PARTITIONPATH_FIELD_OPT_KEY -> "partition_val_str",
>   DataSourceWriteOptions.PRECOMBINE_FIELD_OPT_KEY -> sortKeys,
>   DataSourceWriteOptions.HIVE_SYNC_ENABLED_OPT_KEY -> "true",
>   DataSourceWriteOptions.HIVE_TABLE_OPT_KEY -> inputTableName,
>   DataSourceWriteOptions.HIVE_PARTITION_FIELDS_OPT_KEY -> "partition_val_str",
>   DataSourceWriteOptions.HIVE_PARTITION_EXTRACTOR_CLASS_OPT_KEY ->
>     classOf[MultiPartKeysValueExtractor].getName,
>   DataSourceWriteOptions.HIVE_URL_OPT_KEY -> s"jdbc:hive2://$hiveServer2URI:10000"
> )
>
> Any suggestions on what could cause this issue, or how to debug it,
> would help a lot.
>
> Thank you,
> Anton Zuyeu
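As a programmatic cross-check of what hudi-cli shows, a sketch along these lines can report which file groups appear in more than one pending compaction plan. This goes through Hudi's internal CompactionUtils, so treat it as untested; the exact constructors and packages may differ between releases:

    import org.apache.hudi.common.table.HoodieTableMetaClient
    import org.apache.hudi.common.util.CompactionUtils
    import scala.collection.JavaConverters._

    // Placeholder - replace with your table's actual S3 base path
    val basePath = "s3://your-bucket/path/to/table"
    val metaClient = new HoodieTableMetaClient(
      spark.sparkContext.hadoopConfiguration, basePath)

    // One entry per pending compaction instant, paired with its plan
    val plans = CompactionUtils.getAllPendingCompactionPlans(metaClient).asScala

    // Count how many pending plans reference each (partition, fileId)
    val counts = plans
      .flatMap(p => Option(p.getValue.getOperations).map(_.asScala).getOrElse(Nil))
      .groupBy(op => (op.getPartitionPath, op.getFileId))
      .mapValues(_.size)

    // Any count above 1 reproduces the "more than 1 pending compactions" state
    counts.filter(_._2 > 1).foreach { case ((partition, fileId), n) =>
      println(s"$partition/$fileId appears in $n pending compaction plans")
    }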