Hi Team,

We are running incremental updates against our MoR Hudi table on S3, and it
looks like the table inevitably gets corrupted after 20-30 commits. We do an
initial data import, enable incremental upserts, and then verify that the
tables are readable by running:
hive> select * from table_name_ro limit 1;

But after letting the incremental upserts run for several hours, the select
query above starts throwing exceptions like:
Failed with exception java.io.IOException:java.lang.IllegalStateException: Hudi 
File Id (HoodieFileGroupId{partitionPath='983', 
fileId='8e9fde92-7515-4f89-a667-ce5c1087e60c-0'}) has more than 1 pending 
compactions.

Checking the compactions mentioned in the exception message via hudi-cli does
indeed verify that the file ID is present in both compactions. The upsert
settings that we use are:
        hudiOptions = Map[String,String](
          HoodieWriteConfig.TABLE_NAME -> inputTableName,
          "hoodie.consistency.check.enabled" -> "true",
          "hoodie.compact.inline.max.delta.commits" -> "30",
          "hoodie.compact.inline" -> "true",
          "hoodie.clean.automatic" -> "true",
          "hoodie.cleaner.commits.retained" -> "1000",
          "hoodie.keep.min.commits" -> "1001",
          "hoodie.keep.max.commits" -> "1050",
          DataSourceWriteOptions.STORAGE_TYPE_OPT_KEY -> "MERGE_ON_READ",
          DataSourceWriteOptions.RECORDKEY_FIELD_OPT_KEY -> primaryKeys,
          DataSourceWriteOptions.KEYGENERATOR_CLASS_OPT_KEY -> classOf[ComplexKeyGenerator].getName,
          DataSourceWriteOptions.PARTITIONPATH_FIELD_OPT_KEY -> "partition_val_str",
          DataSourceWriteOptions.PRECOMBINE_FIELD_OPT_KEY -> sortKeys,
          DataSourceWriteOptions.HIVE_SYNC_ENABLED_OPT_KEY -> "true",
          DataSourceWriteOptions.HIVE_TABLE_OPT_KEY -> inputTableName,
          DataSourceWriteOptions.HIVE_PARTITION_FIELDS_OPT_KEY -> "partition_val_str",
          DataSourceWriteOptions.HIVE_PARTITION_EXTRACTOR_CLASS_OPT_KEY -> classOf[MultiPartKeysValueExtractor].getName,
          DataSourceWriteOptions.HIVE_URL_OPT_KEY -> s"jdbc:hive2://$hiveServer2URI:10000"
        )
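
For reference, the condition reported in the exception (one file ID listed in
more than one pending compaction plan) can be illustrated with a small
standalone sketch. This does not use any Hudi API. The object name, the helper
name, the instant times, and the second file ID are all hypothetical
placeholders; only the first file ID is taken from the exception message.

```scala
// Standalone illustration only: no Hudi classes are used here.
object PendingCompactionCheck {

  // Input: compaction instant time -> file IDs listed in that compaction plan.
  // Output: file IDs that appear in more than one plan, with the instants involved.
  def duplicatedFileIds(plans: Map[String, Set[String]]): Map[String, Seq[String]] =
    plans.toSeq
      .flatMap { case (instant, fileIds) => fileIds.toSeq.map(id => id -> instant) }
      .groupBy { case (id, _) => id }
      .map { case (id, pairs) => id -> pairs.map(_._2).sorted }
      .filter { case (_, instants) => instants.size > 1 }

  def main(args: Array[String]): Unit = {
    // Hypothetical plans: the instant times and "hypothetical-file-id-0" are made up.
    val plans = Map(
      "20200101000000" -> Set("8e9fde92-7515-4f89-a667-ce5c1087e60c-0", "hypothetical-file-id-0"),
      "20200102000000" -> Set("8e9fde92-7515-4f89-a667-ce5c1087e60c-0")
    )
    duplicatedFileIds(plans).foreach { case (id, instants) =>
      println(s"$id is in ${instants.size} pending compactions: ${instants.mkString(", ")}")
    }
  }
}
```

In practice the per-plan file ID lists would have to be copied from the
hudi-cli output for each pending compaction.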

Any suggestions on what could cause this issue, or how to debug it, would
help a lot.

Thank you,
Anton Zuyeu
